
People to Watch 2023 – Fangjin Yang
What spurred you to create Apache Druid? Why couldn’t existing databases solve the needs you had at Metamarkets?
Back in 2011, we were trying to quickly aggregate and query real-time data coming from website users across the Internet to analyze digital advertising auctions. This involved large data sets with millions to billions of rows. We weren't setting out to build a new database. We tried building the application on several relational and NoSQL databases, but none could meet the performance and scale requirements for rapid interactive queries on this high-dimensional, high-cardinality data.
What is the key attribute that has made Druid so successful?
The key to Druid’s performance at scale is “don’t do it.” The idea is to minimize the work the computer has to do. Druid doesn’t load data from disk to memory, or from memory to CPU, when it isn’t needed for a query. It doesn’t decode data when it can operate directly on encoded data. It doesn’t read the full dataset when it can read a smaller index. It doesn’t send data unnecessarily across process boundaries or from server to server.
With this philosophy of “don’t do it,” you end up with an architecture that’s incredibly efficient at processing queries at scale and under load. It’s also why Druid can deliver sub-second aggregations over trillions of rows at thousands of queries per second.
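To make two of the “don’t do it” ideas above concrete (operating on encoded data and reading an index instead of the full dataset), here is a minimal Python sketch. It is a hypothetical illustration, not Druid’s actual segment format or code: the class `DictEncodedColumn` and the function `sum_where` are invented for this example. Plain Python lists stand in for the compressed bitmaps a real engine would use. The string column is stored once as a small dictionary plus integer ids. A filter resolves the requested value to an id with a single lookup, and the inverted index then lists the matching rows directly, so no strings are decoded or compared row by row.

```python
from collections import defaultdict


class DictEncodedColumn:
    """Hypothetical dictionary-encoded string column with an inverted index.

    Illustrative only; not Druid's real storage format.
    """

    def __init__(self, values):
        # Sorted dictionary of distinct values; each value maps to an integer id.
        self.dictionary = sorted(set(values))
        self.id_of = {v: i for i, v in enumerate(self.dictionary)}
        # The column itself is stored as small integer ids, not strings.
        self.encoded = [self.id_of[v] for v in values]
        # Inverted index: value id -> row numbers that contain that value
        # (a plain list standing in for a compressed bitmap).
        self.index = defaultdict(list)
        for row, vid in enumerate(self.encoded):
            self.index[vid].append(row)

    def rows_matching(self, value):
        # One dictionary lookup replaces a per-row string comparison.
        vid = self.id_of.get(value)
        return [] if vid is None else self.index[vid]

    def count_equal(self, value):
        # Answered from the index alone; the encoded column is never scanned.
        return len(self.rows_matching(value))


def sum_where(column, metric, value):
    # Read only the metric rows the index says match; skip everything else.
    return sum(metric[r] for r in column.rows_matching(value))


if __name__ == "__main__":
    country = DictEncodedColumn(["US", "DE", "US", "FR", "US"])
    clicks = [3, 1, 4, 1, 5]
    print(country.encoded)                    # [2, 0, 2, 1, 2]
    print(country.count_equal("US"))          # 3
    print(sum_where(country, clicks, "US"))   # 12
```

The design choice being illustrated is that both the filter and the aggregation touch only the dictionary, the index, and the matching metric rows. On a table with billions of rows and a selective filter, that is far less data than a full scan that decodes every string.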
How do you see the market for big and fast analytics platforms evolving in 2023? Do you think we’ll continue to see the introduction of novel database engines?
We see the emergence of a new category of data infrastructure, real-time analytics databases, to address the growing demand for developer-built analytics applications on real-time, streaming data. The need for faster query performance at scale isn’t slowing down. It’s become a game-changer because it unlocks new operational workflows for so many Druid users, such as Confluent, Netflix, and Salesforce. Will more database engines emerge over time? For sure. Developers are constantly innovating and driving new workload requirements that call for purpose-built databases.
Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
I used to play video games semi-professionally and am still an avid eSports fan.