People to Watch 2023 – Fangjin Yang
What spurred you to create Apache Druid? Why couldn’t existing databases solve the needs you had at Metamarkets?
Back in 2011, we were trying to quickly aggregate and query real-time data coming from website users across the Internet to analyze digital advertising auctions. This involved large data sets with millions to billions of rows. We weren't intending to build a new database for this. We tried building the application with several relational and NoSQL databases, but none were able to support the performance and scale requirements for rapid interactive queries on this high-dimensional, high-cardinality data.
What is the key attribute that has made Druid so successful?
The key to Druid’s performance at scale is “don’t do it.” It means minimizing the work the computer has to do. Druid doesn’t load data from disk to memory, or from memory to CPU, when it isn’t needed for a query. It doesn’t decode data when it can operate directly on encoded data. It doesn’t read the full dataset when it can read a smaller index. It doesn’t send data unnecessarily across process boundaries or from server to server.
With this philosophy of “don’t do it,” you end up having an architecture that’s incredibly efficient at processing queries at scale and under load. And it’s why Druid can be so fast and deliver aggregations on trillions of rows at thousands of queries per second with sub-second response times.
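As a rough illustration of the "don't do it" idea, here is a minimal Python sketch of answering a filtered aggregation from a small index over encoded data, without decoding the strings or scanning every row. It is a toy under stated assumptions, not Druid's code: the function names, the sample columns, and the use of Python integers as bitmaps are all hypothetical choices made for this example.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Map each distinct string to a small integer id; return (dictionary, ids)."""
    dictionary = sorted(set(values))
    id_of = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [id_of[value] for value in values]


def build_bitmap_index(ids):
    """Build one bitmap per id; bit r is set when row r holds that id."""
    bitmaps = defaultdict(int)
    for row, value_id in enumerate(ids):
        bitmaps[value_id] |= 1 << row
    return dict(bitmaps)


def sum_where(bitmap, metric):
    """Sum the metric over only the rows whose bit is set in the bitmap."""
    total = 0
    while bitmap:
        lowest = bitmap & -bitmap          # isolate the lowest set bit
        row = lowest.bit_length() - 1      # its position is the row number
        total += metric[row]
        bitmap ^= lowest                   # clear that bit and continue
    return total


# Hypothetical sample data: two dimension columns and one metric column.
countries = ["US", "DE", "US", "FR", "DE", "US"]
browsers = ["chrome", "chrome", "safari", "chrome", "safari", "chrome"]
bids = [10, 20, 30, 40, 50, 60]

country_dict, country_ids = dictionary_encode(countries)
browser_dict, browser_ids = dictionary_encode(browsers)
country_index = build_bitmap_index(country_ids)
browser_index = build_bitmap_index(browser_ids)

# Filter: country == "US" AND browser == "chrome".
# Each filter value is looked up once in its dictionary; the row-level work
# is a single bitwise AND on the two bitmaps, with no string comparisons.
us = country_index[country_dict.index("US")]
chrome = browser_index[browser_dict.index("chrome")]
matching = us & chrome

matched_rows = bin(matching).count("1")
total = sum_where(matching, bids)
print(matched_rows, total)
```

In this sketch the only metric values read are those of the matching rows, which mirrors the point about reading a smaller index rather than the full dataset.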
How do you see the market for big and fast analytics platforms evolving in 2023? Do you think we’ll continue to see the introduction of novel database engines?
We see the emergence of a new category of data infrastructure – real-time analytics databases – to address the growing demand for developer-built analytics applications that run on real-time, streaming data. The need for faster query performance at scale isn’t slowing down. It’s become a game-changer as it unlocks new operational workflows for so many Druid users like Confluent, Netflix, and Salesforce. Will there be more database engines emerging over time? For sure. Developers are constantly innovating and driving new workload requirements that need purpose-built databases.
Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
I used to play video games semi-professionally, and am still an avid eSports fan.