Meet Fangjin Yang, a 2023 Datanami Person to Watch
A new category of analytics database has emerged that can handle massive data inflows and deliver subsecond latency on a large number of simultaneous queries. One of those real-time databases is Apache Druid, which was co-developed by former Metamarkets engineer Fangjin Yang, who is one of our People to Watch for 2023.
Datanami recently caught up with Yang, who is also the CEO and co-founder of Druid developer Imply, to discuss real-time analytics databases and the success of Apache Druid.
Datanami: What spurred you to create Apache Druid? Why couldn’t existing databases solve the needs you had at Metamarkets?
Fangjin Yang: Back in 2011, we were trying to quickly aggregate and query real-time data coming from website users across the Internet to analyze digital advertising auctions. This involved large data sets with millions to billions of rows. We weren’t intending to build a new database for this. We first tried building the application with several relational and NoSQL databases, but none were able to support the performance and scale requirements for rapid interactive queries on this high-dimensional, high-cardinality data.
Datanami: What is the key attribute that has made Druid so successful?
Yang: The key to Druid’s performance at scale is “don’t do it.” It means minimizing the work the computer has to do. Druid doesn’t load data from disk to memory, or from memory to CPU, when it isn’t needed for a query. It doesn’t decode data when it can operate directly on encoded data. It doesn’t read the full dataset when it can read a smaller index. It doesn’t send data unnecessarily across process boundaries or from server to server.
With this philosophy of “don’t do it,” you end up having an architecture that’s incredibly efficient at processing queries at scale and under load. And it’s why Druid can be so fast and deliver aggregations on trillions of rows at thousands of queries per second with sub-second latency.
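As an illustration of two of the ideas Yang describes (operating directly on encoded data and reading a small index instead of the full dataset), here is a minimal Python sketch. It is a toy model, not Druid's actual implementation: the `EncodedColumn` class, the `sum_where` helper, and the sample data are all hypothetical names invented for this example.

```python
from collections import defaultdict


class EncodedColumn:
    """Toy dictionary-encoded string column with an inverted index.

    Hypothetical illustration only; this is not Druid's code.
    """

    def __init__(self, values):
        # Distinct values, sorted, so each one maps to a small integer code.
        self.dictionary = sorted(set(values))
        self.code_of = {v: i for i, v in enumerate(self.dictionary)}
        # The column itself stores one integer code per row, not the strings.
        self.codes = [self.code_of[v] for v in values]
        # Inverted index: code -> list of row ids holding that code.
        self.index = defaultdict(list)
        for row, code in enumerate(self.codes):
            self.index[code].append(row)

    def rows_matching(self, value):
        """Return matching row ids by consulting only the dictionary and index."""
        code = self.code_of.get(value)
        if code is None:
            # Value is absent from the dictionary: no rows need to be read.
            return []
        return self.index[code]


def sum_where(filter_col, value, metric):
    """Sum `metric` over rows where `filter_col` equals `value`.

    Only the rows listed in the index are touched; nothing is decoded.
    """
    return sum(metric[row] for row in filter_col.rows_matching(value))


country = EncodedColumn(["US", "DE", "US", "FR", "US", "DE"])
revenue = [10, 20, 30, 40, 50, 60]

print(sum_where(country, "US", revenue))  # 90
print(sum_where(country, "JP", revenue))  # 0
```

In this sketch the filter never compares strings row by row. It looks up one code in the dictionary and reads only the rows the index points to, which is the "don't do it" idea in miniature.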
Datanami: How do you see the market for big and fast analytics platforms evolving in 2023? Do you think we’ll continue to see the introduction of novel database engines?
Yang: We see the emergence of a new category of data infrastructure – real-time analytics databases – to address the growing demand for developer-built analytics applications that run on real-time, streaming data. The need for faster query performance at scale isn’t slowing down. It’s become a game-changer as it unlocks new operational workflows for so many Druid users like Confluent, Netflix, and Salesforce. Will there be more database engines emerging over time? For sure. Developers are constantly innovating and driving new workload requirements that need purpose-built databases.
Datanami: Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
Yang: I used to play video games semi-professionally, and am still an avid eSports fan.
You can read all of the interviews with the 2023 Datanami People to Watch at this link.