Acryl Data, the company driving the open source data catalog called the DataHub Project, got a boost in its quest to unify the fragmented data stack today when it announced the completion of a $21 million Series A round. The firm also unveiled a new observability solution that’s integrated with its data catalog and governance offerings.
Acryl Data was co-founded in 2020 by Silicon Valley software engineers who were frustrated with the fragmentation of the big data stack and wanted to unify the specialized toolchains used to manage data residing across disparate systems, from databases and streaming data frameworks to feature stores and machine learning models.
While the tech giants addressed the fragmentation of the data stack through sheer manpower and engineering expertise, smaller organizations are ill-equipped to deal with it.
Acryl Data CEO and co-founder Swaroop Jagadish helped develop a unified data management system while at Airbnb, where he was the head of data platform and search infrastructure from 2014 to 2020. Similarly, as the overall architect for big data at LinkedIn from 2010 to 2020, Acryl Data CTO and co-founder Shirshanka Das led the development of various data management tools, including the data catalog that would become the DataHub Project. The company’s third co-founder is Product Lead John Joyce, who was a senior software engineer at LinkedIn through 2020.
DataHub’s origins at LinkedIn go back to 2016, when Das and other members of the company’s data team created a metadata-driven data catalog called WhereHows. The product was designed to be a centralized place where data scientists could go to get data. The team built connectors to pull metadata from a variety of data sources, including Apache Hive, Google BigQuery, Apache Kafka, Apache Airflow, MySQL, SQL Server, Postgres, Snowflake, and others.
LinkedIn developed DataHub to be a metadata-driven data catalog
In 2019, the team rearchitected WhereHows with a new push-driven metadata workflow (which, among other benefits, provides more timely updates of changed metadata than pull-based approaches), along with other improvements based on lessons gleaned from other tech giants’ data catalog creations, including Airbnb’s Dataportal, Uber’s Databook, Netflix’s Metacat, Lyft’s Amundsen, and Google Cloud’s Data Catalog.
The following year, the revamped product, dubbed DataHub, was released by LinkedIn under a permissive Apache 2.0 license. Das joined Jagadish and Joyce in co-founding Acryl Data later that year.
Usage of DataHub has taken off over the past three years. According to Acryl Data, which leads development of the open source project, there are more than 7,500 data practitioners using it across more than 1,000 enterprise deployments, including at companies like Stripe, Pinterest, and Optum. The project currently has more than 350 contributors, the company says.
In addition to leading development of DataHub, Acryl Data develops its own cloud-based solution that’s based on DataHub. Acryl Cloud, as the product is known, includes all of the data catalog functions in DataHub, plus some additional features in the areas of data discovery, data governance, and data observability.
The company describes Acryl Cloud as a control plane where users can not only search for data in the catalog, but also view data lineage information and see where changes in the data flow or transformations could impact reliability (functions that are enabled by DataHub). Acryl Cloud also features built-in governance functions to ensure that data follows certain rules.
Today the company announced the data observability module of Acryl Cloud. Acryl Observe, as it’s called, uses anomaly detection to automatically monitor the health of data as it sits in databases or moves through pipelines. The software is currently in beta and is expected to be made generally available later this year.
Data observability should be integrated with data catalog and data governance functions, Das says.
“The explosion of data over the past decade has driven both opportunities and challenges for businesses,” he says in a press release. “Companies embraced various technologies and platforms to store, process, and analyze their data–but now find themselves facing a fragmented data landscape. Siloed datasets, disparate tools, and a lack of unified control has created inefficiencies and hindered collaboration between technical and business teams.
“Further, the industry has addressed fragmentation with more fragmentation, as governance and observability tools have been unnecessarily separated, leading to inefficient and inconsistent solutions,” he continues. “Through the industry’s first open control plane for data, we have designed Acryl Cloud to take these problems head-on.”
Acryl Data co-founders Swaroop Jagadish (CEO, left) and Shirshanka Das (CTO, right)
This vision has been bolstered with $21 million in Series A money. The venture capital round was led by Bhaskar Ghosh, a partner at 8VC, with participation from Ram Shriram at Sherpalo Ventures and Guillermo Rauch, the founder and CEO of Vercel. The company says it will use the money to “deepen investments in customer success, accelerate development of its cloud offering, and extend the vision of its product towards a control plane for data.”
The venture capitalists see real potential in Acryl Data and its quest to unify elements of the data stack.
“8VC and Acryl share a vision for the future, in which enterprises can consolidate control of each of the various pieces of their data stack within a single platform, reducing complexity and accelerating innovation,” Ghosh says in the press release. “Shirshanka, Swaroop, and team are ideally positioned to harness the power of DataHub and solve the hard problems that have historically stood in the way of this vision.”
“The growth of data-driven decision making has led to an increasingly fragmented data ecosystem,” Shriram says in the press release. “We’re excited to join Acryl in its journey to help organizations manage this new decentralized environment, bringing data teams closer to the business and enabling even casual users to interact directly with data insights. Shirshanka and Swaroop have been at the forefront of these efforts from the beginning, and are well-situated to overcome many of the obstacles that exist in the industry.”
“There is a compelling wave of data innovations on the horizon, from data contracts and distributed data mesh architectures to the broad changes that machine learning and AI promise to deliver,” Rauch says. “The question for many organizations is how to wade into these ventures safely, and I believe Acryl can be the control plane to bring these revolutions into reality.”