Data.world Aims to Rethink Data Catalogs
What is a data catalog? If you answered that it’s simply an index that tells you where to find data, then Brett Hurt would like a word with you. As the co-founder and CEO of data.world, Hurt is looking to redefine what a data catalog is. And with a fresh $26 million raised in a round of funding announced today, he’s on his way to doing just that.
“We really want to redefine what a data catalog actually means,” Hurt told Datanami in a Zoom call last week from his Austin, Texas, home. “It’s one thing to just have a library of your data assets and your analytics. It’s a whole other thing to actually use the data.”
Data.world does provide an index to customers’ data, as all data catalogs do. But by building the catalog atop a knowledge graph and extending it with the ability to execute federated queries through hooks with popular BI tools, the data.world offering goes beyond what most people think a data catalog is.
According to Hurt, a prolific tech investor and a co-founder of Coremetrics, which IBM acquired in 2010, these additional capabilities in the cloud-based data catalog allow customers to make greater use of their data.
“It [the data] is your most important asset. It’s your brain,” Hurt says. “We light up that brain by providing all these linkages powered by our knowledge graph…to disparate data silos, whether they’re in Excel spreadsheet or various data warehouses or databases, or whether in SaaS solutions where data has been sporadically spread throughout.”
The COVID-19 pandemic has helped companies realize how siloed their data actually is, Hurt says. But it’s also demonstrated that people are siloed too, and that’s why Hurt is so bullish about bringing these disparate users together under data.world’s user interface, which he proudly announces is “consumer-grade.”
“We’re the only [data catalog vendor] that has a consumer-grade UI,” he says. “That really matters when you’re trying to drive adoption across an entire company and many different personas and really drive a data-driven culture transformation.”
Data.world’s data catalog uses the concept of a “data project” where users can join together different data assets and define the analysis. Users can then share that project with others through a URL, which allows them to see what data assets were used and how the analysis was put together. They also can repeat the analysis on new data sets in the future, Hurt says.
“And it’s automatically connected to the tools that they use, whether that’s Tableau or PowerBI or Google Data Studio. It pulls it all together into one succinct interface,” he says. “The reality is a lot of the usage of data tends to be very stunted in a bunch of disconnected tools and we bring all that together and integrate with all of those and really make it much easier to use.”
We’re well into the era of self-service data analysis tools, and those tools have gotten very good. Hurt is not looking to compete with the likes of Tableau or Qlik or any of the other developers of BI and visualization tools. And while data and business analysts will be users of the tool, data.world is also targeting data scientists, according to Bryon Jacobs, the company’s CTO.
“At the end of the day, you can treat your data.world catalog like a big relational database that actually has all the tables that are in all the databases you’ve cataloged through a common SQL dialect, and you can really just target your queries there and so the SQL drives and powers that,” Jacobs explains.
“Similarly, data scientists generally are not using a SQL-based tool like Tableau. They’re maybe using us from Python in Jupyter notebooks or R in RStudio,” he continues. “So for both those languages, we have an SDK that wraps our API that allows us to treat any data set on data.world as a dataframe in your Python notebook, so you can now use your standard Python tools over that code.”
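To make the “one big relational database” idea concrete, here is a minimal sketch of a single SQL query joining tables that live in two separate databases. It uses Python’s built-in sqlite3 module as a stand-in and does not touch data.world’s SDK or API; the database names (crm, billing), the tables, and the sample rows are all hypothetical.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Return one connection that exposes two separate in-memory databases."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent database reachable under its own
    # schema name. These stand in for two data silos (hypothetical names).
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    conn.commit()
    return conn


def total_by_customer(conn: sqlite3.Connection) -> list:
    """Join across both databases with a single SQL statement."""
    sql = """
        SELECT c.name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for name, total in total_by_customer(build_catalog()):
        print(name, total)
```

The caller writes one query in one dialect and never needs to know which silo holds which table, which is the experience Jacobs describes. A real federated engine would also have to push work down to remote systems, which this sketch does not attempt.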
Customers don’t have to use these advanced features, of course. In fact, most customers don’t even realize that the data in data.world is stored in the RDF format and queryable using SPARQL (a SQL-to-SPARQL transpiler that allows SQL queries to execute on the data.world RDF backend was the first of the company’s 16 patents approved by the US Patent Office, Hurt says). But that level of integration exists in the product, when customers are ready to adopt it, Jacobs says.
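As a rough illustration of what a SQL-to-SPARQL translation involves, the toy function below rewrites the simplest possible SQL statement into an equivalent SPARQL query string. It is not data.world’s patented transpiler, whose design the article does not describe. The namespace http://example.org/ and the mapping chosen here (each table becomes an RDF class, each column becomes a predicate) are assumptions made purely for the example.

```python
import re

# Hypothetical namespace used to mint IRIs for tables and columns.
BASE_IRI = "http://example.org/"

# Accepts only the form: SELECT col1, col2 FROM table
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_sparql(sql: str) -> str:
    """Translate 'SELECT a, b FROM t' into a SPARQL SELECT query string."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError("only 'SELECT col, ... FROM table' is supported")
    table = match.group("table")
    columns = [col.strip() for col in match.group("cols").split(",")]
    if not all(re.fullmatch(r"\w+", col) for col in columns):
        raise ValueError("column names must be plain identifiers")

    # Assumed mapping: every row is a resource typed with the table's IRI,
    # and every column is a predicate under that table's IRI.
    lines = [
        "SELECT " + " ".join(f"?{col}" for col in columns),
        "WHERE {",
        f"  ?row a <{BASE_IRI}{table}> .",
    ]
    for col in columns:
        lines.append(f"  ?row <{BASE_IRI}{table}#{col}> ?{col} .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sql_to_sparql("SELECT name, city FROM customers"))
```

Each selected column turns into one triple pattern that binds a SPARQL variable of the same name. Joins, filters, and aggregates are where a production transpiler would do most of its work, and they are deliberately left out here.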
“Part of the long-term arc for most customers is moving through that realm where first I’m just cataloging my metadata and making sense of it,” the CTO says. “Next, I’m using that as a roadmap to dig into the data itself and make sense of that, to use that to drive BI and other analytical functions. The world where we really want to have is a highly articulated knowledge graph that’s in your particular business domain. We do have a number of customers who are at that peak of enlightenment, where they’re actually realizing that goal.”
According to Hurt, the company has been exceeding expectations in the customer acquisition department, which he partly attributes to the acceleration in data modernization (and migration to the cloud) as a result of the COVID-19 pandemic. Hurt isn’t disclosing the company’s customer count, but he assures us that he has many Fortune 500 companies working with the product.
The company has also been quite active on the community front. In fact, two-thirds of the Fortune 500 are participating in data.world’s community data efforts, which include a knowledge graph dedicated to sharing COVID-19 data. The company’s software was also used by the Associated Press data team that won a 2019 AP Chairman’s Prize for its use of data in reporting on world events.
Hurt says he plans to expand the company’s marketing efforts with the fourth round of funding, which netted $26 million and was led by Tech Pioneers Fund. The company also announced that Scott Booth, chairman of Tech Pioneers Fund, and Sally Jenkins, the current CMO of Elastic and former CMO of Informatica, have joined data.world’s board of directors. The company has raised a total of $71.3 million to date.