Automated Data Labeler Snorkel AI Lands $85M in VC Funding
For many enterprises using AI, one of the biggest bottlenecks continues to be the difficulty in getting all their critical data into clearly sorted and labeled groups so it can be used to drive business value.
Much of that data must still be sorted into groups by hand before it can be used, which consumes huge amounts of time and resources and causes projects to take much longer to complete.
Snorkel AI wants to change that using its data-centric AI platform, Snorkel Flow, which helps data scientists and non-technical experts greatly reduce the time spent on AI modeling by automating data labeling and groupings using the power of AI.
To keep this work moving forward, Snorkel AI announced on Monday, Aug. 9, that it has received a new $85 million Series C funding round, which will be used to further grow its engineering and sales teams and bring more improvements to its platform.
Alex Ratner, the co-founder and CEO of Snorkel AI, told EnterpriseAI that the idea for the company’s technology began in 2015 in a research center at the Stanford AI Lab, from which the Snorkel AI team was later spun out. Every day, he said, individual scientists, doctors and others came into their offices and lamented a shared problem – that it was just too hard and took too much time to get all their data labeled before they could even use it for their work. The tasks were taking weeks or months, rather than the hours or days they needed to push their projects forward.
“They were all getting stuck in the same thing,” said Ratner. “They were having an easier time than ever on the models and the infrastructure … becoming more commoditized, but they were all getting stuck in the data.”
The problems were legion, such as how to more quickly label the 100,000 medical images needed to get started with AI, he said. “I think we saw this problem rearing its head a bit earlier than other folks. We had a couple years to work on how to do data-centric AI, where this data is the first and foremost focus and [make it] so it is actually practical.”
Using Snorkel Flow, instead of it taking months to label the 100,000 medical images, a user can sit down with the platform for a few hours and iteratively show some of the desired patterns to the model, which then learns to extrapolate what it needs from the data, said Ratner. “The challenge of unstructured data is how variegated it can be, but the key is that you can show it some examples in Snorkel Flow and train a model to generalize it.”
Using Snorkel’s technology, this can be done over several sessions with dozens of pieces of data rather than thousands, he added.
“Snorkel Flow offers a more automated, programmatic way of actually doing that,” he said. “The users put in rough heuristics or rules or other signals to actually drive this process rather than having to individually label every single contract for weeks or months to be able to teach the AI. Humans are still in the loop in our conception of our practical AI development workflow. The humans are just giving things like rules or patterns, rather than individual examples.”
For example, many enterprises need to classify complex documents or extract information from them. Using Snorkel Flow, a legal analyst at a bank can look for a key phrase in titles or a phrase before a clause that they are trying to extract, rather than conduct such searches manually over weeks or months, said Ratner. Inquiries can be run against any data type, from photos and text to PDFs, HTML and more.
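To make the idea of labeling with rules rather than individual examples concrete, here is a minimal Python sketch. It is not Snorkel Flow's API or the company's method: the contract classes, the rule functions and the `weak_label` helper are all hypothetical names made up for illustration, and a plain majority vote stands in for whatever aggregation the platform actually performs.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label values for a two-class contract task (assumed, not from Snorkel).
ABSTAIN = None
LOAN = "loan_agreement"
NDA = "nda"

# Each rule looks at one document and returns a label, or abstains.
LabelingRule = Callable[[dict], Optional[str]]


def rule_title_mentions_loan(doc: dict) -> Optional[str]:
    # Key phrase in the title, as a legal analyst might specify.
    return LOAN if "loan" in doc["title"].lower() else ABSTAIN


def rule_title_mentions_nda(doc: dict) -> Optional[str]:
    title = doc["title"].lower()
    return NDA if "non-disclosure" in title or "nda" in title.split() else ABSTAIN


def rule_body_has_interest_clause(doc: dict) -> Optional[str]:
    # Phrase that typically precedes a clause of interest.
    return LOAN if "interest rate" in doc["body"].lower() else ABSTAIN


def rule_body_has_confidentiality_clause(doc: dict) -> Optional[str]:
    return NDA if "confidential information" in doc["body"].lower() else ABSTAIN


RULES: List[LabelingRule] = [
    rule_title_mentions_loan,
    rule_title_mentions_nda,
    rule_body_has_interest_clause,
    rule_body_has_confidentiality_clause,
]


def weak_label(doc: dict, rules: List[LabelingRule] = RULES) -> Optional[str]:
    """Combine rule votes by simple majority; abstain on no votes or a tie."""
    votes = [v for v in (rule(doc) for rule in rules) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]


if __name__ == "__main__":
    docs = [
        {"title": "Term Loan Agreement", "body": "The interest rate shall be fixed."},
        {"title": "Mutual NDA", "body": "Confidential Information means any data."},
        {"title": "Master Services Agreement", "body": "Payment is due in 30 days."},
    ]
    for doc in docs:
        print(doc["title"], "->", weak_label(doc))
```

A handful of rules like these can label an entire document collection in one pass. Documents on which the rules abstain or conflict are the natural candidates for a human to review, or for writing the next rule.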
The platform shortens the development cycle and improves application quality, while also helping to manage AI data bias.
One customer, a top-three U.S. bank, said it used Snorkel Flow to develop a contract processing application in less than 24 hours that produced training accuracy of more than 99 percent, according to Snorkel AI. In another case study, a large biotech firm said it saved about $10 million on unstructured data extraction, achieving 99.1 percent accuracy with Snorkel Flow.
Snorkel AI’s latest funding round was co-led by a new investor, Addition, and by funds and accounts managed by BlackRock, a previous investor, the company said. Also participating were other previous investors, including Greylock, GV, Lightspeed Venture Partners, Nepenthe Capital and Walden. The company, which was founded in 2019, has now received a total of $135 million in funding.
Sumit Agarwal, an analyst with Gartner, told EnterpriseAI that data labeling is foundational to the development of AI workflows, which are dependent on large volumes of labeled data. “Labeling is often a manual, monotonous effort involving crowdsourcing or outsourcing,” said Agarwal. “Snorkel AI’s solution makes this task much more manageable within an organization. The ability to start with a small volume of manually-labeled data using AI to label larger datasets is very powerful.”
Soyeb Barot, another Gartner analyst who covers data analytics and AI, said that Snorkel AI’s method combines the two [most common] approaches found in data labeling – human-in-the-loop annotation and algorithms that do a lot of the auto-labeling and learn from the human annotation over time.
“Snorkel brings these two services together and, most importantly as a SaaS solution that can be implemented in-house,” said Barot. “This is critical since many organizations are not comfortable moving data to public cloud service providers and or leveraging human-workforce outside of their organization for privacy.”
For customers, their data is the critical element in making it all work well, said Barot. “Coming to the secret sauce, it is usually not the algorithm – Netflix openly shares algorithms it uses for personalization – it is the data that helps build good models for auto-labeling. Domain-specific, curated data play a major role in how good a model/algorithm you can build.”
Another analyst, Kevin Petrie of Eckerson Group, said Snorkel AI “targets a key point of pain for enterprise AI teams that need to train supervised machine learning models. They need to label historical outcomes within large datasets, which requires the domain expertise of business owners. By programming this labeling process, data scientists can label more data but take less of business owners’ time. This helps create accurate ML models more efficiently.”
The company competes with several other vendors in the machine learning lifecycle space, said Petrie. “However, Snorkel differentiates itself with the programmatic approach to labeling,” he added.