October 16, 2020

Dataloop Drives Labeling Into the DataOps Pipeline

Data is the fuel for machine learning, but the data needs to be accurately labeled for the machines to learn. To that end, data training startup Dataloop yesterday announced that it has received $11 million in Series A funding to build SaaS data pipelines that combine human supervision of the data annotation process with data management capabilities.

Today’s computer vision models are extremely powerful, and the ones based on deep learning approaches can exceed human capabilities on some tasks. From self-driving cars navigating the world to programs that can accurately diagnose diseases in MRI images, the potential uses for AIs built upon convolutional neural networks are astonishingly wide.

However, there’s a catch (there always is). Deep learning models work best when presented with lots of labeled data. But because of the amount of data that deep learning uses, spending human cycles to curate all that data is extremely expensive, and is in fact one of the biggest bottlenecks preventing more widespread adoption of AI.

For example, a 2019 study by Dimensional Research concluded that “96% of companies surveyed stated they have run into training-related problems with data quality, labeling required to train the AI, and building model confidence.” That’s why 70% of the companies it surveyed relied on external firms to supply the data collection, labeling, and development services.

Dataloop provides SaaS tools for data labeling

That’s essentially the market need that Dataloop is hoping to fill. Dataloop is an Israeli company that was founded in 2017 with a focus on automating the data annotation process, primarily for computer vision projects but also for ones involving audio files.

Dataloop has developed a SaaS application that helps companies automate this data labeling process, and functions as a hub for uniting data scientists, data engineers, and the data labelers themselves.

Humans are not required for all labeling activities. Other machine learning algorithms can sometimes provide the necessary level of accuracy in data labeling. Dataloop has what it dubs “AI-assisted auto-annotation capabilities” that are built into the offering for supplying the data to train downstream vision models.

However, AI cannot be fully trusted to accurately label the images, and that’s why Dataloop keeps humans in the loop: to oversee the AI processes and step in when needed. “We strongly believe that with humans in the loop, algorithms can make more accurate and reliable predictions, which ultimately leads to more accurate machine learning capabilities,” the company tells Datanami.
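As a rough illustration of the human-in-the-loop idea, one common pattern is to accept machine-generated labels only when the model is confident and to queue everything else for a human labeler. The Python sketch below shows that pattern under stated assumptions: the `Prediction` class, the `route_predictions` function, and the 0.9 threshold are all hypothetical and are not taken from Dataloop’s product or SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """One machine-generated label (hypothetical structure, for illustration only)."""
    item_id: str
    label: str
    confidence: float  # model score between 0 and 1


def route_predictions(
    predictions: List[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted labels and ones needing human review."""
    auto_accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            auto_accepted.append(prediction)
        else:
            # Low-confidence labels go to a human labeler instead of the training set.
            needs_review.append(prediction)
    return auto_accepted, needs_review


# Made-up example data: three images with model-proposed labels.
predictions = [
    Prediction("img_001", "pedestrian", 0.97),
    Prediction("img_002", "cyclist", 0.62),
    Prediction("img_003", "car", 0.90),
]
accepted, review = route_predictions(predictions, threshold=0.9)
```

The threshold is the main design choice in a setup like this: raising it sends more items to human reviewers and costs more, while lowering it lets more unverified labels into the training data.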

Dataloop recently came out of stealth, and this week announced an $11 million Series A round led by Amiti Ventures with participation from F2 Venture Capital, OurCrowd, NextLeap Ventures, and SeedIL Ventures. The company now has $16 million in total funding and is targeting AI development teams in the US and Europe.

Dataloop helps labelers identify specific points in pictures

The firm finds itself competing with larger and more established firms in this space, including Appen, which acquired Figure Eight last year for $175 million, as well as Alegion, which commissioned the 2019 Dimensional Research report that Dataloop cited. The cloud giants are also getting into this game, as AWS is doing with SageMaker.

Dataloop says it will differentiate itself in the market by pairing its software that helps automate the annotation step with data management capabilities, such as moving data, versioning data, and managing storage. The company boasts a Python SDK that enables the data annotation capabilities to be woven into larger data pipelines and DevOps workflows. This gives customers a greater ability to customize how the data annotation functions fit into the overall picture, the company says.

The company is fairly new, but it’s already landed customers in several industries, including Standard’s check-out service; Foresight Automotive’s object-detection system for self-driving cars; Descartes Labs’ geospatial analytics; and TransEnterix’s surgical robotics system.

“Many organizations continue to struggle with moving their AI and ML projects into production as a result of data labeling limitations and a lack of real time validation that can only be achieved with human input into the system,” said Eran Shlomo, CEO of Dataloop. “With this investment we are committed, along with our partners, to overcoming these roadblocks and providing next generation data management tools that will transform the AI industry and meet the rising demand for innovation in global markets.”
