April 27, 2021

Toloka Expands Data Labeling Service

Data labeler Toloka is heading to India to tap into the subcontinent’s vast pool of English-speaking workers. The firm, which was founded in Russia and is incorporated in Switzerland, says the expansion will grow its network of crowdsourced human labelers, currently at 9 million people.

Toloka helps organizations develop their AI offerings by providing them with labeled data on which to train machine learning algorithms. The company pays its “Tolokers” a fee based on the volume of images, videos, and unstructured text for which they provide labels and annotation.

“We’re excited to continue our push into India, as well as into the surrounding areas, including Pakistan, Myanmar, Bangladesh, and Indonesia,” Toloka CEO Olga Megorskaya said in a press release. “Our 300,000 Tolokers from this region have already proven extremely valuable to our customers, but a great deal of untapped potential remains in this market. We hope to see new talents from India and beyond on our platform soon, and look forward to setting new industry standards together.”

Data-labeling crowdsourcing platforms have become popular in recent years as organizations scramble to provide large amounts of labeled data for large neural networks. Unlike traditional machine learning algorithms, deep learning systems, such as those used for computer vision and textual processing workloads, require huge volumes of data. Some organizations have resorted to using synthetic data when real-world data is lacking, but this doesn’t work in every situation.

In the computer vision department, Toloka provides image classification, side-by-side comparison, bounding box, polygon, keypoint, and image transcription labeling services. It charges customers $15 for 1,000 labeling tasks, with a turnaround time of three hours.

In the natural language processing (NLP) field, the company provides text recognition and classification, sentiment analysis, named-entity recognition, and search relevance. It also collects audio and provides transcription and classification of audio data. It charges customers $18 for 1,000 tasks, with a turnaround time of four hours.

Toloka was founded in 2014 by Yandex, which is one of the largest technology firms in Russia, with $3 billion in revenue last year and more than 70 Internet offerings. Toloka says it has about 2,400 customers around the world. Its workers are fluent in 40 languages and are spread across 100 countries.

“Toloka” is a Russian word that refers to a form of mutual assistance among villagers of Russia, Ukraine, Belarus, and other Eastern European countries. According to the company, tolokas “were organized in villages to perform urgent work requiring large numbers of workers, such as harvesting, logging, and building houses.”
