April 23, 2024

The Top Five Data Labeling Firms According to Everest Group

(Image courtesy Centific)

The process of annotating and labeling data is critical for supervised learning tasks, such as training a large language model (LLM) and other types of machine learning models. However, the need for human cognition and input is a limiting factor on the amount of data that can be prepared. As a result, there is considerable demand for software that can help streamline the data labeling and annotation workflow, as well as for third parties that can do the labeling work on an outsourcing contract. Everest Group recently ranked the top providers in these booming spaces.

Everest Group, a Dallas, Texas-based IT analyst firm, analyzed 19 software vendors and outsourcing providers in its Data Annotation and Labeling (DAL) Solutions for AI/ML PEAK Matrix Assessment 2024 report. According to Everest analysts, enterprises primarily value the speed at which DAL providers can deliver the goods as well as the resulting quality of the labeled or annotated data.

“They prioritize providers that emphasize relationship-building, cost-effectiveness, agility, and a steadfast commitment to deliver tangible business impact and RoI throughout their transformation journey,” the Everest analysts write in the report. “Equipped with trained workers and robust annotation platforms, these providers efficiently guide enterprises through the DAL landscape.”

(Image courtesy Everest Group)

The Matrix plots each provider’s market impact against its vision and capability. Five providers made it to the peak and are considered leaders in the space. Here are the top five, in ranked order.

1. Appen

Appen is the king of the mountain in the DAL space, according to Everest’s report, with superior ratings on both Matrix axes. That’s not surprising, as the Sydney, Australia-based company has been at this game for nearly 30 years.

The company, which is publicly traded and reported $273 million in revenue last year, has developed a well-regarded DAL platform and has also established DAL outsourcing services with operations in the US, China, and the Philippines.

According to Appen, more than 50 million person-hours have been spent on its DAL platform, and it has been used in more than 20,000 projects, encompassing 10 billion units of data. More than 80% of the leading LLM builders are Appen users, the company claims, and it has completed more than 100 million LLM data elements.

“Appen is dedicated to providing customers with high-quality, trustworthy data that power the world’s leading AI models at scale,” Appen CEO Ryan Kolln said in a press release. “With this new accolade, Appen is recognized as a cutting-edge market leader in the AI data space.”

2. TELUS International

TELUS offers a DAL platform (image courtesy TELUS International)

The second-ranked DAL provider in Everest’s report is TELUS International, the Vancouver, Canada-based IT services giant. In addition to digital transformation and IT lifecycle services, TELUS also provides data annotation to companies around the world.

TELUS bolstered its data annotation business in 2021 with the acquisition of the AI division of Lionbridge. Today, TELUS offers a DAL platform that supports a broad range of data, including video, still images, text, sensor, audio, and geo.

In addition to software, TELUS offers an AI Community that is composed of more than 1 million annotators and labelers around the world. Its data services run the gamut from data collection and creation to annotation and validation.

3. Centific

Third place in Everest’s Matrix goes to Centific, a Redmond, Washington-based company that provides a range of services to facilitate AI, including data annotation and labeling.

Centific offers the services of its “domain-segmented annotation teams” that work with the company’s custom annotation platform. The company, which has operations in India and China, specializes in helping customers prepare data for LLMs, computer vision, speech, search relevance, maps, augmented driving, and augmented reality/virtual reality (AR/VR).

In addition to data annotation, Centific has “decades of experience” working in data collection in the LLM/NLP, computer vision, speech, and AR/VR spaces. It also offers professional expertise in reinforcement learning from human feedback (RLHF), as well as AI red teaming to help tamp down LLM hallucinations.

Finally, Centific is also a data vendor. The company says it has billions of off-the-shelf datasets available, ranging from call center audio and live meeting videos to optical character recognition (OCR) images and Korean phone calls.

4. (Tied) TaskUs

Tied for fourth place on the Everest DAL Solutions leaderboard is TaskUs, a business process outsourcing (BPO) and digital solutions provider based in New Braunfels, Texas.

Founded in 2008, TaskUs provides a range of BPO services, including call center operations and content moderation, through its global workforce of 47,000 employees and gig contractors, many of whom are based in the Philippines. The company went public in 2021 and reported $924 million in revenue last year.

TaskUs also provides data labeling services for LLMs, computer vision, video, and audio. The company claims to have more than 15 years of experience with data labeling via a workforce that has touched 100,000 domain experts in 30 languages.

The company touts a human-in-the-loop approach to developing AI models, including reinforcement learning from human feedback (RLHF). In addition to collecting and labeling data, TaskUs can provide data science expertise, everything “from initial model training to continuous maintenance and optimization,” the company says.

4. (Tied) Akkodis

Also tied for fourth is Akkodis, a diverse engineering company based in Switzerland that provides a wide range of digital services to clients in automotive, aerospace, energy, banking, manufacturing, life sciences, healthcare, and IT.

Akkodis, which has €4 billion in annual revenue and employs more than 50,000 workers, touts solutions in big data, analytics, AI and ML, and robotic process automation. The company is also moving into generative AI and copilots.

While copilots and GenAI offer tremendous opportunity, the company says that “there is a lot more goes on beneath the surface, and realizing when it comes to AI, good data is 80% of the work.”

Akkodis ranked higher than TaskUs on the vision and capability axis, while TaskUs ranked higher than Akkodis by about the same margin on the market impact axis, which makes them essentially tied.

Rest of the Field

Everest broke the rest of the field into two groups: “major contenders” and “aspirants.”

The major contenders include iMerit of Kolkata, India; CloudFactory of Durham, North Carolina; NextWealth of London; Innodata of Hackensack, New Jersey; FiveS Digital of Udaipur, India; Sama of San Francisco, California; LXT.AI of Mississauga, Canada; Cogito Tech of Levittown, New York; and Clickworker of Essen, Germany.

In the aspirants division, Everest lists Digital Divide Data of New York City; Innominds of San Jose, California; Impact Enterprises of Houston, Texas; and DesiCrew of Chennai, India.
