February 1, 2024

Voltron Aims to Unblock AI with GPU-Accelerated Data Processing


Data is at the heart of artificial intelligence, but it’s also emerging as one of its biggest bottlenecks. Without sufficient quantities of good, clean data to feed into models, companies simply can’t reap the rewards of AI. This situation has been recognized by the folks at Voltron Data, which recently launched a new distributed query engine designed to use GPUs to crank up the data processing volumes to feed AI demand. Voltron also acquired an AI company last week, furthering its AI aims.

“Companies at the forefront of AI are constrained by data processing,” Voltron Data said in its December 1 press release announcing Theseus, its new distributed processing engine. “ETL, feature engineering, and transformation are key parts of AI/ML. They cannot ramp up AI capabilities efficiently because they cannot afford to build out big data CPU clusters fast enough. The performance divergence between GPUs and CPUs is only growing; this problem is getting exponentially worse.”

This led the Mountain View, California company to develop Theseus, which it claims is the first distributed data engine designed to run on accelerated hardware, including GPUs, as well as high bandwidth memory and accelerated networking and storage. Voltron Data was founded in late 2021 by Wes McKinney, the creator of pandas and co-creator of Apache Arrow, and Josh Patterson, the former senior director of RAPIDS at Nvidia.

Theseus is an “embeddable engine” that runs on distributed systems equipped with standard CPUs, such as x86 and ARM types, as well as accelerated hardware like Nvidia GPUs. Customers can plug it into their existing data platforms via existing standards, such as Arrow, RAPIDS, Ibis, Substrait, and Velox, and develop apps for Theseus using Python, R, Java, Rust, or C++.

Theseus can process data alongside other open source query engines that customers might be using, such as Apache Spark or Presto. However, thanks to its native support for GPUs, Theseus runs 45x faster than Spark, and costs 20x less, the company claims.

The goal is to leverage accelerated compute to crank through as much data as quickly as possible, without requiring expensive custom hardware or specialized setups. It’s about getting beyond “The Wall,” Voltron Data co-founder Josh Patterson said.

“AI systems are headed straight for The Wall–an inflection point where CPU-based big data systems reach peak performance and can no longer keep up with GPU-powered AI platforms,” Patterson said in a press release. “We won’t be able to keep up with AI demand at scale until data processing fundamentally changes. Data processing engines must leverage accelerated compute, memory, networking and storage. We are thrilled to introduce Theseus to the world – an engine that is built to leverage the latest hardware innovations and helps companies get over The Wall.”

This approach has its benefits, notes Hyoun Park, chief analyst of Amalgam Insights.

“In the Era of AI, enterprises face a proliferation of data sources, abstraction of coding languages and strategic needs for every employee to be more data-driven. At the same time, Spark has reached its limits as an analytic processing system for the generation of Big Data,” Park says in Voltron’s press release. “As the average enterprise now accesses over a thousand data sources, businesses must invest their data processing capabilities to support the next order of magnitude for analytics and AI demands. Voltron Data has taken an important step forward with this maiden voyage of Theseus to solve all of these data issues for the Era of AI.”

Josh Patterson, co-founder and CEO of Voltron Data (left) talks with Mohan Rajagopalan, VP & GM, HPE Ezmeral Software (Image courtesy Voltron Data)

The company is selling access to Theseus via a non-traditional “revenue share” model, whereby customers or partners embed the engine into their own systems. One of the first companies to take Voltron up on the offer is HPE, which is including Theseus as part of its Ezmeral Unified Analytics Software.

Mohan Rajagopalan, the vice president and general manager of HPE Ezmeral Software, says Theseus will improve the flow of data for AI, ML, and analytics workloads.

“With Theseus, Voltron Data’s composable query engine, enterprises can take full advantage of HPE Ezmeral Unified Analytics Software’s GPU-and-CPU optimized data lakehouse to turbo-charge data preparation, data processing and other traditionally CPU-based workloads,” Rajagopalan says in a press release.

Voltron made its own move into AI last week with the acquisition of Claypot, an AI startup developing software to deliver feature engineering and MLOps capabilities. The company was founded in 2022 by Chip Huyen, the author of the book “Designing Machine Learning Systems,” and Zhenzhong Xu, who led the streaming data platform team that serves more than 2,000 data use cases at Netflix.

“I couldn’t be more excited to bring on Chip Huyen, Zhenzhong Xu and the entire Claypot AI team,” Patterson says in a press release. “Together we’re going to be able to accelerate our real-time and MLOps product roadmap with state-of-the-art features for our customers.”

This was Voltron Data’s first acquisition. In February 2022, Voltron received $22 million in a seed round from BlackRock and Walden Catalyst, followed by an $88 million Series A round with Walden Catalyst the same month.
