February 1, 2024

Voltron Aims to Unblock AI with GPU-Accelerated Data Processing

Alex Woodie

(Gorodenkoff/Shutterstock)

Data is at the heart of artificial intelligence, but it’s also emerging as one of its biggest bottlenecks. Without sufficient quantities of good, clean data to feed into models, companies simply can’t reap the rewards of AI. The folks at Voltron Data have recognized this problem, and the company recently launched a new distributed query engine that uses GPUs to boost data processing throughput and keep pace with AI demand. Voltron also acquired an AI company last week, furthering its AI aims.

“Companies at the forefront of AI are constrained by data processing,” Voltron Data said in its December 1 press release announcing Theseus, its new distributed processing engine. “ETL, feature engineering, and transformation are key parts of AI/ML. They cannot ramp up AI capabilities efficiently because they cannot afford to build out big data CPU clusters fast enough. The performance divergence between GPUs and CPUs is only growing; this problem is getting exponentially worse.”

This led the Mountain View, California company–which was founded in late 2021 by Wes McKinney, the creator of pandas and co-creator of Apache Arrow, and Josh Patterson, the former senior director of RAPIDS at Nvidia–to develop Theseus, which it claims is the first distributed data engine designed to run on accelerated hardware, including GPUs, as well as high bandwidth memory and accelerated networking and storage.

Theseus is an “embeddable engine” that runs on distributed systems equipped with standard CPUs, such as x86 and ARM types, as well as accelerated hardware like Nvidia GPUs. Customers can plug it into their existing data platforms via existing standards, such as Arrow, RAPIDS, Ibis, Substrait, and Velox, and develop apps for Theseus using Python, R, Java, Rust, or C++.

Theseus can process data alongside other open source query engines that customers might be using, such as Apache Spark or Presto. However, thanks to its native support for GPUs, Theseus runs 45x faster than Spark, and costs 20x less, the company claims.
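To put those ratios in concrete terms, here is a minimal back-of-the-envelope sketch. Only the 45x and 20x figures come from Voltron Data's claims; the baseline Spark job (90 minutes, $100) and the `project` helper are hypothetical, chosen purely for illustration, and the result is not a benchmark.

```python
# Ratios claimed by Voltron Data for Theseus versus Spark (not independently verified).
CLAIMED_SPEEDUP = 45.0
CLAIMED_COST_REDUCTION = 20.0


def project(baseline_minutes: float, baseline_cost: float) -> tuple[float, float]:
    """Return the (minutes, cost) implied by the claimed ratios for a given baseline job."""
    return (
        baseline_minutes / CLAIMED_SPEEDUP,
        baseline_cost / CLAIMED_COST_REDUCTION,
    )


if __name__ == "__main__":
    # Hypothetical baseline: a Spark job that runs 90 minutes and costs $100.
    minutes, cost = project(90.0, 100.0)
    print(f"{minutes:.1f} minutes, ${cost:.2f}")  # 2.0 minutes, $5.00
```

Under those assumed numbers, the claimed ratios would turn a 90-minute, $100 job into one that finishes in about two minutes for about $5. Actual results would depend on the workload, the hardware, and how the comparison is measured.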

The goal is to leverage accelerated compute to crank through as much data as quickly as possible, without requiring expensive custom hardware or specialized setups. It’s about getting beyond “The Wall,” Voltron Data co-founder Josh Patterson said.

“AI systems are headed straight for The Wall–an inflection point where CPU-based big data systems reach peak performance and can no longer keep up with GPU-powered AI platforms,” Patterson said in a press release. “We won’t be able to keep up with AI demand at scale until data processing fundamentally changes. Data processing engines must leverage accelerated compute, memory, networking and storage. We are thrilled to introduce Theseus to the world – an engine that is built to leverage the latest hardware innovations and helps companies get over The Wall.”

This approach has its benefits, notes Hyoun Park, chief analyst of Amalgam Insights.

“In the Era of AI, enterprises face a proliferation of data sources, abstraction of coding languages and strategic needs for every employee to be more data-driven. At the same time, Spark has reached its limits as an analytic processing system for the generation of Big Data,” Park says in Voltron’s press release. “As the average enterprise now accesses over a thousand data sources, businesses must invest their data processing capabilities to support the next order of magnitude for analytics and AI demands. Voltron Data has taken an important step forward with this maiden voyage of Theseus to solve all of these data issues for the Era of AI.”

Josh Patterson, co-founder and CEO of Voltron Data (left), talks with Mohan Rajagopalan, VP & GM, HPE Ezmeral Software (Image courtesy Voltron Data)

The company is selling access to Theseus via a non-traditional “revenue share” model, whereby customers or partners embed the engine into their own systems. One of the first companies to take Voltron up on the offer is HPE, which is including Theseus as part of its Ezmeral Unified Analytics Software.

Mohan Rajagopalan, the vice president and general manager of HPE Ezmeral Software, says Theseus will improve the flow of data for AI, ML, and analytics workloads.

“With Theseus, Voltron Data’s composable query engine, enterprises can take full advantage of HPE Ezmeral Unified Analytics Software’s GPU-and-CPU optimized data lakehouse to turbo-charge data preparation, data processing and other traditionally CPU-based workloads,” Rajagopalan says in a press release.

Voltron made its own move into AI last week with the acquisition of Claypot, an AI startup developing software to deliver feature engineering and MLOps capabilities. The company was founded in 2022 by Chip Huyen, the author of the book “Designing Machine Learning Systems,” and Zhenzhong Xu, who led the streaming data platform team that serves more than 2,000 data use cases at Netflix.

“I couldn’t be more excited to bring on Chip Huyen, Zhenzhong Xu and the entire Claypot AI team,” Patterson says in a press release. “Together we’re going to be able to accelerate our real-time and MLOps product roadmap with state-of-the-art features for our customers.”

This was Voltron Data’s first acquisition. In February 2022, Voltron received $22 million in a seed round from BlackRock and Walden Catalyst, followed by an $88 million Series A round with Walden Catalyst the same month.

