New C++ Acceleration Library Velox Juices Code Execution Up To 8x
Momentum is building around Velox, a new C++ acceleration library that can deliver a 2x to 8x speedup for computational engines like Presto, Spark, and PyTorch, and likely others in the future. The open source technology was originally developed by Meta, which today submitted a paper on Velox to the International Conference on Very Large Data Bases (VLDB) taking place in Australia.
Meta developed Velox to standardize the computational engines that underlie some of its data management systems. Developing a new engine for each new transaction processing, OLAP, stream processing, or machine learning endeavor requires extensive resources to maintain, evolve, and optimize. Velox cuts through that complexity by providing a single system, which simplifies maintenance and provides a more consistent experience to data users, Meta says.
“Velox provides reusable, extensible, high-performance, and dialect-agnostic data processing components for building execution engines, and enhancing data management systems,” Facebook engineer Pedro Pedreira, the principal behind Velox, wrote in the introduction for the Velox paper submitted today at the VLDB conference. “The library heavily relies on vectorization and adaptivity, and is designed from the ground up to support efficient computation over complex data types due to their ubiquity in modern workloads.”
Based on its own success with Velox, Meta brought in other companies, including Ahana, Voltron Data, and ByteDance, to assist with the software’s development. Intel is also involved, as Velox is designed to run on x86 systems.
The hope is that, as more data companies and professionals learn about Velox and join the community, Velox will eventually become a regular component in the big data stack, says Ahana CEO Stephen Mih.
“Velox is a major way to improve your efficiency and your performance,” Mih says. “There will be more compute engines that start using it….We’re looking to draw more database developers to this product. The more we can improve this, the more it lifts the whole industry.”
Mih shared some TPC-H benchmark figures that show the type of performance boost users can expect from Velox. When Velox replaced a Java library for specific queries, wall clock time fell by a factor of 2 to 8, while CPU time fell by a factor of 2 to 6.
The key advantage that Velox brings is vectorized code execution, which is the ability to process multiple pieces of data in parallel with a single instruction. Java does not support vectorization, whereas C++ does, which makes many Java-based products potential candidates for Velox.
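To illustrate what vectorized execution means in practice, here is a minimal C++ sketch contrasting a row-at-a-time loop with a column-at-a-time loop. It does not use the Velox API; the Row struct and the function names totalRowAtATime, multiplyColumns, and sumColumn are hypothetical names chosen for this example. The column version runs over contiguous arrays, a loop shape that optimizing compilers can typically map onto SIMD instructions, though whether they do depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row layout: one struct per record (illustration only).
struct Row {
    double price;
    double quantity;
};

// Row-at-a-time: each iteration reads the fields of a single record.
double totalRowAtATime(const std::vector<Row>& rows) {
    double total = 0.0;
    for (const Row& r : rows) {
        total += r.price * r.quantity;
    }
    return total;
}

// Column-at-a-time: the multiply runs over two contiguous arrays.
// This simple, branch-free loop is a candidate for compiler auto-vectorization.
std::vector<double> multiplyColumns(const std::vector<double>& price,
                                    const std::vector<double>& quantity) {
    assert(price.size() == quantity.size());
    std::vector<double> out(price.size());
    for (std::size_t i = 0; i < price.size(); ++i) {
        out[i] = price[i] * quantity[i];
    }
    return out;
}

// Sums one column of values.
double sumColumn(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;
    }
    return total;
}
```

Calling sumColumn(multiplyColumns(price, quantity)) yields the same total as totalRowAtATime on equivalent data; only the memory layout and loop structure differ.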
Mih compared Velox to what Databricks has done with Photon, a C++ optimization layer developed to speed up Spark SQL processing. However, unlike Photon, Velox is open source, which he says will boost adoption.
“Usually, you don’t get this type of technology in open source, and it’s never been reusable,” Mih tells Datanami. “So this can be composed behind database management systems that have to rebuild this all the time.”
Over time, Velox could be adapted to run with more data computation engines, which will not only improve performance and usability, but also lower maintenance costs, write Pedreira and two other Facebook engineers, Masha Basmanova and Orri Erling, in a blog post today.
“Velox unifies the common data-intensive components of data computation engines while still being extensible and adaptable to different computation engines,” the authors write. “It democratizes optimizations that were previously implemented only in individual engines, providing a framework in which consistent semantics can be implemented. This reduces work duplication, promotes reusability, and improves overall efficiency and consistency.”
Velox uses Apache Arrow, the in-memory columnar data format designed to enhance and speed up the sharing of data among different execution engines. Wes McKinney, the CTO and co-founder of Voltron Data and the creator of Apache Arrow, is also committed to working with Meta and the Velox and Arrow communities.
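As a rough picture of what an in-memory columnar layout looks like, the C++ sketch below stores one integer column as a contiguous value buffer plus a validity bitmap that marks nulls, one bit per value with the least-significant bit first. This mirrors the general shape of Arrow's layout for primitive columns, but it is a simplified illustration: the Int64Column type and sumValid function are hypothetical and are not part of the Arrow or Velox APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column type for illustration; not Arrow's or Velox's API.
struct Int64Column {
    std::vector<std::int64_t> values;    // contiguous value buffer
    std::vector<std::uint8_t> validity;  // 1 bit per value: 1 = present, 0 = null
    std::size_t length = 0;

    void append(std::int64_t v) { push(v, true); }
    void appendNull() { push(0, false); }

    // Reads bit i of the bitmap (least-significant bit first within each byte).
    bool isValid(std::size_t i) const {
        return ((validity[i / 8] >> (i % 8)) & 1u) != 0;
    }

private:
    void push(std::int64_t v, bool valid) {
        if (length % 8 == 0) {
            validity.push_back(0);  // start a new bitmap byte every 8 values
        }
        if (valid) {
            validity[length / 8] |= static_cast<std::uint8_t>(1u << (length % 8));
        }
        values.push_back(v);  // null slots still occupy a value position
        ++length;
    }
};

// Sums only the non-null entries of the column.
std::int64_t sumValid(const Int64Column& col) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < col.length; ++i) {
        if (col.isValid(i)) {
            total += col.values[i];
        }
    }
    return total;
}
```

Because the values sit in one contiguous buffer, engines that agree on such a layout can hand a column to one another without converting it row by row.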
“Velox is a C++ vectorized database acceleration library providing optimized columnar processing, decoupling SQL or data frame front end, query optimizer, or storage backend,” McKinney wrote in a blog post today. “Velox has been designed to integrate with Arrow-based systems. Through our collaboration, we intend to improve interoperability while refining the overall developer experience and usability, particularly support for Python development.”
These are still early days for Velox, and it’s likely that more vendors and professionals will join the group. Governance and transparency are important aspects of any open source project, according to Mih. While Velox is licensed under the Apache 2.0 license, the project has not yet selected an open source foundation to oversee its work, Mih says.
Editor’s note: This article has been corrected. Wes McKinney is the CTO and co-founder of Voltron Data, not the CEO. Datanami regrets the error.