Actian Reasserts Performance Claims With VectorH
The latest version of SQL-on-Hadoop specialist Actian Corp.’s Vector database tightens integration with Apache Spark to widen access to new data sources while adding enterprise features required to move Hadoop-based analytics to production.
The database specialist said Tuesday (June 28) that version 5.0 of its Vector in Hadoop (VectorH) database uses the same query engine as its Vector platform. The company claims its Vector platform holds the Transaction Processing Performance Council's TPC-H benchmark record for non-clustered systems at the 3,000-GB scale.
Actian, Palo Alto, Calif., said that tighter integration between its enterprise SQL database, which runs natively on Hadoop, and the Spark engine would expand data access and create options for exploring machine learning techniques. "Tighter integration with Spark makes it easier for our customers to leverage data in different formats and from different sources," Mark Milani, Actian's senior vice president for product engineering, noted in a statement.
The VectorH 5.0 database, which Actian said would be generally available at the end of July, also supports the latest Hadoop distributions from Cloudera, Hortonworks and MapR for deployment on premises or in the cloud.
Citing results from a TPC-H query set run on a 10-node cluster at the 1,000-GB scale, the company claimed the latest version of VectorH outperformed other SQL-on-Hadoop platforms, including Apache HAWQ, Apache Hive, Apache Spark SQL and Cloudera Impala. According to the company, the query workload used in the benchmark testing was designed to be representative of a "medium complexity ad-hoc decision support workload."
Actian claimed the benchmark results show "VectorH can run within seconds queries that take the SQL in Hadoop competition up to 20 minutes to run," even after the competing platforms had been tuned for performance.
Company engineers noted in a report that VectorH also integrates with YARN for workload management, thereby “achieving a high degree of elasticity.”
In a blog post, company engineers attributed the performance advantages to a range of hardware and software advances. On the hardware side, they cited multicore parallelism and vectorized execution that exploit features of Intel Corp.'s (NASDAQ: INTC) CPU architecture, including the AVX2 vector instruction set and large, multilevel caches.
Along with a "well-tuned query optimizer," the engineers also pointed to effective I/O filtering and a lightweight data compression approach that achieved "faster vectorized execution by minimizing branches and instruction counts."
Company engineers also touted the SQL functionality and data update capabilities of the latest version of VectorH, along with a file format that the company claims delivers faster query performance and reduced storage requirements.
Still, observers caution that benchmark comparisons should be viewed skeptically, since benchmarks are notoriously unreliable at predicting real-world performance. Vendors have also been known to tweak their systems to game benchmarks. Recent TPC benchmarks have been designed specifically to curb such gambits and to better reflect real-world conditions.