March 2, 2015

Actian Claims ‘Permanent Performance Advantage’ with SQL-on-Hadoop Tool

Alex Woodie

The SQL-on-Hadoop sweepstakes are by no means over. What’s been dubbed the “gateway drug” for Hadoop is just starting to gain traction. But according to Actian, its SQL-on-Hadoop offering, called Vortex, has jumped out to an early (and permanent) lead in the performance department.

At the recent Strata + Hadoop World show, Actian pitted Vortex against Cloudera’s Impala right in the booth, where it largely re-created the results of a 2014 TPC Decision Support (TPC-DS) benchmark test that showed Vortex completing a job up to 30 times faster than Impala.

Such comparisons can be useful, but they should also be taken with a grain of salt. Benchmarks are notoriously poor at predicting real-world conditions, and vendors have been known to fiddle with their systems to fudge the results in their favor. While the TPC-DS benchmark was specifically designed to cut down on these types of shenanigans, the fact that nobody appears to be openly publishing their TPC-DS results raises additional questions.

That’s not stopping Actian from talking about Vortex, which it says (unsurprisingly) is winning the SQL-on-Hadoop war. The company’s big data point man, CTO Mike Hoskins, shared his views on the state of Hadoop SQL at the recent show in San Jose, California.

“Impala is good,” he says. “But you just don’t write these things from scratch and run every query…It takes 10 years to write a really good, fully functioning, enterprise-class” SQL engine.

Vortex, you will remember, is a parallelized version of the Vector database that Actian developed for Hadoop. Previously called VectorWise, the column-oriented analytic database was originally developed by Peter Boncz a decade ago as part of the X100 project at a Dutch national research institute.

When it came out, VectorWise was one of a new class of massively parallel analytic databases developed specifically to provide a new level of performance on big data problems. Hoskins puts VectorWise in the same class as two other databases: Mike Stonebraker’s Vertica database (now owned by Hewlett-Packard) and Barry Zane’s ParAccel database, which Actian also owns. Actian is also an investor in Zane’s latest startup, SPARQL City.

“These were the three shiny, new, all-columnar, all-analytic, software-only, scale-out-on-commodity-hardware databases,” Hoskins says. “They are fundamentally different than the rest of the databases, in my opinion. So we enjoy permanent performance advantages over that.”

The secret sauce that makes Vortex so darned fast, Hoskins says, is vector processing. “Slowly people are realizing that vector processing is a massive innovation that they have to have in the database,” he says.
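To give a rough sense of what vector processing means inside a query engine, here is a minimal Python sketch. It is not Actian's code: the function names, the sample data, and the batch size of 1,024 values are illustrative assumptions. It contrasts a tuple-at-a-time loop with a loop that runs each operation over a whole batch (a "vector") of column values, which is the general idea behind vectorized execution.

```python
from typing import List

VECTOR_SIZE = 1024  # hypothetical batch size, chosen only for illustration


def tuple_at_a_time(prices: List[int], quantities: List[int], min_qty: int) -> int:
    """Evaluate SUM(price * qty) WHERE qty >= min_qty one row at a time."""
    total = 0
    for price, qty in zip(prices, quantities):
        # Filter and arithmetic are interleaved for every single row.
        if qty >= min_qty:
            total += price * qty
    return total


def vector_at_a_time(prices: List[int], quantities: List[int], min_qty: int,
                     vector_size: int = VECTOR_SIZE) -> int:
    """Evaluate the same query, but one batch of column values at a time."""
    total = 0
    for start in range(0, len(prices), vector_size):
        # Take one vector from each column.
        p = prices[start:start + vector_size]
        q = quantities[start:start + vector_size]
        # Selection step: positions within the vector that pass the filter.
        selected = [i for i, qty in enumerate(q) if qty >= min_qty]
        # Map step: multiply only the selected positions, as one tight loop.
        revenue = [p[i] * q[i] for i in selected]
        # Aggregate step: fold the whole vector into the running total.
        total += sum(revenue)
    return total


if __name__ == "__main__":
    prices = [2, 3, 5, 7]
    quantities = [1, 10, 4, 20]
    print(tuple_at_a_time(prices, quantities, 4))
    print(vector_at_a_time(prices, quantities, 4, vector_size=2))
```

Both functions return the same answer. In a real engine the per-vector steps would be compiled loops over arrays that fit in the CPU cache, which is where the speedup comes from; plain Python only shows the shape of the idea, not the performance.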

Impala is not the only competition Actian has for Vortex, of course. Hortonworks continues to work on Hive (which everybody seems to despise, Hoskins says) with its Stinger initiative, HP is shipping a version of Vertica for Hadoop, Pivotal’s HAWQ will soon be in the open source realm, MapR has a play with Drill, and even IBM’s Big SQL gets play.

Hoskins says it could take 10 years for competitors to catch up to Vortex. “We see queries [from Tableau] that have 500 lines in them,” he says. “Try putting that in a query planner and optimizer and have it understand perfectly how to distribute a balanced, parallel workloads around an HDFS cluster. That’s non-trivial stuff, and we solved it already and brought it into Hadoop instead of writing it from scratch.”

Hoskins admits that Hadoop is bigger than just SQL. SQL does, after all, stand for Structured Query Language, which means it’s not great at crunching vast amounts of unstructured or semi-structured data. Actian offers other tools for hammering messy data down into more structured data, or for tackling “unknown unknown” types of problems, including graph analytics and its triple-store.

But once business data is in a relatively stable form, customers still want to use SQL to solve those “known unknown” types of problems. “Unstructured data is interesting and we handle that,” Hoskins says. “But there’s still an opportunity to take certain data sets that you’re trying to interrogate over and over with the lowest latencies in the world, and pay for that cost of loading them into a fixed schema SQL database, so you can get not only incredible response time and access.”

It’s all about making business analysts who are skilled in SQL and have domain expertise productive on Hadoop. “This is about addressing that shortage in the Hadoop world, where people are slinging Pig and MapReduce code, tragically,” Hoskins says. “What if you could bring a high-level, high-productivity, high-function language like SQL to the game? It could be very important.”


