September 25, 2013

Hadapt Aims at Untangling ETL with Schemaless SQL

Isaac Lopez

Data analytics company Hadapt, known for a data platform that merges SQL directly into Apache Hadoop, has introduced a new “Schemaless SQL” addition to its technology portfolio, which the company says can alleviate the complicated ETL processes familiar to Hadoop users.

According to Hadapt, the new technology lets business analysts query traditional structured data as well as non-relational data such as text, document, and key-value pair data through one unified interface. The new “Schemaless SQL” technology comprises what Hadapt calls the “Hadapt Flexible Schema” and “Multi-Structured Tables,” which together aim to unify querying of disparate data sources while reducing the complexity of data analytics.

“As Hadoop continues its ascent into enterprise applications, traditional analytic boundaries and data processing methodologies are rapidly becoming obsolete,” said Hadapt in a statement. “ETL has long been a constraining aspect of the data analytics pipeline, often limiting the types of data and questions an analyst could ask. Schemaless SQL alleviates these constraints, enabling the data to ‘speak for itself,’ dynamically extracting hidden structure and presenting it to the end user.”
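To make the idea of “extracting hidden structure” concrete, here is a minimal sketch of the general concept, not Hadapt's implementation, whose internals the announcement does not describe. It assumes a handful of hypothetical JSON records with varying keys, derives the table's columns from the data itself rather than from an up-front ETL schema, and then runs ordinary SQL over the result using Python's built-in sqlite3 module. The record contents, the table name `events`, and the helper functions are all invented for illustration.

```python
import json
import sqlite3

# Hypothetical semi-structured records; the keys vary from record to record.
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "device": "mobile"}',
    '{"user": "bob", "action": "purchase", "amount": 42.5}',
    '{"user": "alice", "action": "purchase", "amount": 10.0, "coupon": "FALL"}',
]


def discover_columns(records):
    """Return the union of keys seen across all records, in first-seen order."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def load_as_table(conn, table, raw_lines):
    """Create a table whose columns are derived from the data, then load it."""
    records = [json.loads(line) for line in raw_lines]
    columns = discover_columns(records)
    # Naive identifier quoting: adequate for this sketch, not for untrusted keys.
    quoted = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    placeholders = ", ".join("?" for _ in columns)
    # Keys missing from a record become NULL in that row.
    rows = [tuple(r.get(c) for c in columns) for r in records]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return columns


conn = sqlite3.connect(":memory:")
columns = load_as_table(conn, "events", RAW_EVENTS)

# Plain SQL over columns that were never declared ahead of time.
totals = conn.execute(
    'SELECT "user", SUM("amount") FROM "events" '
    'WHERE "action" = ? GROUP BY "user" ORDER BY "user"',
    ("purchase",),
).fetchall()

print(columns)  # ['user', 'action', 'device', 'amount', 'coupon']
print(totals)   # [('alice', 10.0), ('bob', 42.5)]
```

The sketch discovers columns once at load time; a system built around the “schema-on-need” idea would presumably handle new keys as they appear, which this toy example does not attempt.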

The new technology may be at the forefront of an evolution of the traditional Hadoop “schema-on-read” approach toward what analyst Curt Monash refers to as “schema-on-need.”

“For years people have been putting data into DBMS (usually but not exclusively relational ones), building some indexes immediately, then adding more indexes to improve performance later as requirements are discovered,” said Monash in a recent article. “Schema-on-need is a continuation of the same idea, but targeted at poly-structured data,” he continued, referring to data whose structure is subject to change.

“A key concept in building the ‘Analytic DBMS for Hadoop’ was to provide the interface and performance of a relational database while maintaining the scalability and flexibility of Hadoop,” said Hadapt CEO Justin Borgman. “With this release, Hadapt enables Schemaless SQL over non-relational data stores, providing analysts with the only SQL-based solution for querying multi-structured data.”

While the technical implications of the new release are intriguing, the effect of bringing the Hadoop fire down from Mount Olympus is also notable, as companies like Hadapt aim to eliminate the complexities for which the data elephant is famous. One of the major challenges for companies moving to adopt big data technologies such as Hadoop is the shortage of people able to wrestle with the framework and tame the big data beast.

Efforts like this one demonstrate the progress being made toward mainstreaming Hadoop and making it accessible to a wider audience.

Related items:

MapR Gooses HBase Performance in Pursuit of Lightweight OLTP

Oracle Addresses Hadoop Security with Big Data Appliance

HP Says Big HAVEn Push Is Working