February 20, 2013

Cascading into Hadoop with SQL

Nicole Hemsoth

Today Concurrent, the company behind the Cascading Hadoop abstraction framework, announced a new trick to help developers tame the elephant.

The company, which is focused on simplifying Hadoop, has introduced a SQL parser that sits on top of Cascading and exposes a JDBC interface. Concurrent says it will be pushing the project out over the next couple of weeks, in hopes that developers will take it under their wing and support it.

According to the company’s CTO and founder, Chris Wensel, the goal is to get the community to rally around a new way to let non-programmers use data that’s locked in Hadoop clusters and to move applications onto those clusters more easily.

The newly announced extension to the abstraction is called Lingual, and it is aimed at bringing Hadoop within reach of those familiar with SQL, JDBC and traditional BI tools. It provides what the company calls “true SQL for Cascading and Hadoop” to make it easier to create and run applications on Hadoop and, again, to tap into that growing pool of Hadoop-seekers who lack the expertise to build mission-critical apps on the platform.

Wensel says that Lingual’s goal is to provide an ANSI-standard SQL interface that is designed to play well with all of the big name distros running on site or in cloud environments. This will allow a “cut and paste” capability for existing ANSI SQL code from traditional data warehouses so users can access data that’s locked away on a Hadoop cluster. It’s also possible to query and export data from Hadoop right into a wide range of BI tools.
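To make the “cut and paste” idea concrete, here is a minimal sketch of what running unchanged ANSI SQL through a JDBC interface might look like from the Java side. It uses only the standard java.sql API. The class name, the table and column names, and the JDBC URL passed on the command line are all hypothetical placeholders, since the article does not specify Lingual’s connection string or driver, and the matching driver would need to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative only: not taken from Concurrent's documentation.
class SqlOverJdbcSketch {

    // Plain ANSI SQL, as it might be copied from an existing warehouse job.
    // The table "sales" and its columns are hypothetical.
    static final String QUERY =
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region";

    // Runs the query through whichever JDBC driver accepts the URL,
    // prints the first two columns of each row, and returns the row count.
    static int printRows(String jdbcUrl, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getObject(2));
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 1) {
            System.err.println("usage: SqlOverJdbcSketch <jdbc-url>");
            return;
        }
        printRows(args[0], QUERY);
    }
}
```

The point of the sketch is that nothing in the calling code is Hadoop-specific: the SQL string and the JDBC calls are the same ones a BI tool or warehouse client would already use, and only the connection URL would change.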

What it’s not going to provide is sub-second response times on a petabyte of data on a Hadoop cluster; that is what systems such as Greenplum or Teradata Aster are often looked to for. Rather, the company’s goal is to make it easy to move applications onto Hadoop, where the real challenge is the migration from a relational or MPP database.

As Wensel noted, “Concurrent was established with the belief that there had to be a simpler path to mass Hadoop adoption. And since day one, we have worked to create solutions that make it easier for developers to build powerful and robust Big Data applications, quickly and easily. With the Lingual project, we are one huge step closer to realizing our mission.”

To refresh, Cascading is a Java-based application framework that aids in the development of analytics and data management applications on Hadoop. It’s found its way into several big name production environments that have deployed Hadoop, mostly because the framework can address some of the complexities of developing MapReduce applications. Wensel believes that this new addition to the ranks of capabilities will let new users explore more robust uses of Hadoop.

One of the meatier examples of Cascading in action is at Twitter, where terabyte- and petabyte-scale problems are being tackled on one of the larger production Hadoop clusters. The Twitter team needed to write complex jobs against its cluster: rather than merely counting entries in log files, the cluster had to perform complex computations and double as a machine learning and linear algebra resource. The social media giant said it looked to the Cascading framework to wick away some of the MapReduce and Hadoop complexities via an abstraction layer that let the team use its favorite languages.

For those wanting a little deeper context on the core concepts and use cases around Cascading, Paco Nathan, open source Cascading committer and Concurrent Data Scientist (who came to the company following his role as a Concurrent customer), talks in depth to a group at Nokia.
