February 18, 2013

Self-Service Data Mining, Hold the Bottlenecks


Self-service data exploration by line-of-business analysts is an ideal that has been elusive in the world of big data. Whether hampered by issues with hardware or data-set tuning, business analysts often find themselves bottlenecked and caught in gyrations between the database admins and the data. 


In a recent article, Platfora CEO Ben Werther says that Cloudera has at least partially answered the challenge with its Impala release, which lets business-level analysts run faster ad hoc queries on smaller data sets than had previously been possible. However, says Werther, Impala currently falls short of eliminating the bottlenecks that too often occur between the business-level analyst and the DBA.

The weakness, explains Werther, is that Impala relies on what he refers to as the “Legacy Database” model, where the analyst is still heavily reliant on the DBA “to manage transformation and maintenance jobs, design and implement aggregations, tune performance, etc.” Thus the analyst is still stuck in the DBA/database gyration that can cause slowdowns for both the project and the organization as a whole, especially in cases where complex queries on the wrong tables chew up resources and slow down every project that relies on the Hadoop cluster.

“This is not the scalable big-data architecture of the future, and it is exactly the painful world that every customer we talk to is trying to escape,” says Werther. 

Werther makes the case that the Platfora platform solves this problem by pulling raw data out of the Hadoop cluster and building scale-out in-memory aggregates that users can query at will. Much as a gold panner dips into a stream, the business-level analyst can use Platfora to dip into Hadoop for a data set and examine that set to their heart’s content for the nuggets of insight they’re looking for, all while freeing up the Hadoop cluster for the next data panner.

“Platfora connects in minutes to any Hadoop distribution and automatically generates MapReduce jobs to build and maintain scale-out in-memory aggregates,” explains Werther (also noting that Impala acceleration is on the roadmap). “Our scale-out middle tier is simultaneously an ‘aggregate cache’ of the data below, and a lightning-fast in-memory analytical query engine to the users above.”
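
To make the "aggregate cache" idea concrete, here is a minimal single-process sketch in Python. It is not Platfora's implementation and uses no Hadoop or MapReduce API. The records, field names, and the `build_aggregate` and `query` helpers are all hypothetical, chosen only to illustrate the general pattern Werther describes: scan the raw data once in a batch step, then answer ad hoc questions from a small in-memory structure without touching the raw data again.

```python
from collections import defaultdict

# Hypothetical raw records standing in for data stored in Hadoop.
RAW_EVENTS = [
    {"region": "east", "product": "a", "revenue": 10.0},
    {"region": "east", "product": "b", "revenue": 5.0},
    {"region": "west", "product": "a", "revenue": 7.5},
    {"region": "east", "product": "a", "revenue": 2.5},
]


def build_aggregate(records, dimensions, measure):
    """Scan the raw records once and sum the measure per dimension combination.

    This stands in for the expensive batch step that runs against the cluster.
    """
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record[measure]
    return dict(totals)


def query(aggregate, dimensions, **filters):
    """Answer an ad hoc question using only the in-memory aggregate."""
    total = 0.0
    for key, value in aggregate.items():
        row = dict(zip(dimensions, key))
        if all(row[name] == wanted for name, wanted in filters.items()):
            total += value
    return total


DIMENSIONS = ("region", "product")
AGGREGATE = build_aggregate(RAW_EVENTS, DIMENSIONS, "revenue")

# Both questions are served from the aggregate; RAW_EVENTS is not rescanned.
east_total = query(AGGREGATE, DIMENSIONS, region="east")
product_a_total = query(AGGREGATE, DIMENSIONS, product="a")
print(east_total, product_a_total)
```

In this toy version the aggregate holds one entry per region and product pair, so each query touches at most three entries rather than every raw record. That is the trade the article describes: pay for the scan once, then let analysts ask follow-up questions without adding load to the cluster.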

In theory, the end result is the elimination of the tango between the analyst and the DBA, as well as of the constant drain on Hadoop cluster resources that can slow other projects down.

Related Articles: 

Cloudera Runs Real-Time with Impala 

Could the Data Scientist Be a Bad Thing for Big Data? 


Alteryx Aims to Bypass Data Scientists by “Humanizing” Big Data
