February 18, 2013

Self-Service Data Mining, Hold the Bottlenecks


Self-service data exploration by line-of-business analysts is an ideal that has been elusive in the world of big data. Whether hampered by issues with hardware or data-set tuning, business analysts often find themselves bottlenecked and caught in gyrations between the database admins and the data. 


In a recent article, Platfora CEO Ben Werther says that Cloudera has at least partially answered the challenge with its Impala release, which lets business-level analysts run faster ad hoc queries on smaller data sets than had previously been possible. However, says Werther, Impala currently falls short of eliminating the bottlenecks that too often occur between the business-level analyst and the DBA.

The weakness, explains Werther, is that Impala relies on what he refers to as the “Legacy Database” model, where the analyst is still heavily reliant on the DBA “to manage transformation and maintenance jobs, design and implement aggregations, tune performance, etc.” Thus the analyst is still stuck in the DBA/database gyration that can cause slowdowns for both the project and the organization as a whole – especially in cases where complex queries on the wrong tables chew up resources and slow down every project that relies on the Hadoop cluster.

“This is not the scalable big-data architecture of the future, and it is exactly the painful world that every customer we talk to is trying to escape,” says Werther. 

Werther makes the case that Platfora solves this problem by pulling raw data out of the Hadoop cluster and building scale-out in-memory aggregates that users can query at will. Much as a gold panner dips into a stream, the business-level analyst can use Platfora to pan into Hadoop for a data set and examine that set to their heart’s content for the nuggets of insight they’re looking for, all while freeing up the Hadoop cluster for the next data panner.

“Platfora connects in minutes to any Hadoop distribution and automatically generates MapReduce jobs to build and maintain scale-out in-memory aggregates,” explains Werther (also noting that Impala acceleration is on the roadmap). “Our scale-out middle tier is simultaneously an ‘aggregate cache’ of the data below, and a lightning fast in-memory analytical query engine to the users above.”
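
To make the “aggregate cache” idea concrete, here is a minimal Python sketch of the general pattern: one batch pass over raw records builds in-memory aggregates, and ad hoc queries are then answered from those aggregates alone. This is an illustration under stated assumptions, not Platfora's implementation. The sample records, the `build_aggregates` and `query` functions, and the field names are all hypothetical, and the batch pass merely stands in for the MapReduce jobs Werther describes.

```python
from collections import defaultdict

# Hypothetical raw records standing in for data stored in a Hadoop cluster.
RAW_EVENTS = [
    {"region": "east", "product": "a", "revenue": 10.0},
    {"region": "east", "product": "b", "revenue": 5.0},
    {"region": "west", "product": "a", "revenue": 7.5},
    {"region": "west", "product": "a", "revenue": 2.5},
]

DIMENSIONS = ("region", "product")


def build_aggregates(events, dimensions, measure):
    """Single batch pass over the raw data (assumed stand-in for a MapReduce job).

    Returns a dict mapping each combination of dimension values to the
    summed measure for that combination.
    """
    totals = defaultdict(float)
    for event in events:
        key = tuple(event[d] for d in dimensions)
        totals[key] += event[measure]
    return dict(totals)


def query(aggregates, dimensions, **filters):
    """Answer an ad hoc query from the in-memory aggregates only.

    The raw events are never rescanned, so repeated queries place no
    further load on the (simulated) cluster.
    """
    total = 0.0
    for key, value in aggregates.items():
        row = dict(zip(dimensions, key))
        if all(row[d] == wanted for d, wanted in filters.items()):
            total += value
    return total


# Built once, then queried as often as the analyst likes.
AGGREGATES = build_aggregates(RAW_EVENTS, DIMENSIONS, "revenue")

if __name__ == "__main__":
    print(query(AGGREGATES, DIMENSIONS, region="west"))
    print(query(AGGREGATES, DIMENSIONS, product="a"))
```

The design point the sketch tries to capture is the separation of costs: the expensive scan happens once when the aggregates are built, while each analyst query touches only the much smaller aggregated structure.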

In theory, the end result is the elimination of the tango between the analyst and the DBA, as well as of the constant drain on Hadoop cluster resources that can slow other projects down.

Related Articles: 

Cloudera Runs Real-Time with Impala 

Could the Data Scientist Be a Bad Thing for Big Data? 


Alteryx Aims to Bypass Data Scientists by “Humanizing” Big Data
