October 24, 2013

Standing on the Shoulders of (Hadoop) Giants

Alex Woodie

Getting started with Hadoop is easy. You can download and install it in a matter of minutes. But making something useful of the open source phenom takes time and money. One vendor trying to make the process easier is Platfora, a developer of software designed to virtualize and personalize the human-to-Hadoop interaction layer. With yesterday’s launch of its Big Data Analytics 3.0 offering, Platfora says Hadoop gets easier.

Platfora debuted its Big Data Analytics stack this spring at the Strata Hadoop World conference. As one of the keynote speakers at the show, Platfora founder and CEO Ben Werther shared his unique perspective on the market. He’ll be on stage at next week’s show, too.

To say Werther is outspoken about the big data analytics space–especially the failings of “legacy” tools–would be an understatement. As a former product manager at EMC Greenplum, Werther is eminently familiar with the strengths and weaknesses of the previous generation of analytic tools.

“Today’s problems can’t be solved by the same level of thinking that created them,” Werther says in an interview with Datanami. “Today the industry is focused on, ‘How do I go build BI for Hadoop, and bring legacy technology into the new age?’ But what we see is that those kinds of technologies lead to shallow, pretty pictures with no meaning behind them, metrics driven on faith. Fundamentally, when it comes to this new world of data, BI is BS.”

Platfora CEO and founder Ben Werther.

Traditional BI tools can give users some idea of what’s going on behind the data, but they lack the capability to bring it all together, particularly when it involves massive amounts of clickstream data, machine data, social data, and other fast and semi-structured data types.

Hadoop is a great place to store those types of event streams, as many organizations have discovered. But once the data is there, what do you do with it? “That’s been what people are struggling to figure out,” Werther says. “Once that data streams into Hadoop, the question is, ‘How do you make it interactive and fast for a business user sitting down doing work against that?’”

As you may have guessed, that’s the problem that Werther built Platfora and its Big Data Analytics product to solve. The product has three layers. The first sits on Hadoop and generates MapReduce jobs to detect patterns in the data. This aggregate data is then boiled up into intermediate data structures, or what the company calls “data lenses,” that reside in memory on the cluster. The final layer is a graphical front-end that allows users to perform analysis on those intermediate-layer aggregates, and delivers sub-second response times on queries.

The data lenses are the secret sauce for Platfora. The company says they function like dynamic schemas that can be updated on the fly with every new interrogation of the underlying Hadoop data (using the auto-generated MapReduce jobs). “This is not looking at a stream of individual things and doing lightweight aggregates,” Werther says. “This is about, essentially, doing a scale-out, in-memory interactive engine that knows about the underlying Hadoop data. The lenses know how to refine themselves as more data comes into Hadoop, so they’re able to stay up to date and fresh, but serve very fast queries. We’re able to do a type of processing that really isn’t possible with other types of products out there.”
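The general idea behind a pre-aggregated, incrementally refreshed structure can be sketched in a few lines of Python. This is a toy illustration only, not Platfora's implementation: the `Lens` class, its `refresh` and `query` methods, and the sample events are all hypothetical, and a plain in-process loop stands in for the MapReduce jobs that would do the aggregation on a real cluster.

```python
from collections import defaultdict

# Hypothetical raw events, standing in for records stored in Hadoop.
EVENTS = [
    {"day": "2013-10-21", "channel": "web", "clicks": 3},
    {"day": "2013-10-21", "channel": "mobile", "clicks": 5},
    {"day": "2013-10-22", "channel": "web", "clicks": 2},
]


class Lens:
    """Toy in-memory aggregate keyed by chosen dimensions (illustrative only)."""

    def __init__(self, dimensions, measure):
        self.dimensions = tuple(dimensions)
        self.measure = measure
        self.totals = defaultdict(int)

    def refresh(self, events):
        """Fold a batch of events into the aggregate (the batch step)."""
        for event in events:
            key = tuple(event[d] for d in self.dimensions)
            self.totals[key] += event[self.measure]

    def query(self, **filters):
        """Sum the measure over aggregate cells that match every filter."""
        total = 0
        for key, value in self.totals.items():
            cell = dict(zip(self.dimensions, key))
            if all(cell[name] == wanted for name, wanted in filters.items()):
                total += value
        return total


# Build the aggregate once from the existing data.
lens = Lens(dimensions=["day", "channel"], measure="clicks")
lens.refresh(EVENTS)
web_before = lens.query(channel="web")

# New data arrives: only the new batch is folded in, not the full history.
lens.refresh([{"day": "2013-10-23", "channel": "web", "clicks": 4}])
web_after = lens.query(channel="web")
```

Queries here touch only the small aggregate, never the raw events, which is the property that makes interactive response times plausible; the trade-off is that only the dimensions chosen up front can be filtered on without rebuilding.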

Taken together, the three components of the Big Data Analytics suite are designed to turn the batch-oriented nature of MapReduce on its side and allow users to do deep analysis of their vast data sets in a much more interactive manner. Users could build this sort of system on Hadoop all by themselves, but it could take years. Or they may fail along the way and never reap the rewards of their sacrifices, according to Werther.

The Platfora Big Data Analytics stack.

“If you want to try and weave together these different large data sets–clickstream and events and the rest…it’s incredibly difficult and arcane work to try and figure out how to build a unified data model in a traditional way, and it limits the kind of analysis you can do against that,” he says. The Platfora approach “is something that’s qualitatively richer and just incredibly more agile than the old way of doing it and what SQL enables.”

That old standby SQL may be experiencing a revival in the big data analytics space, thanks to its universality and ease of use. But according to Werther (who does have opinions on the matter), SQL just doesn’t have what it takes to solve the new classes of big data problems.

“SQL is nowhere near expressive enough to support these new classes of processing,” he says. “It’s great for certain types of use cases. SQL is good for traditional reporting…But when you want to start to understand, when you want to connect the dots and do fact-based analysis across different data sets, multichannel and the rest, legacy BI products and SQL-style processing–none of it was really designed for that purpose.”

Platfora’s “fact-based” approach has garnered interest from big data investors and real world customers. Among the companies that have deployed Big Data Analytics are Edmunds.com, Netflix, Disney, Comcast, the Washington Post, and Shopify. The company has received $27 million in funding from investors, led by Andreessen Horowitz, Battery Ventures, In-Q-Tel, and Sutter Hill.

With Big Data Analytics 3.0, which will ship in the first quarter of 2014, Platfora has refined the software. The company says its self-service visualizations (or Vizboards) can be used in a drag-and-drop manner, and now include funnel charts, graphs, and custom visualizations. Sharing work within the product is easier now, thanks to a new workflow framework. Security has also been bolstered with role-based access and support for LDAP and Active Directory. The software supports the major Hadoop distributions, including those from Cloudera, Hortonworks, MapR, and Amazon.
