October 26, 2012

Visualizing Big Data’s Key Partner

Ian Armas Foster

Visualization is vital to managing big data. The proper charts, graphs, and other representations of large datasets can let business users see trends they would not otherwise know existed. And after a busy week at Hadoop World, Tableau appears to be at the forefront of the big data visualization market.

We caught up with Dan Jewett, Vice President of Product Management, to talk about the partnerships the company announced over the course of the week and what they mean for a company that specializes in visualization.

For Jewett, one of the most pressing themes across the big data spectrum is performance. “Performance is critical and we’re trying to see a renewed emphasis on performance for all these vendors,” said Jewett of how Tableau decided which vendors to partner with. One of their biggest partners, and one with whom they made a major joint announcement yesterday, is Hadoop distribution powerhouse Cloudera.

“We’ve actually been supporting connecting to Cloudera for a little over a year now,” said Jewett of what Tableau considers one of its more important partners. Cloudera announced yesterday its ambitious new Impala platform, which aims to turn Hadoop from solely a batch processing tool into one that can provide analysis in real time.

Tableau is to be the major visualization partner behind Impala, a product outlined by Cloudera CEO Mike Olson here. According to Jewett, Impala processes data an order of magnitude faster than before. For business users looking to leverage that speed, the visual analytics side has to keep up. Jewett and Cloudera hope Tableau is up to the challenge.

Tableau also announced a significant first-time partnership with Hortonworks. With that announcement, Tableau completes the major Hadoop distributor hat trick, as they also work closely with MapR. As a result, until some other framework overtakes Hadoop in the open source big data realm, Tableau has more or less established itself.

Along with Cloudera and Hortonworks, Tableau announced several new partnerships involving unstructured data (with an emphasis on text analytics), collaborative data science efforts, and more. For example, Greenplum’s open source Chorus initiative, which is hooked up to the Kaggle data science competition community, will have its visual side powered by Tableau. Jewett also mentioned Digital Reasoning, a start-up focused on bringing unstructured data, and text in particular, into the realm of the structured.

Underlying those connectors is Simba, a company with which Tableau has a close relationship. According to Jewett, Simba serves as a conduit for delivering data from the databases to Tableau. “Simba provides a couple of things,” Jewett said, “including the raw transport from us over to the databases and doing some additional translation of the statements that get sent through the driver to be appropriate for the database that it’s talking to. For example, with the Hadoop guys, it’s a driver that talks to Hive, so it’s the transport from the client tool over to the Hive engine on the back-end server.”

This means Tableau can deal with the data on its own terms, with external drivers such as Simba’s (Jewett noted that Simba is not the sole driver provider, just the one Tableau works with most) delivering and translating the data between Tableau and databases like the ones it will connect to through Cloudera.

“We send out SQL or derivative forms of SQL to the different databases that we talk to,” said Jewett regarding the translation that happens, via Simba’s drivers, between Tableau and the databases.
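In practice, that client-to-Hive path looks much like any ODBC round trip. The sketch below is a minimal illustration of the mechanism Jewett describes, not Tableau’s internals: it assumes a machine where a Hive ODBC driver (such as Simba’s) is already registered under a hypothetical data source name, “HiveDSN”, and the query itself is invented for the example.

```python
import pyodbc  # generic ODBC client; works with any registered driver

# "HiveDSN" is a hypothetical DSN pointing at a Hive ODBC driver
# (e.g. Simba's). Hive has no transactions, so connect with
# autocommit enabled.
conn = pyodbc.connect("DSN=HiveDSN", autocommit=True)
cursor = conn.cursor()

# The client writes plain SQL; the driver handles transport to the
# Hive engine and any dialect adjustments the back end requires.
cursor.execute(
    "SELECT region, COUNT(*) AS orders "
    "FROM sales "
    "GROUP BY region"
)

for region, orders in cursor.fetchall():
    print(region, orders)

conn.close()
```

From the client tool’s perspective, the back end could be Hive, Impala, or a conventional warehouse; the driver absorbs the differences, which is exactly the conduit role Jewett ascribes to Simba.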

Going deeper into what Tableau wants to do: visualizing sizable swaths of data is difficult. As Jewett notes, a normal human cannot simply look at the several million rows that constitute a large dataset and pick out the trends. Nor is it particularly obvious which functions should be run on those datasets to produce any sort of insight.

Jewett spoke to these challenges in reference to someone trying to generate analysis from petabyte storage systems. “If you dumped two million rows of data, if you run a query through a petabyte system and you’re getting back two million rows of data that were the subset of the information that answered your question, there’s no way that looking at a grid of that data would help you out. As you take that information and you look at it visually, trends, outliers, nuanced patterns that are in that data start jumping out at you pretty quick and it allows you to continue your cycle of iteration.”
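Jewett’s point is easy to demonstrate. The sketch below uses entirely synthetic data, not anything from Tableau or Cloudera: it fabricates a two-million-row result set with a faint trend and a few planted outliers. Printed as a grid, those rows reveal nothing; a single density plot surfaces both features at a glance.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a two-million-row query result:
# (day, value) pairs with a slow upward trend plus noise.
rng = np.random.default_rng(0)
n = 2_000_000
day = np.sort(rng.uniform(0, 365, n))
value = 0.05 * day + rng.normal(0, 5, n)
value[rng.choice(n, 20, replace=False)] += 60  # plant a few outliers

# As a grid these rows are unreadable; plotted, the trend and the
# outliers jump out immediately.
plt.hexbin(day, value, gridsize=80, bins="log")
plt.xlabel("day of year")
plt.ylabel("value")
plt.title("Two million rows at a glance")
plt.show()
```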

Tableau has a decent pedigree in the big data world to back up its claims. In a recent interview, Philip Wicklin, CTO of Hadapt (a big data startup trying to let users run analytic queries without having to write them in MapReduce), mentioned that the company’s BI clients overwhelmingly choose Tableau. That endorsement is not insignificant for Tableau when competitors QlikView and Splunk also feature high-profile connections.

Of course, for Jewett, the challenges roll back to performance. “Performance on what’s happening on the back end to respond back to you is critical. It’s kind of a buzzkill if you’re going through an iterative process and it takes twelve minutes between every question you ask. It’s hard to get into that flow of exploring your data.”

Like a good statistical analysis, a good visual analytics tool flits its eyes and says, “look over there,” except here it points to actionable enterprise trends. Tableau appears to be doing just that, both with their tools and their announced partnerships, seemingly on the pulse of the big data visualization realm.
