August 15, 2013

Data Driving the Exit Into Hadoop

Isaac Lopez

Despite its long-term promise, one of the side comments often heard when discussing Hadoop is that it’s the king of the “proof-of-concept.” Virtually everyone is playing with Hadoop, but often, especially where established enterprises with entrenched relational databases are concerned, Hadoop stays at the sandbox stage.

It’s partly a crisis of confidence, argues Shawn Dolley, a VP with IT analytics company Appfluent. “Six months ago, I was in a session – there were about 100 people in the room, and the speaker asked the audience, ‘Who here has tested, played with, and investigated Hadoop?’” Everyone in the room raised their hand, said Dolley, but when it came to how many had moved to production, far fewer hands went up. “A lot of that is about confidence,” he argues, “where to go first?”

It’s a complicated challenge that the Hadoop distro vendors have to solve. With data in traditional systems growing at unprecedented rates, committing to a new paradigm can be daunting. “The trick is, which of the data sources, and which of the processes are most advantageous in which environment,” says Tim Stevens, VP of Business and Corporate Development with Cloudera. To help customers take that first step, Cloudera has partnered with Appfluent – a company that is finding new purpose building custom roadmaps into Hadoop.

Founded in 1998, Appfluent has long offered analytic tools for the data warehouse aimed at diagnosing performance issues, increasing efficiencies, and generally getting the most out of existing resources. But with the rise of Hadoop, the company believes it has found a new niche in helping enterprises make data-driven decisions as they transition their precious bits into Hadoop.

Its tool is essentially a high-powered x-ray into the data warehouse. Using Appfluent’s diagnostic tool, enterprises are able to spelunk to the deepest depths of their database and come back with an information haul on virtually every aspect of the system, its data, and usage. Appfluent’s cataloguer dives in and logs every table, view, user, and SQL statement – tracking them both historically and going forward.

With this information in hand, these companies have what amounts to a custom roadmap into the world of Hadoop, based on their unique situation, says Stevens. “Enterprises for a long time have been faced with the decision of determining what data should go into Hadoop,” he commented. “With Appfluent, we can work with customers to help them understand the totality of their data warehouse and data mart environments so that they know what data and processes they are able to migrate into [a Hadoop environment].”

When Expedia decided to cap its data warehouse at 200 TB, explained Dolley, it used Appfluent to help determine which data needed to stay put, and which data to move into the Hadoop overflow. According to a release from last fall, Expedia now manages over four petabytes of data using Cloudera Enterprise.

In many cases, says Dolley, the company will find thousands of columns that are never or very rarely queried, taking up blocks of the database and ultimately hindering performance. In the past, these dormant columns might simply be jettisoned to free up space, but under the big data paradigm, that is treachery – there is no telling what kind of corollary gold might be found in those bits. “Those columns are the ones that need to be in a warm archive like Hadoop, rather than in an expensive data warehouse.”
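To make the idea of “dormant columns” concrete, here is a minimal sketch of how rarely queried columns might be flagged from a usage log. It is an illustration only, not Appfluent’s method: the function name `find_dormant_columns`, the `max_hits` threshold, and the sample table and column names are all hypothetical.

```python
from collections import Counter

def find_dormant_columns(catalog, query_log, max_hits=0):
    """Return catalog columns referenced by at most max_hits queries.

    catalog   -- set of (table, column) pairs known to the warehouse
    query_log -- one list of (table, column) pairs per logged query
    Both inputs are hypothetical stand-ins for what a cataloguing tool
    might collect.
    """
    hits = Counter()
    for referenced in query_log:
        # Count each column once per query, even if it appears repeatedly.
        hits.update(set(referenced))
    return sorted(col for col in catalog if hits[col] <= max_hits)

# Made-up sample data for illustration.
catalog = {
    ("orders", "id"),
    ("orders", "price"),
    ("orders", "legacy_promo_code"),
    ("orders", "fax_number"),
}
query_log = [
    [("orders", "id"), ("orders", "price")],
    [("orders", "id")],
    [("orders", "legacy_promo_code")],
]

# Columns touched by one query or fewer are candidates for a warm archive.
dormant = find_dormant_columns(catalog, query_log, max_hits=1)
print(dormant)
```

In practice the threshold and the observation window would be policy decisions; the sketch only shows that the selection itself reduces to counting references per column.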

Hadoop as a database paradigm of the future is virtually inevitable at this point; the question is the speed of adoption, particularly for entrenched businesses. It’s interesting to see companies like Appfluent, which have roots in the old world, transforming into ushers of a new one.
