HPC Analysts: Hadoop is a Tall Tree in a Broad Forest
This week, Datanami is in the very interesting city of Leipzig, Germany as we attend the International Supercomputing Conference (ISC’13) to cover the big data trends at the high performance level. It’s only been a day, but one of the messages from the show floor that has been resonating is that big data is bigger than many people think it is – namely, it’s bigger than just Hadoop.
To say that there is hostility on the floor over Hadoop would be considerably overstating it – there is a healthy respect for Hadoop and its potential. However, the mighty elephant has sucked a lot of the oxygen from the big data discussion, to the point where it has almost become synonymous with the term “big data,” and many HPC’ers feel like their turf is being encroached upon. I spoke with Addison Snell and Chris Willard, analysts (and co-founders) with Intersect360 Research, about Hadoop and more on the ISC show floor. They let me know that while Hadoop is one of the biggest trees in the big data forest, it’s not the only tree.
Snell says they’ve recently done a study focused on the opportunities for high performance computing technologies to be deployed in big data application areas. For the study, done in partnership with Gabriel Consulting, he tells me that they surveyed both traditional HPC end users and non-HPC-oriented, scalable enterprise users.
He hands me a card filled with trivia questions that have come from their research. The top question grabs my attention: “Q: What percentage of big data applications users in HPC reported using Hadoop: 16%, 21%, 45%?” I guess 16% and win a grin from Addison, who then rhetorically serves up the next obvious question: who are the other 84% and what are they using to manage their data?
“Really, 50% of the respondents cited that they were running their own in-house applications and algorithms,” he explains. The other 34%, he says, are spread across a long tail of solutions. “There are a lot of things that got 5 votes, 3 votes, 1 vote each.”
This explains to me the disconnect on the show floor: vendors are eager to let you know what they’re doing to support Hadoop, while some of the technical people just shrug.
“HPC’ers sometimes bristle at Hadoop, and with good reason,” explained co-founder of Intersect360, Chris Willard. “The idea of managing very large file systems and very large data sets has been around in HPC from the beginning. One of the definitions of ‘supercomputer’ is a system that turns a compute bound problem into an IO bound problem, so somebody coming in and saying, ‘we have 500 terabytes, we have a petabyte, we have big data,’ will annoy people dealing with that size of data for a long time.”
Willard hints at skepticism he has about Hadoop being able to deliver the pot of gold at the end of the data rainbow. “Hadoop more or less started out as an example where in a few specific cases it was very successful – namely in the Internet search area – and people looked at that, gave it the name ‘big data,’ and said you can generalize that case to have success in other areas,” he explains. “And that’s an interesting proposition. I’m not saying it’s a false proposition – I suspect there are going to be many areas where it does follow through and produces successful and profitable applications. But I think at the other end, every place that you use Hadoop is not going to be a magic formula and produce results.”
Both analysts agree that it ultimately comes down to the application, and Snell says there are a lot of open questions on that front as vendors try to navigate their path. “There’s not a single killer app,” he argues, commenting that a lot of clients are looking for the big data magic bullet. “Some markets aren’t like that…you can come up with certain categories of applications or workflows, but if you try to narrow your positioning down to that single killer app, you wind up giving away a substantial portion of the market.”
Hadoop aside, Snell says that their research has shown some interesting trends developing as high performance computing crosses over into the enterprise to help meet big data challenges. He says that in their last study, both the HPC-oriented and the non-HPC respondents flagged IO scalability in general as a shared pain point.
“To find an enterprise oriented population that was using metrics of performance as a number one criteria, beyond the RAS (reliability, availability, serviceability) oriented performance that you normally see in enterprise was a real eye-opener.” He says that this underscores an opportunity for the scalable storage, interconnect, compute and cloud technologies that have a home in traditional HPC to potentially expand into the broader enterprise market that has been skittish about HPC in years past.
Snell says that he expects this trend to manifest itself through the increasing adoption of parallel file systems in both HPC and the broader enterprise big data space. Intel announced a move in this vein last week, saying that it is in the process of mainstreaming its Lustre file system, including the ability to fully swap in Lustre in place of the beleaguered HDFS. Snell mentioned IBM’s GPFS storage server as a complementary product in this space.
“I think you’ll see a growing market for parallel file systems in general,” he predicts. “General microeconomics follow that in a growth market, there is market share for both – you don’t have a land grab where you’re fighting for every last client like you do in a shrinking market. I think the uptick of parallel file systems in general ought to be a rising tide that floats all boats.”
Ultimately, says Snell, it’s about democratization of these technologies. “You hear the discussion of making HPC more adoptable, and that has crossover to big data as well,” he explains. “Because as much as I just said that you’re evaluating the performance characteristics of these solutions for a big data type of deployment, the fact of the matter is it’s still enterprise and you do need to meet those RAS and IT thresholds. It can’t be unreliable if you’re going to install it in the enterprise.”
Stay tuned… We are at ISC’13 all week and will have reports coming straight from the show floor as we explore the intersection between HPC and big data.