Hadoop Sharks Smell Blood; Take Aim at Status Quo
There may be blood in the water as major Hadoop sharks Cloudera and MapR roll out new campaigns and messaging aimed directly at the turf of traditional database management systems. Does their hype match up with reality? Not everyone thinks so.
This week Cloudera hosted an event featuring the announcement of its new search capabilities, complete with bold statements casting Hadoop as the central figure in a datacenter gravity shift.
“We’ve spent decades building systems that have been very successful for managing data, but we are in some sense a prisoner of the success that those systems have delivered,” said Mike Olson, co-founder and CEO of Cloudera, at the company’s “Unaccept the Status Quo” event this past week. “Those systems are very, very good at solving 1980s and 1990s problems. They were designed for 1980s scale, and 1980s business questions.”
Olson laid out the problems he sees with traditional database systems – rigid schemas, siloed data, costs that mount as data grows, limited scalability, and so on. Acknowledging that these systems have become mission-critical for many organizations, he cast the traditional datacenter as a ball and chain that can be rationalized through the use of Hadoop.
However, Gartner analyst Nick Heudecker took some air out of the Hadoop “gravity shift” balloon, saying that Cloudera might be overstating things a bit. “I think that some customers are starting to explore [database offloading] as an option, particularly for data that is cold – it lightens the load on the existing data warehouse, specifically for data that you may only hit once or twice a year for aggregate reports,” he told us this week.
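The cold-data offload Heudecker describes comes down to a simple policy: rows that have not been touched recently become candidates for a cheaper store such as HDFS, while hot rows stay in the warehouse. A minimal sketch of that partitioning logic in Python (the record shape, field names, and one-year threshold are illustrative assumptions, not any vendor's tooling):

```python
from datetime import date, timedelta

def partition_by_temperature(rows, today, cold_after_days=365):
    """Split rows into (hot, cold) by last access date.

    Rows not accessed within `cold_after_days` are candidates for
    offloading to cheaper storage such as HDFS; the rest stay in the
    data warehouse. All names here are illustrative.
    """
    cutoff = today - timedelta(days=cold_after_days)
    hot = [r for r in rows if r["last_accessed"] >= cutoff]
    cold = [r for r in rows if r["last_accessed"] < cutoff]
    return hot, cold

rows = [
    {"id": 1, "last_accessed": date(2013, 6, 1)},   # recent: stays hot
    {"id": 2, "last_accessed": date(2011, 3, 15)},  # stale: offload it
]
hot, cold = partition_by_temperature(rows, today=date(2013, 6, 20))
```

In practice the "cold" partition would be exported to Hadoop in bulk, which is why the approach suits data hit only once or twice a year for aggregate reports.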
Heudecker instead described Hadoop as a player in the emergence of what Gartner calls “the logical data warehouse.” In this paradigm, instead of just the traditional DBMS that exists today, a hybrid blend of services and technologies would emerge. The model, he says, is largely driven by the need for different SLAs across different types of data sources and processes. In it, Hadoop would serve as the distributed processing component of the architecture, holding comprehensive but unrefined information – “That’s where a lot of unstructured data would be stored.”
“It’s happening with some customers,” says Heudecker. “While they may not be calling it the logical data warehouse, they’re expanding beyond the repository and moving into federated data sources as well as distributed processing.”
Of course, this might be seen as part of the status quo that Mike Olson and Cloudera are asking people to “unaccept,” as they consider newer and cheaper ways to operate without the overhead of the traditional systems.
“You can’t get rid of [the traditional data center] – but you can rationalize it,” Olson said. “You can stand up Hadoop next to your enterprise data warehouse. You can augment your existing infrastructure with a new massively scale-out platform that’s able to store much more data more affordably; that’s flexible in the type of analysis that you do; that gives you different ways to get at that data – you can build out this infrastructure and change the rules.” Saying that these aging systems are facing an inevitable collapse, Olson cast Hadoop as the new center of gravity in the datacenter.
Cloudera’s competitors over at MapR are hitting the circuit singing similar, if somewhat more restrained, refrains. Datanami recently spoke to Jack Norris, Chief Marketing Officer for MapR, who, referencing a whitepaper prepared by Mike Ferguson of Intelligent Business Strategies, also cast Hadoop as a central player in the data warehouse.
While Norris initially spoke about Hadoop as a database offload, his aspirations for the framework’s future became clear. He posited that the low cost of disk makes data storage on Hadoop a relatively “free” resource, leaving vendors with a blue sky in figuring out how best to exploit it. “I think what it’s increasingly pointing to is that the architectures of the future, whether you call them cloud or big data architectures, it’s about being able to handle fast data, being able to automate, being able to reduce the data movement so that what Mike Ferguson calls the ‘enterprise data management hub’ (starring Hadoop at center stage) continues to scale and grow, and more processing and more analysis is done directly on that.”
Norris believes it’s a trend that will persist. “I think the larger issue here is that the pendulum is sort of swinging back, where organizations are really tired of having multiple data silos and multiple sources of critical information,” he hypothesized. “They want to reduce that number; they want to increase their control, and handle this high-arrival data – this machine-generated content – in a very fast and efficient manner.”
Not everyone is in agreement. In a recent conversation, SAS CTO Keith Collins expressed concerns about organizations jumping into Hadoop without a concrete plan for ROI. “I am very fearful that IT left to the ‘everybody has got to have a big Hadoop cluster, I just got to have me some, and I don’t know what I’m going to use it for, but I’m just going to put a bunch of data in there,’ is fraught with the same failures and disappointments that we had before,” said Collins. “If you’re just doing big data without a big analytics approach – a way to get some answers – it’s just a big ‘so what.’”
While the distro vendors talk gravity shift, Collins described Hadoop as a tool still finding adoption for ETL processing. “If you take, for example, the large database centers, they’re all starting to describe Hadoop as, ‘that’s where you put all the data before you clean it up and put it into the database.’”
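The “land it raw, then clean it, then load it” workflow Collins describes is classic ETL. Here is a toy sketch of the three stages in plain Python (the record format and function names are invented for illustration; a real deployment would run these stages as MapReduce or Hive jobs over HDFS rather than in-process lists):

```python
def extract(raw_lines):
    # Land everything as-is in the staging area (Hadoop, in this framing).
    return [line.rstrip("\n") for line in raw_lines]

def transform(staged):
    # Clean-up pass: drop blank or malformed records, normalize fields.
    cleaned = []
    for line in staged:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[1].isdigit():
            cleaned.append({"name": parts[0], "count": int(parts[1])})
    return cleaned

def load(records, warehouse):
    # Final load into the structured store (the traditional database).
    warehouse.extend(records)

warehouse = []
raw = ["alice, 3", "bogus line", "bob, 7", ""]
load(transform(extract(raw)), warehouse)
```

The point of contention in the article is simply where the raw, pre-transform data lives long term: in Collins’ framing it is transient staging, while the Hadoop vendors want it kept permanently.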
The Hadoop vendors clearly disagree. “Yeah, use Hadoop and do your ETL processing, and then load it into the data warehouse,” argued Norris. “But the real economic lever is if that Hadoop platform is a long-term store itself.”
While none of the Hadoop vendors will openly say they believe Hadoop will ever completely replace the traditional database, it’s clear that the Hadoop war is starting to cross boundaries into the traditional world, with all of an organization’s data as the ultimate prize.