In a keynote session to kick off Hadoop Summit this week, Gartner research VP Merv Adrian gave a level-setting presentation on Gartner’s view of the Big Data market space, sharing preliminary results from a recent market survey and discussing the challenges as well as impending innovations. With numbers piping hot out of the survey oven, Adrian said that the sector has a lot to look forward to, and more work still to do.
Adrian kicked off with a numbers parade that represented some very good news for vendors looking to cash in on the big data movement. The size of the investment community for big data solutions is robust, according to Gartner, with nearly 65% of the surveyed companies saying that they either have already invested in big data or plan to within the next two years. Adrian said that 30% of the companies surveyed had already made investments; 19% plan to within the next year; and 15% plan to invest within the next two years.
This good news was not given without a caveat. “There’s something about predictions,” he quipped. “You have to remember that a CTO’s budget is a little bit like a kid’s letter to Santa Claus. Sometimes they get what they want, and sometimes they don’t.”
Looking at the negative side of the numbers, a full 31% of those surveyed have no plans for big data, and according to Adrian, given the way the questions were structured, “have no plans to have a plan.” Interestingly, Adrian noted that this number is exactly the same as it was in last year’s survey.
“There is an intractable third of the marketplace out there that so far says ‘big data - dismissive hand wave - hype. Let those guys go to Silicon Valley, we're not ready for any of that stuff yet.’” Adrian commented that the rising big data phenomenon still has a long way to go to demonstrate value to this sizeable dark area of the market. Adrian did note a bright spot in the negative number trends: the share of respondents who didn't know what their plans were has fallen by over half from the previous year, from 11% to 5%.
In addressing challenges in the industry, Adrian noted several factors, including the complexity of Hadoop itself. In a striking visualization, he represented the Hadoop stack with all the components for such functions as ingesting and propagating the data, describing and developing the data, monitoring, machine learning, etc. With each function on its own plane, he streamed the potential solutions for each aspect (both open source and commercial pieces), demonstrating the complexity of Hadoop, and thus the need and opportunity for the distribution vendors in the space.
Adrian also described a culture clash that exists between the old guard, which he calls “the suits,” and the rising wave he represented as “the hoodies.” The suits, he said, use data warehouses that represent curated, pre-optimized collections of data that answer the set of questions they know they're going to be asked, use commercial software, and are very grounded in the old world of data management. The hoodies are opportunistic and experimental; they play with the tools they have, use them for discovery, and are into open source software. The research analyst intimated that these cultures would need to be bridged in order for organizations to be successful in the future.
Looking at the horizon, the research analyst discussed what he saw as the next steps for Hadoop as it emerges into its 2.0 form. These included the following:
- Search – “Let me apply the Silicon Valley standard: It’s been announced, so it’s here,” said Adrian, poking fun at the recent beta announcements. “Realistically…It is here, it is coming, it’s in products, we’re going to see it in real life, people are going to start using it.”
- Advanced Prebuilt Analytic Functions – “We’re going to see these advanced analytic functions coming into the market, or be integrated into the stuff we’ve got.”
- Cluster, Appliance, or Cloud – “We’re going to see a very big marketplace question answered: do we keep building clusters, node by node – or do we start buying appliances? There’s a lot of people making appliances right now – it’s not clear at all that that’s what the market wants. And of course they can go to the cloud as well, and often both.”
He also touched on virtualization and graph processing, and, of course, the belle of the Hadoop Summit ball, Apache YARN, framing the advancement as a great leap forward.
Speaking of gaps that still need to be filled, Adrian pointed to many of the familiar culprits, including security, governance, data warehousing tools, distributed and subproject optimization, and the ever-persistent challenge of skills.
“Gartner estimates that we will need 4.4 million people worldwide by 2015 to do the big data work,” said Adrian, explaining that this number runs across all geographies and verticals. According to Gartner, only one-third of those jobs will be filled, which he framed as good news for those who do have the proper skills. “When resources are scarce like that, bidding goes on.”
To close his talk, Adrian encouraged the developers at the event to be proactive in their organizations by finding opportunities to uncover “dark data” that can be mapped to business opportunities, and by building alliances across the organization to help facilitate the move into the new data era. He said that developers should consider cloud pilot projects that minimize capital expenditure. And of course, our favorite: he encouraged the audience to follow the news to keep up with the fast-changing environment.