Hosted Hadoop Gets Traction
The rise of big data and the promise of data analytics are putting unexpected kinks in the day-to-day pipeline of IT operations. While the phenomenon is driving many IT professionals painfully far outside their comfort zones, it's a blessing for the skilled operators of Hadoop clouds, whose businesses are starting to grow rather quickly.
In some circles, the very thought of outsourcing the source of a competitive advantage, such as a Hadoop-based analytics application that’s turning unstructured data into well-defined insight, would be viewed as an act of sheer lunacy. “The data won’t be safe!” the on-prem backers cry. “The secrets will be let out! We’ll lose our edge!”
But providers of hosted Hadoop report that these common cloud objections are melting away as the power of simple economics takes over. “Cloud is all about economies of scale,” said Ashish Thusoo, the CEO and co-founder of Hadoop cloud provider Qubole. “The larger the cloud becomes, the more economies can be passed on to the end users.”
Nobody has demonstrated the power of cloud economics as plainly as Amazon Web Services, which is clearly the big dog in the hosted big data solutions market. The company, which isn't usually that forthcoming with details about the size of its business, let it slip in 2012 that it had more than 1 million Hadoop clusters running on Elastic MapReduce (EMR). Today, EMR is widely considered to be the biggest Hadoop provider on the planet; it likely has more Hadoop customers than all the third-party distros combined.
While Amazon EMR provides a robust Hadoop platform, it still requires users to be experts in managing Hadoop. Prospective Hadoop users who wish for a little more handholding from their vendors have several Hadoop as a service (HaaS) companies to pick from, such as Qubole.
Founded just two-and-a-half years ago, Qubole now has more than 30 customers running on Hadoop nodes that it manages, including prominent Web-based outfits like Pinterest and Quora, and a handful of online advertising specialists. Qubole, which maintains dual headquarters in Silicon Valley and Bangalore, runs its clusters on AWS and Google Cloud, and exposes a high-level Web interface for users to query their Hadoop clusters via Hive, Pig, Presto, and MapReduce.
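The engines Qubole exposes all trace back to the same underlying programming model. As an illustration only (this is a toy sketch of the MapReduce pattern in plain Python, not Qubole's API or Hadoop code), a classic word count looks like this:

```python
from itertools import groupby
from operator import itemgetter

# Map phase: emit a (word, 1) pair for every word in every input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: sort intermediate pairs and group the values by key,
# as the Hadoop framework does between mappers and reducers.
def shuffle(pairs):
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, [value for _, value in group]

# Reduce phase: sum the counts collected for each word.
def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped}

lines = ["hadoop in the cloud", "the cloud scales"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # → 2
```

Hive, Pig, and Presto spare users from writing this plumbing by hand, compiling higher-level queries down to jobs that follow essentially this map-shuffle-reduce shape.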
“We process around 25PB of data every month,” Thusoo told Datanami at last week’s Hadoop Summit 2014, in San Jose, California, which attracted more than 3,200 attendees and nearly 100 vendors. “We cycle through around 230,000 nodes in the cloud in a month. The largest clusters that we brought up are on the order of 1,500 to 2,000 node clusters.”
Thusoo, a former Facebook data infrastructure manager who created Apache Hive with his Qubole co-founder, Joydeep Sen Sarma, says Qubole attracted mostly small and midsized businesses when it started running in production in January 2013. But now the company is attracting bigger enterprises with its HaaS offering, which starts at $1,250 per month for 5,000 compute hours and goes up from there.
“If you look at the traditional model, you spend months just to get a cluster up and running, before you can actually start getting an ROI on the data. The cloud model completely disrupts that,” Thusoo said. “You can get the compute on demand and you can start using it right away.”
There is a well-publicized shortage of top-level data scientists who can make the most of data. But fewer people are aware there’s also a shortage of IT professionals who know how to build and manage a Hadoop cluster. “Managing a Hadoop cluster is not straightforward,” Thusoo said. “It’s not a platform that DBAs have been trained to deal with.”
Another HaaS vendor gaining traction is Altiscale, which was founded by Raymie Stata, the former CTO at Yahoo, a company that at one point ran a 40,000-node mega cluster. Stata saw a big gap between what was possible with Yahoo's clusters and what customers were trying to put together on their own, so he and his partners launched a "white glove" Hadoop service in January.
Last week at Hadoop Summit, Altiscale announced that it's now supporting Hive 0.13. The San Francisco, California-based company claims it's the only HaaS provider to support the latest release of Hive, which features a big SQL performance increase over previous versions as a result of the Hortonworks-backed Stinger project. "Hive 0.13 is a key step toward real-time, interactive, and in-memory processing," said Soam Acharya, head of applications architecture at Altiscale. "Customers should not be held back by their HaaS vendor in utilizing this capability."
In the meantime, Altiscale continues to grow its business, which currently supports more than a dozen customers utilizing about 90TB of data. Customers are coming to Altiscale to get away from the hassles of managing Hadoop, Stata said. "Upgrading Hadoop is non-trivial," he told Datanami. "They're tired of upgrades."
Another HaaS vendor making waves at Hadoop Summit is MetaScale, which is owned by Sears Holdings. The company is finding traction not only with HaaS, but with a series of Hadoop and NoSQL appliances that it unveiled in February. Customers can run the appliances on-premise, and leave the monitoring and management to MetaScale, which remotes into the boxes from its headquarters near Chicago, Illinois.
At last week’s show, MetaScale launched its new “Ready-to-Go Reports” program, which is designed to help clients analyze large amounts of social data. Ankur Gupta, MetaScale’s general manager, says the new offering is designed to help customers get their feet wet with Hadoop and big data analytics without breaking the bank. “Our Ready-to-Go Reports are a cost-effective solution for companies that may still be seeking to determine the real value of Hadoop and big data analytics at their firm,” Gupta said.
Hadoop leaders say 2014 is the year Hadoop clusters will go from being mere science projects to being production-ready analytic systems. As the Hadoop market grows, the HaaS sector will move with it, benefiting not only the three vendors mentioned here, but untold new HaaS vendors that will enter the market in the coming years.
People are getting more comfortable with the cloud in general, and that's going to make the economic argument for HaaS more compelling. Until recently, the break-even point for running Hadoop in the cloud was about 2,500 nodes; anything beyond that was cheaper to run on-premise, Qubole's Thusoo said. "Traditionally there was a tradeoff," he said. "Now I think that point has moved maybe to 20,000 nodes."
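Thusoo's break-even point can be illustrated with a toy cost model. Cloud charges scale linearly per node, while an on-premise cluster carries a large fixed overhead (staff, data center, upgrades) plus a lower per-node rate; the crossover falls where the two lines meet. All rates below are invented for illustration and are not Qubole's or AWS's actual pricing:

```python
# Toy cost model: every rate here is hypothetical, chosen only so the
# crossover lands near the article's "about 2,500 nodes" figure.
CLOUD_COST_PER_NODE_HOUR = 0.30    # pay-as-you-go cloud rate
ONPREM_COST_PER_NODE_HOUR = 0.10   # amortized hardware and power
ONPREM_FIXED_COST_PER_HOUR = 500.0 # staff, facilities, upgrade labor

def hourly_cost_cloud(nodes):
    return nodes * CLOUD_COST_PER_NODE_HOUR

def hourly_cost_onprem(nodes):
    return ONPREM_FIXED_COST_PER_HOUR + nodes * ONPREM_COST_PER_NODE_HOUR

def break_even_nodes():
    # Cloud wins while the on-prem fixed overhead outweighs the
    # cloud's per-node premium; set the two cost lines equal and solve.
    premium = CLOUD_COST_PER_NODE_HOUR - ONPREM_COST_PER_NODE_HOUR
    return ONPREM_FIXED_COST_PER_HOUR / premium

print(break_even_nodes())  # → 2500.0 under these made-up rates
```

Thusoo's claim that the break-even point has moved toward 20,000 nodes amounts to saying that, in this kind of model, the cloud's per-node premium has shrunk (or the fixed cost of self-managing Hadoop has grown) by roughly an order of magnitude.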