San Francisco-based stealth startup BlueData is gearing up to release a virtual machine layer that ostensibly would take much of the pain out of Hadoop deployments. Founded by former VMware executives and backed by a roster of IT industry veterans, the company recently spoke with Datanami about what’s holding back Hadoop, and how its forthcoming product (still in beta) will fix it.
“Think of our software as a VMware for big data,” says BlueData co-founder and CEO Kumar Sreekanti, who was previously the vice president of R&D for VMware’s Cloud Infrastructure Business Unit. Sreekanti and Tom Phelan, a former senior architect at VMware, founded BlueData in 2012 with the hope of doing for Hadoop what VMware did for production servers. So far it’s attracted $19 million in Series A and B venture funding, and it appears on its way to releasing a product later this year.
“There’s a lot of good work going on the big data space, especially at the [Hadoop] distribution and analytics levels, but there’s not a lot of focus on the infrastructure for big data,” Sreekanti continues. “We are working on a software platform that helps enterprise to build a very agile and flexible and elastic software infrastructure for them to be able to run their big data applications in a very cost effective, agile, and elastic fashion.”
As you can tell, BlueData exhibits a keen focus on elasticity and agility, two words that are not normally associated with big data and Hadoop. Hadoop gets the lion’s share of attention these days when the discussion turns to big data, and with good reason: it can enable customers to store and analyze less-structured data in ways that were not previously possible. But getting the big yellow elephant up and running isn’t for the faint of heart.
“Today Hadoop is fairly difficult to install,” BlueData chief architect Phelan tells Datanami. “You take the open source from Apache, you compile it yourself, and you try to install it on a set of hardware servers. That takes a fair amount of expertise. We believe that level of expertise limits Hadoop’s ability to be used by many customers.”
BlueData’s product is currently in beta testing with a handful of customers. While the company is hush-hush about specifics, we know that the general idea is to incorporate virtualization technology in such a way that it strips out some of the complexity surrounding the setup, configuration, monitoring, and ongoing management of Hadoop clusters.
“Our software would run within a data center of a customer and they would stand up essentially a Hadoop-tuned private cloud,” Phelan says. “Our goal is that the customer would be able to spend their resources finding the best analytics and creating the best data mining tools rather than having the frustration of working with their infrastructure. So we greatly simplify the infrastructure task for them.”
The BlueData software will support both containers (think Docker) and virtual machines (think VMware ESX). It will work with any Hadoop distribution, both open source and commercial. The software will support “very strong multi-tenancy,” Sreekanti says, and could work with managed service providers (MSPs) running public Hadoop-as-a-Service (HaaS) clouds, but the initial focus will be on private enterprises running Hadoop on-premises.
As we understand it, BlueData will compete to an extent with HaaS providers who are also trying to hide Hadoop’s complexity from those who would like to use it. HaaS providers like Qubole, Metascale, and Altiscale are seeing demand increase for their managed Hadoop services. Just get your data into our Hadoop cluster, they say, and you can concentrate on building your data science team or running your big data analytics application. Leave the nitty gritty management details to us.
But according to Sreekanti, cloud isn’t part of the equation for the biggest enterprises. “There’s going to be situations where if you’re a Web hosting company or a Web-based company and you’re collecting data on [Amazon] S3, it makes sense for you to analyze your data on S3. But collecting 1TB of data in different places and moving it to a hosting provider to analyze seems to be impractical,” he says.
There is also the knotty question of data security on the cloud. Sreekanti says he recently spoke with representatives from two large European companies who said they would not put their data on the cloud in part because of the NSA. “It’s my belief, and our investors believe, that it’s not a panacea, that public cloud is not going to solve everything.”
In any event, the company seems well positioned to capitalize on the next wave of data center build-out driven by the big-data boom. Today the company announced the creation of an executive advisory board of IT veterans. Its members are:
- Frank Slootman, president and CEO of ServiceNow and former CEO of Data Domain;
- Steve Kleiman, senior vice president and CTO of NetApp;
- Mike Nelson, former Fellow at VMware, where he architected vSphere, vMotion and VMFS;
- Dhruba Borthakur, software engineer in the database engineering team at Facebook and an early Hadoop architect at Yahoo;
- Mike Kail, vice president of IT operations at Netflix;
- Sudhir Ispahani, CEO of Alpha Global Partners LLC;
- and Lorrie Norrington, operating partner at Lead Edge Capital and former president of eBay Marketplaces.
The company also announced it has appointed Eric Wolford, a venture partner at Accel Partners and former president of the products group at Riverbed, to its board of directors. The company is well-funded and well-staffed. Now it just needs a product to sell. Stay tuned to Datanami to find out when.