April 29, 2014

Big Data Confusion Drives MongoDB and Cloudera Together

Alex Woodie

At first glance, the partnership that Cloudera and MongoDB unveiled today is a bit of a head scratcher. While the two companies are arguably the biggest software vendors in the nascent space, they swim in opposite ends of the big data pool. It turns out, that’s exactly why the companies felt they needed to work together.

People who have worked in this space typically know what Hadoop and NoSQL do and what they’re for. Hadoop is for analytics. It’s good at crunching massive amounts of mostly unstructured data, and turning it into insight. NoSQL is for operational data stores. It’s good for building applications that need to store and process massive amounts of mostly unstructured data.

But those terms and phrases don’t mean much to people who are just figuring out where everything fits in the big data scheme of things. This knowledge gap was on display following a presentation that MongoDB’s vice president of marketing and business development, Matt Asay, delivered at the Strata + Hadoop World show last fall.

Asay thought the presentation, about using Mongo’s NoSQL database with Hadoop, was fairly straightforward. “When I got off the stage, I was mobbed by people saying ‘Wait, I thought MongoDB and Hadoop were competitive,’” Asay tells Datanami. “You know Mongo is an operational data store, and Hadoop’s super strong in analytics. But you’d be surprised at how many people get confused and think they basically do the same things.”

That confusion is the genesis of the new partnership that MongoDB and Cloudera unveiled today. The partnership spans several parts of the two companies, including marketing, sales, and product development, and is aimed at enhancing both companies’ respective offerings and competitive positioning.

There have already been many cases where salespeople for the two companies have been called upon to educate prospects about the various pieces that make up the emerging big data stack. “All the enterprises know is they’ve got a big data problem,” Asay says. “They say, ‘People tell me to use Hadoop or MongoDB, or X, Y, or Z technology. I don’t want [a] technology problem. I just want to fix this data problem.’”

Cloudera and MongoDB are working on “mapping” their respective sales org charts to make it easier for a salesperson to reach a counterpart at the other company when good leads arise. That lead sharing will make the two companies, already among the oldest and most highly valued in their respective niches, even stronger under the new alignment. There will also be joint marketing initiatives, bus tours, and executive speeches aimed at highlighting how the two companies’ products can work together.

The companies are also working to bolster the technology that connects their respective products. MongoDB has a bi-directional connector that feeds BSON data from its NoSQL database into the Cloudera Distribution for Hadoop (CDH) offering. However, that connector is based on MapReduce technology, which is not the ideal approach. Instead, a new YARN connector that’s in the works will make it much easier to share data between MongoDB and CDH.

“We’re in active development to build a YARN-based application that will reside within the Cloudera cluster,” says Yuri Bukhan, director of ISV alliances at Cloudera. “There will be controllers and agents that are part of the cluster and they’ll interact with the various MongoDB shards. So we’ll be streamlining the process and giving visibility for how the data is going to move across the different systems, and then giving users the control to bind to the resources that are allocated to the movement of that data.”

The tightened integration will bolster the respective use cases for NoSQL databases and Hadoop. So, exactly how, again, should we view this big data stack? The way that MongoDB’s Asay sees it, Hadoop systems like Cloudera’s are great at analyzing crowd data and drawing insights from it, while NoSQL systems like MongoDB’s are great at delivering the services to individuals that arise from those insights.

“You can think of all sorts of use case where the interaction between crowd behavior and the individual and feeding the individual behavior to make that ‘crowd database’ as it were, richer, and then taking the intelligence from that crowd analysis and pushing it back to MongoDB, becomes super, super powerful, whether you’re an online retailer, an advertising network, or a bank,” he says.
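The loop Asay describes can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not either company's connector or API: the purchase records, the function names `crowd_popularity` and `recommendations`, and the retail scenario are all hypothetical, and in-memory lists and dicts stand in for both the operational store and the analytics cluster.

```python
from collections import Counter, defaultdict

# Hypothetical operational records, standing in for documents
# that an application would write to an operational data store.
purchases = [
    {"user": "ann", "item": "tent"},
    {"user": "ann", "item": "stove"},
    {"user": "bob", "item": "tent"},
    {"user": "bob", "item": "lantern"},
    {"user": "cat", "item": "tent"},
    {"user": "cat", "item": "stove"},
]


def crowd_popularity(records):
    """Analytics side: count purchases of each item across all users."""
    return Counter(r["item"] for r in records)


def recommendations(records, popularity, limit=1):
    """Operational side: for each user, the most popular items not yet owned."""
    owned = defaultdict(set)
    for r in records:
        owned[r["user"]].add(r["item"])
    result = {}
    for user, items in owned.items():
        # Walk items from most to least popular, skipping what the user has.
        candidates = [item for item, _ in popularity.most_common() if item not in items]
        result[user] = candidates[:limit]
    return result


# Crowd analysis first, then push the per-user results back out.
popularity = crowd_popularity(purchases)
per_user = recommendations(purchases, popularity)
```

In this toy version the "crowd" step and the "individual" step run in one process. In the arrangement the article describes, the first would run as a batch job on the Hadoop side and the second would be served from the operational database, with a connector moving data between the two.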

The product naming could also, perhaps, use some sprucing up, says Asay, who admits that MongoDB maybe should ease up on the mentions of “data hub.” “Hitherto,” he says, “companies have been forced to figure out the relevant technology on their own and have had to wade through all these weird product names, whether it’s Hive or Impala or MongoDB. We’re trying to demystify it and make it easy to consume the best big data technologies out there, or certainly the two most popular ones.”

Will MongoDB and Cloudera end up better together?

Considering the aversion that Hadoop backers have to moving data and the whole “analyze in place” message, it’s possible down the line that the companies would work to bring MongoDB into the CDH lineup as another engine processing data in Hadoop, and provide an operational data store component for Cloudera’s enterprise data hub strategy. Hadoop creator Doug Cutting, the head architect at Cloudera, has said Hadoop could become a transactional engine.

“I think that’s possible, probably at a later phase,” Bukhan says. “We want to be sure that if we do co-locate both products within the same cluster and server, that the experience at the end of the day for the end user is not hammered.”

Asay, who is a close personal friend of Cloudera co-founder Mike Olson, drove the idea of the partnership at MongoDB, while Olson and other executives were instrumental in making the partnership work at Cloudera. Both companies recognized the need to reach out and work with the other, Asay says.

“We’re the top two big data technologies, so it’s actually pretty important that they work well together,” he says. “There will always be use cases where MongoDB is great and should be used in isolation and there will be cases where Hadoop is great and it should be used in isolation. But actually the two technologies and companies are actually better working together.”
