August 8, 2012

MapR Floating Google Cloud

Datanami Staff

MapR scored a major coup in the public cloud arena when they secured an exclusive deal with Google to provide software for Hadoop services relating to their recently announced Google Compute Engine.

The distro vendor is just one of many “big data” companies that have been lining up on Google’s cloud, but it is so far the key partner on the Hadoop distro side. Companies like RightScale on the infrastructure side and several analytics firms, including Pervasive, Jaspersoft, Talend, and Informatica, have joined forces with Google on the BigQuery front.

“Google is the preeminent expert on big data and MapReduce frameworks and for them to select MapR is a huge validation of our architecture and our approach,” said Jack Norris, MapR’s VP of Marketing, to Datanami when we caught up with him last week. Indeed, MapR has been making quite a splash in the market lately, also recently inking a deal to work with Amazon Web Services. That is notable in itself, since Google intends Compute Engine to be its answer to AWS.

Norris says that when it comes to their competitive position among the other two major Hadoop distro vendors, Hortonworks and Cloudera, their performance and their ease of use separate them from the pack. “There’s a difference between saying that a distribution is enterprise ready when you can’t offer data protection, no snapshots, no mirrors, when you lack the ability to integrate with standard file based protocols. MapR has all those capabilities.”

Norris divides performance into two components: reliability and velocity. While the two might be assumed to go hand-in-hand, he says it is possible to build a system that runs one test problem quickly and then crashes on another. Such failures have been systemic in big data platforms, and avoiding them has become a major focus in the market. An earlier piece here noted how India delegated their software needs to over a dozen vendors before implementing their national identification program in hopes of countering crashes.

Norris hopes MapR’s new system of partitioning the name nodes will do exactly that. “When you talk about HA (high availability), you have to have automated spacial recovery and the ability to handle multiple errors. So you separate the one main node into multiple main nodes. You’ve eliminated the single point of failure and are replacing it with multiple points of failure.” Centralization is the scourge of big data. It creates a bottleneck, and any centralized point will inevitably fail. Splitting up the main node should, in theory, remove that dependency.

Copying and transporting big data is both time-consuming and expensive. So it is not surprising that a top vendor like MapR has taken to a recent trend of performing operations in the cloud. “In general, Hadoop is about processing and doing analytics on data at rest and eliminating the movement of data. If the source of the data is copied already, you have advantages of doing analytics in the cloud.”

But of course, Google is not just interested in reliability. They have built their reputation upon speed. MapR’s reported velocity figures are impressive. According to Norris, MapR was able to complete a TeraSort run on a 1200-node Google cluster in eighty seconds. The world record is sixty-two seconds. But the world record effort was made using more servers, twice as many cores, and four times as many disks. Given equal resources, MapR might well be expected to break the record.

Amazon Web Services has a significant head start on Google in offering public cloud infrastructure-as-a-service. But search engines aside, Google has rarely cared about being first to market. Instead, it is content to sit back and focus on developing superior technologies. Where MapR fits into all of this is a complicated issue, since the company appears to be offering its Hadoop distribution to both AWS and Google Compute Engine.

This may be no different from an apparel company supplying two rival football teams. MapR is seemingly working more closely with Google, as the eighty-second TeraSort was accomplished with Google and Norris did not mention Amazon during the sixteen-minute conversation. It is also worth noting that Norris was unable to come up with potential disadvantages to MapR’s technologies, specifically the node splitting. Either way, Norris may have been talking in marketing-speak, but at least it was Google-approved marketing-speak.

Google has a tendency to push performance in every arena it enters, from email and web browsers to social networking and smartphones. It now takes the next step into the less publicly visible but arguably more technologically important field of big data. That is natural; Google, after all, collects a vast amount of data through its various enterprises.

MapR is going to be the Hadoop contributor to this Google Compute Engine, a fact that should and does, as Norris puts it, vindicate MapR’s market position. Whether this partnership results in Google taking the cloud like it assuredly expects to remains to be seen.
