April 1, 2013

MapR Turns to Ubuntu in Bid to Increase Footprint

Isaac Lopez

Last week, MapR and Canonical announced a partnership to make MapR M3, the free community edition of MapR's Hadoop distribution, available on the 12.04 LTS and 12.10 releases of the Ubuntu Linux OS. At the same time, MapR announced that it has released the entire M3 package on GitHub, so developers can modify their own versions of the Apache Hadoop platform as they wish.

Canonical, which touts Ubuntu's ease of use for big data implementations, gains access to an enterprise-grade Hadoop solution, giving its OS more bite to go along with its big data bark. Through Juju, Canonical's service orchestration tool, pre-configured deployment modules called "charms" help developers deploy MapR M3 into private and public clouds that have standardized on OpenStack.

As part of MapR's drive to increase its footprint, the company is turning to GitHub to release the source code for its M3 free community edition Hadoop software and all of its components, including Apache Hadoop, Hive, Pig, Oozie, Flume, HBase, Sqoop, Mahout, Whirr, Cascading, and HCatalog. MapR hopes it is creating an easy Hadoop on-ramp for developers who use Ubuntu and its bundled distribution of OpenStack to administer their clouds.

But isn't MapR known as a proprietary vendor that adds value to its Hadoop distribution through a secret sauce of closed-source enhancements? That may be the perception, but it's not the reality, claims MapR marketing VP Jack Norris, who says that now is the right time to open the company's free platform up to innovators.

“If you look at where Hadoop is on the product lifecycle, it’s still fairly early,” said Norris. “With Hadoop being at the earlier part of the lifecycle, there’s a real focus on innovation – and how do you add innovations to propel this moving forward?”

MapR's apparent answer to that rhetorical question is to open source M3 and make it freely available to any developer willing to download and install Ubuntu.

“We haven’t been so good in the past at making the source code for all these packages available,” explains Tomer Shiran, director of product management with MapR. “However, what we’re doing now – and especially the way that we’ve done it by putting all of the source code on GitHub so customers can access it – we want to make it really super easy for a developer that wanted to modify one or all of those things. Now a developer, with one command, can clone a version of Hive that MapR is using – they can make changes and then build it and have their own version with all the MapR enhancements.”

That's something unique to MapR, claims Shiran, who says that while Cloudera's CDH is also touted as a 100% open source Hadoop distribution built for the enterprise, it is not well maintained as an open source project. "There are many projects listed on the [GitHub] site, but they are not being maintained, and they do not have the exact code that Cloudera ends up shipping," notes Shiran, who points to Cloudera's GitHub distributions of Hive and Mahout as examples. "You'll notice that the version of Hive is 0.6 – this is more than a year old," he adds, noting that Cloudera's open source version of Mahout was last updated 11 months ago.

How significant the move is for MapR's bottom line remains to be seen. Some estimates contend that Ubuntu is currently the most widely used OS on Amazon's EC2 cloud, as measured by the number of machine images. (Note: AWS does not officially publish data on OS installations.) W3Techs says that Ubuntu is used by 7.6% of all websites whose operating system it can identify.
