November 27, 2013

Hadoop on a Raspberry Pi

Isaac Lopez

Looking for a fun side project this winter? Jamie Whitehorn has an idea for you. He put Hadoop on a cluster of Raspberry Pi mini-computers. Sound ridiculous? For a student trying to learn Hadoop, it could be ridiculously cool.

For those who don’t know what a Raspberry Pi is, think of it as a computer on a credit card meets Legos. They’re little chunks of computing technology, complete with a Linux operating system, a 700MHz ARM11 processor, a low-power video processor and up to 512MB of memory. Tinkerers can use one as the computing brains behind any number of applications that they design to their heart’s content. In a recent example, a Raspberry Pi enthusiast built a Raspberry Pi mini PC, which he used to control a mini CNC laser engraver made out of a set of salvaged DVD drives and $10 in parts off eBay. Ideas range from web servers and weather stations to home automation systems and mini arcades – the list of projects is endless.

At the Strata + Hadoop World conference last month, Jamie Whitehorn shared his Hadoop Raspberry Pi creation with an audience. He discussed the challenges a student has in learning the Hadoop system. Chiefly, it’s a distributed architecture that requires multiple computers to operate. Someone looking to build Hadoop skills in a test environment would need several machines, and quite an electricity bill to get a cluster up – a prospect that can be very expensive for a student.

Whitehorn acknowledges that all of this can be avoided by using a Hadoop cloud service, but he says that defeats the purpose, which is understanding the interaction between the software and the hardware. The whole point of the exercise, he explains, is to face the complexity of the project and overcome it.

And it’s no simple feat, says Whitehorn. Channeling Douglas Adams, he describes the complexity involved in installing Hadoop 2.2 on a resource-limited ARM computer: “It’s like scaling the north face of the Megapurna with a perfectly healthy finger but everything else sprained, broken or bitten off by a pack of mad yaks.”

Whitehorn gives a list of the limitations that someone attempting to take on the Raspberry Pi/Hadoop challenge will face, including:

  • Java – Whitehorn notes that the current version of Java in the latest Raspberry Pi release is 1.7. “You can run it as a client,” he says, “but you have to take out the server bits in it – a big learning experience as to how it works.”
  • Memory – The Raspberry Pi has only 512MB of memory, so Hadoop must be sliced down so that it only uses 200-300 MB. Whitehorn says that 272 MB works well.
  • Performance – As one would expect, this is not a production machine ready to blaze trails through big data. Whitehorn notes that there are details that need to be worked around, such as the USB and the network using the same bits of hardware.

“All of this is great for learning,” says Whitehorn, showing off his five-node Raspberry Pi Hadoop cluster; the good news, he says, is that it works. While the project is a lot of fun to work on (think of it like a model car kit without the glue and X-Acto knives), Whitehorn says such clusters are not for production. “A lot of people want to, because they think that it’s going to be cheap,” he says, but warns, “it runs like a comatose snail, but it’s absolutely brilliant for learning.”

Whitehorn says that he’s learned a lot about Hadoop from attempting the project, and encourages others to get in on the action. For anyone who is interested in doing that, he has posted a blog entry that discusses his approach and some of its nuances.

Information on the Raspberry Pi can be found at www.raspberrypi.org. The Raspberry Pi Foundation, which pushes these little units out for educational projects like this one, announced last week that it has crossed the two million mark for units sold.
