November 27, 2013

Hadoop on a Raspberry Pi

Looking for a fun side project this winter? Jamie Whitehorn has an idea for you. He put Hadoop on a cluster of Raspberry Pi mini-computers. Sound ridiculous? For a student trying to learn Hadoop, it could be ridiculously cool.

For those who don’t know what a Raspberry Pi is, think of it as a computer on a credit card meets Legos. They’re little chunks of computing technology, complete with a Linux operating system, a 700MHz ARM11 processor, a low-power video processor and up to 512MB of memory. Tinkerers can use one as the computing brains behind any number of applications they design to their heart’s content. In a recent example, a Raspberry Pi enthusiast built a Raspberry Pi mini PC and used it to control a mini CNC laser engraver made from a set of salvaged DVD drives and $10 in parts from eBay. Ideas range from web servers and weather stations to home automation systems and mini arcades; the list of projects is endless.

At the Strata + Hadoop World conference last month, Jamie Whitehorn shared his Hadoop Raspberry Pi creation with an audience. He discussed the challenges a student faces in learning Hadoop. Chiefly, it’s a distributed architecture that requires multiple computers to operate. Someone looking to build Hadoop skills in a test environment would need several machines and would run up quite an electricity bill getting a cluster up, a prospect that can be very expensive for a student.

Whitehorn acknowledges that all of this can be avoided by using a Hadoop cloud service, but he says that defeats the purpose, which is understanding the interaction between the software and the hardware. The whole point of the exercise, he explains, is to face the complexity of the project and overcome it.

And it’s no simple feat, says Whitehorn. Channeling Douglas Adams, he describes the complexity involved in installing Hadoop 2.2 on a resource-limited ARM computer: “It’s like scaling the north face of the Megapurna with a perfectly healthy finger but everything else sprained, broken or bitten off by a pack of mad yaks.”

Whitehorn lists the limitations that anyone taking on the Raspberry Pi/Hadoop challenge will face, including:

  • Java – Whitehorn notes that the version of Java in the latest Raspberry Pi release is 1.7. “You can run it as a client,” he says, “but you have to take out the server bits in it – a big learning experience as to how it works.”
  • Memory – The Raspberry Pi has only 512MB of memory, so Hadoop must be slimmed down to use only 200 to 300 MB. Whitehorn says that 272 MB works well.
  • Performance – As one would expect, this is not a production machine ready to blaze trails through big data. Whitehorn notes that some details need to be worked around, such as the USB ports and the network interface sharing the same hardware.
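
To make the memory constraint concrete, here is a minimal Python sketch of the per-node arithmetic. Only the 512 MB of board RAM, the 200 to 300 MB range and the 272 MB figure come from Whitehorn; the choice of worker daemons (DataNode and NodeManager, the standard Hadoop 2 worker processes) and the even split of the budget between them are hypothetical simplifications, not his documented configuration.

```python
# Rough per-node memory budget for a Raspberry Pi Hadoop worker.
# Only BOARD_RAM_MB, HADOOP_RANGE_MB and HADOOP_BUDGET_MB come from the article;
# the daemon list and the even split are hypothetical simplifications.

BOARD_RAM_MB = 512               # Raspberry Pi RAM quoted in the article
HADOOP_RANGE_MB = (200, 300)     # range Whitehorn says Hadoop must fit in
HADOOP_BUDGET_MB = 272           # value Whitehorn reports working well

# Hypothetical: the Hadoop 2 worker daemons sharing the budget on each Pi.
WORKER_DAEMONS = ("DataNode", "NodeManager")


def split_budget(budget_mb: int, daemons: tuple[str, ...]) -> dict[str, int]:
    """Divide a memory budget evenly (whole MB, rounded down) across daemons."""
    if not daemons:
        raise ValueError("need at least one daemon")
    share = budget_mb // len(daemons)
    return {name: share for name in daemons}


def check_budget(budget_mb: int, board_mb: int = BOARD_RAM_MB,
                 allowed: tuple[int, int] = HADOOP_RANGE_MB) -> int:
    """Validate the budget and return the MB left for everything else.

    The remainder must still cover the OS and the GPU's share of the
    Pi's RAM, which is carved out of the same 512 MB.
    """
    low, high = allowed
    if not low <= budget_mb <= high:
        raise ValueError(f"{budget_mb} MB is outside the {low}-{high} MB range")
    return board_mb - budget_mb


if __name__ == "__main__":
    left = check_budget(HADOOP_BUDGET_MB)
    print(f"Hadoop: {HADOOP_BUDGET_MB} MB, left for OS and GPU: {left} MB")
    for daemon, mb in split_budget(HADOOP_BUDGET_MB, WORKER_DAEMONS).items():
        print(f"  {daemon}: {mb} MB")
```

Run as a script, it reports 240 MB left over for the operating system and GPU, and 136 MB for each of the two assumed daemons. That is a useful reminder of how little room each Java process gets on these boards.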

“All of this is great for learning,” says Whitehorn, showing off his five-node Raspberry Pi Hadoop cluster as proof of the good news: it works. While Whitehorn says the project is a lot of fun to work on (think of it as a model car kit without the glue and X-Acto knives), such clusters are not meant for production. “A lot of people want to, because they think that it’s going to be cheap,” he says, but warns, “it runs like a comatose snail, but it’s absolutely brilliant for learning.”

Whitehorn says he has learned a lot about Hadoop from the project and encourages others to get in on the action. For anyone interested in doing that, he has posted a blog entry that discusses his approach and some of the nuances involved.

Information on the Raspberry Pi is available from the Raspberry Pi Foundation, which promotes these little units for educational projects like this one. Last week, the foundation announced that it has crossed the two million mark for units sold.
