Facebook’s Ambitious Global Big Data Program
What’s cooler than millions of Facebook users? Billions of Facebook users. Or so supposes a report released this week by Facebook’s Internet.org group. Officially launched late last month, Internet.org is a group that combines the powers of Facebook, Ericsson, MediaTek, Nokia, Opera, Qualcomm, and Samsung toward the ideal of making Internet access available “to the next five billion people.”
Whether you see it as a benevolent plan to bring much-needed information and communication resources to the developing world, or as a plot to add billions of new users to Facebook's data network, the Internet.org group says its goal is to build an infrastructure that sustainably provides affordable access to basic Internet services, so that everyone with a phone can get online.
“While the current global cost of delivering data is on the order of 100 times too expensive for this to be economically feasible, we believe that with an organized effort, it is reasonable to expect the overall efficiency of delivering data to increase by 100x in the next 5-10 years,” said the group in a whitepaper released this week. Getting there, according to the report, will require both bringing down the underlying costs of delivering data and reducing the amount of data used by building more efficient apps.
While it's easy to be skeptical of a company's intentions when it backs initiatives that would clearly grow its own user base, the information presented in the whitepaper will likely be welcome reading for CIOs everywhere. In the report, Facebook, which has been a notable pioneer in building efficiency into its datacenter infrastructure, shares strategies companies can use to reduce the costs of their IT infrastructure while getting more out of it.
In the document, Facebook shares how it used HipHop for PHP and the HipHop Virtual Machine to serve 500% more traffic on the same number of servers. The company also discusses its use of specialized software such as Apache Giraph, the open source graph processing framework. Knowing that software is just part of the challenge, Facebook also covers its success with the Open Compute Project, and how companies can build servers from the ground up that serve traffic as efficiently as possible.
For the Hadoop-heads out there, Facebook discusses a new platform it created that, it says, allows a Hadoop cluster to run across multiple data centers: multiple namespaces carve many logical clusters out of the same larger physical cluster. According to the whitepaper, namespaces can be divided, but they all share a common dataset that can span multiple data centers, allowing teams to move data between all of Facebook's data centers.
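The whitepaper doesn't include configuration details, and Facebook's platform is an in-house system, but the namespace idea maps closely to HDFS federation in stock Apache Hadoop, where several independent NameNodes (each owning one namespace) share a single pool of DataNodes. A minimal sketch, assuming hypothetical nameservice IDs `ns-east` and `ns-west` and example hostnames:

```xml
<!-- hdfs-site.xml: two independent namespaces over one physical cluster -->
<configuration>
  <!-- Declare the logical nameservices (namespaces) in this cluster -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns-east,ns-west</value>
  </property>
  <!-- Each nameservice gets its own NameNode RPC endpoint -->
  <property>
    <name>dfs.namenode.rpc-address.ns-east</name>
    <value>nn-east.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns-west</name>
    <value>nn-west.example.com:8020</value>
  </property>
</configuration>
```

With this configuration, every DataNode registers with both NameNodes, so both logical clusters draw on the same physical storage; Facebook's custom layer goes further by letting those namespaces span data centers.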
Facebook says that while Apache Hadoop was initially employed as the foundation of their infrastructure, they began to bump up against the limits of the system in early 2011. “Because of our unprecedented scale, we’ve had to innovate on top of Hadoop and MapReduce and have created several new tools, including Corona, AvatarNode, Giraph, Presto, and Morse.”
While the whitepaper goes into detail on Facebook's views and examples of building efficient data centers, it also gives floor time to Qualcomm and Ericsson to introduce the “1000x initiative,” a plan to expand global wireless capacity by 1000 times.
While the plan to deliver the Internet to the rest of the world will be met with healthy amounts of skepticism, there's no denying that delivering data is expensive and only promises to get more so in a world where data volumes are expected to explode year-over-year. Whatever Facebook's motives are, it's clear that the industry as a whole can benefit from reducing its costs while increasing its efficiency.