September 23, 2013

CERN Turns to Google for Datacenter Direction

Isaac Lopez

Where does one of the world's largest and most respected scientific research centers go for inspiration upon realizing that its datacenter and tools can't scale to meet its increasing data needs? Why, Google, of course, explained Tim Bell, the Infrastructure Manager for CERN.

With a datacenter built in the 1970s, CERN, the European Organization for Nuclear Research, is currently experiencing what Bell called "the classic problems of traditional datacenters." The research organization, which is currently gaining a lot of attention for the Large Hadron Collider (LHC), is generating more data than it can handle, and the situation is forcing it to come to grips with what might be seen as an embarrassing truth as it plans its newest datacenter.

“One of the fundamental transitions that we went through was to turn around and say that CERN computing is not special,” explained Bell to an audience at the GigaOM Structure: Europe event last week. “Where we find things that we think that we’re special on, fundamentally it’s because we’ve failed to understand the concepts, not because we actually are special.”

The organization, which is currently building a brand new 3 megawatt datacenter in Hungary, turned to Google as its model for how to manage and process the enormous amounts of data produced by the massive LHC particle physics research project. "In the compute area, Google is way ahead of us in terms of scale," leveled Bell. "We [plan] to build on what they've done, rather than having to invent everything ourselves."

In response to these realizations, Bell says that CERN is adopting a Google-published tool chain, one that will allow it to break up its IT infrastructure services into a series of smaller components.

Bell specifically called out CERN's use of Puppet, an open source configuration management system backed by Puppet Labs. Additionally, he said CERN will use OpenStack as its orchestration engine.
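For a sense of what driving OpenStack programmatically can look like, below is a minimal sketch that boots a virtual machine with the openstacksdk Python client. The cloud, image, flavor, network, and server names are placeholders, and the library is a modern client rather than anything the article ties to CERN's own setup; it is illustrative only.

    import openstack

    # Read credentials from clouds.yaml or OS_* environment variables
    conn = openstack.connect(cloud="my-cloud")

    # Look up the building blocks for the new virtual machine
    image = conn.compute.find_image("base-image")
    flavor = conn.compute.find_flavor("m1.medium")
    network = conn.network.find_network("private-net")

    # Ask OpenStack to provision the server and wait until it is active
    server = conn.compute.create_server(
        name="worker-001",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)
    print(server.status)

Configuring the machine once it exists, such as installing packages and enforcing settings, is the part a tool like Puppet would handle.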

The amount of data that CERN has to accommodate as it designs its new center is virtually unfathomable. As researchers collide particles into each other, 7,000-ton instruments, which can be thought of as 100-megapixel digital cameras the size of Notre Dame Cathedral, capture 40 million pictures a second. That's a petabyte of data generated every second that needs to be processed and analyzed. Bell says that while they do have server farms to knock those numbers down to reasonable levels, in the end they're still dealing with 35 petabytes a year that they have to record, and with the upgrades coming, they expect that to double.

“The physicists want to keep this data for 20 years,” he adds. “That means when we add that all up, we’re heading for exabytes fairly soon.”
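The back-of-envelope arithmetic behind that statement, under the simplifying assumption that the doubled recording rate holds steady for the full retention period, works out roughly as follows:

    # Rough estimate from the figures Bell cited
    recorded_pb_per_year = 35                        # petabytes recorded per year today
    upgraded_pb_per_year = recorded_pb_per_year * 2  # expected to double after the upgrades
    retention_years = 20                             # how long physicists want to keep the data

    total_pb = upgraded_pb_per_year * retention_years
    print(total_pb / 1000.0, "exabytes")             # about 1.4 exabytes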

Thanks to the pioneering work of Google, Bell says they're well on the way to addressing the technical challenges, but they will still need to address the people side of the issue, where open source solutions don't necessarily apply. "Clearly doing this kind of transition is not just a technology and software problem."

While CERN has its work cut out for it, fans of the research center have a chance to learn more, and maybe even tell the organization how to solve its problems. CERN is opening its doors this month, on September 28th and 29th, for CERN Open Days, giving science enthusiasts the opportunity to see the LHC and other projects going on at the lab firsthand.

Related items:

Sverre Jarp: The Intersection of Big Data, Enterprise, and HPC 

Interplay between Research, Big Data & HPC 

Big Science and the Gift that Keeps on Giving 
