October 22, 2012

A Strong ARM for Big Data

Burgeoning data growth is one of the foremost challenges facing IT and businesses today. Multiple analyst groups, including Gartner, have reported that information volume is growing at a minimum rate of 59 percent annually. At the same time, companies increasingly are mining this data for invaluable business insight that can give them a competitive advantage.

The challenge the industry struggles with is figuring out how to build cost-effective infrastructures that let data scientists derive these insights so their organizations can make timely, more intelligent decisions. As data volumes continue their explosive growth and the algorithms used to analyze and visualize that data become more optimized, something must give.

Past approaches that relied primarily on faster, larger systems simply cannot keep pace. Managing and understanding Big Data requires scaling out rather than scaling up. This shift has focused new attention on technologies such as in-memory databases, I/O virtualization, high-speed interconnects, and software frameworks such as Hadoop.

Taking full advantage of these network and software innovations requires re-examining strategies for compute hardware. Maximum performance calls for a well-balanced infrastructure based on densely packed, power-efficient processors coupled with fast network interconnects. This approach will help unlock applications and open new opportunities in business and high performance computing (HPC).

Scale-out Challenges and Solutions

HPC workflows depend on a balance between hardware, software, and network performance. Improvements in one area might speed some simulations or analysis routines, but the bottleneck may simply shift to another element. Because of this, IT managers are taking a fresh look at how they architect data centers and are moving towards distributed, or so-called scale-out, designs. In such designs, capacity is increased by adding more servers rather than replacing existing systems with more powerful computing resources.

As more computing resources become available, workloads are dispersed across multiple servers and clusters. This trend has evolved hand-in-hand with software frameworks, such as Hadoop and Sector/Sphere, for distributed computing.

However, scaling-out brings new problems. The traditional approach to systems architecture – of assembling individual discrete components – can result in compounding cost, complexity, and overhead. Additionally, networking requirements are often a sore spot in distributed computing. Inefficiencies inherent in popular software frameworks and the high cost of traditional x86-server blades compound matters.

This has led innovators to look back to the hardware for performance gains. While server processors used in scale-out computing continue to improve, there are still gaps between the capabilities of today's industry-standard options and the growing requirements of large-scale distributed computing networks. With scale-out platforms, data centers wrestle with adding more computing resources under real and tangible constraints such as fixed power budgets, cooling costs, and crowded floor space.

As power efficiency moves to the forefront of industry concerns, alternative architectures such as ARM are gaining traction as a means to reduce power consumption and cooling costs while providing more computing throughput. ARM processors are commonly found in embedded systems, smartphones, and other popular consumer electronics. Many organizations are discovering that the same characteristics that make ARM popular in the consumer market – low power consumption and density – also make it well suited for scale-out computing. Its power consumption characteristics, coupled with the ability to move information efficiently between processors, over the network, and to and from memory and storage, offer a more balanced design approach that addresses the technical problems described above.

Well-architected distributed computing systems are also extremely reliant on the network performing perfectly. A bad switch, buggy NIC drivers, or insufficiently provisioned bandwidth can all lead to nightmarish performance. Instead of brute-force gains from ever-faster clock speeds, scale-out computing places a premium on eliminating bottlenecks. High-performing x86-based platforms can be starved when coupled with relatively slower subsystems. Just imagine trying to drive a high-end sports car at its maximum speed down a dirt road: plenty of horsepower will go unused on the bumpy, unpaved ground. While some specialty vendors are attempting to address these issues with innovative new interconnects, those solutions are often significantly more expensive.

The acceptance of ARM processors in the data center is part of a larger trend towards “microserver” computing, something that HPC Wire says “has all the earmarks of a disruptive market shift.” HPC Wire reasons that microservers more closely align with the workloads and energy profiles that are desirable in today’s datacenters. “A near insatiable demand for Web-based serving and content delivery and a plethora of Big Data applications, combined with the escalating costs of power and cooling, has forced CPU makers to rethink their priorities,” according to a recent article in the high performance computing publication.

Microservers built with ARM processors are proving to be extremely scalable when deployed together in server-on-a-chip (SoC) configurations with ultra-fast, high-bandwidth interconnects for low-latency communication between processors. These designs enable optimized scale-out architectures and, by reducing cost and complexity, eliminate barriers for developers creating new types of high-throughput applications. Smaller-profile, energy-efficient hardware is easier to manage in the datacenter, frees up space to grow, and addresses today's workloads, such as applications that create operational intelligence by analyzing Big Data.

Delivering Results

Any application that is I/O- or memory-bound – rather than compute-bound – will benefit from a scale-out ARM-based platform. One example of the new opportunities enabled by a scale-out architecture is a Hadoop application that accelerates mission-critical portfolio analysis at an investment bank. Morgan Stanley faced growing data complexity and size but, most importantly, needed a scalable solution that was up to the task. It chose Hadoop to scale its portfolio operations.

A traditional database couldn’t handle the petabytes of data that Morgan Stanley was analyzing, and while traditional grid computing paradigms could leverage the world’s fastest CPUs, the real issue was throughput and scalability. High throughput enables collections of functions to work across a large data set, which in Morgan Stanley’s case ran to several petabytes of log files. By using a scale-out architectural approach and a distributed framework such as Hadoop, the bank was able to achieve better scalability at data volumes unattainable with traditional databases.
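To make the pattern concrete, the sketch below shows how a Hadoop MapReduce job spreads a simple log-analysis function across a cluster. It is a minimal illustration only: Morgan Stanley's actual jobs and log formats are not public, so the class names and the assumption that a status code sits in the third whitespace-separated field of each log line are hypothetical.

    // Minimal Hadoop MapReduce sketch: counting log entries per status code.
    // Illustrative only; the log layout assumed here is hypothetical.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogStatusCount {

      // Mapper: each task reads one split of the log files in parallel and
      // emits (statusCode, 1) for every line it can parse.
      public static class StatusMapper
          extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Assumed layout: whitespace-separated fields, status code third.
          String[] fields = value.toString().split("\\s+");
          if (fields.length > 2) {
            status.set(fields[2]);
            context.write(status, ONE);
          }
        }
      }

      // Reducer: sums the per-split counts for each status code.
      public static class SumReducer
          extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable v : values) {
            sum += v.get();
          }
          context.write(key, new LongWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log status count");
        job.setJarByClass(LogStatusCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setCombinerClass(SumReducer.class); // combine locally to cut network traffic
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Because each mapper runs against a block of input stored on the node where it executes, adding servers adds storage and compute in the same step – the essence of the scale-out approach described above.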

Calxeda as Your Technology Partner

Calxeda, pioneer of the “5 Watt server,” is focused on enabling scale-out architectures suited to software frameworks like Hadoop. The Calxeda EnergyCore SoC is a tightly integrated server that combines the ARM CPU, memory and storage controllers, an integrated fabric switch, and a dedicated management engine, all on a single chip. Unlike traditional x86-based system designs, which often lack either sufficient storage or network bandwidth, every Calxeda EnergyCore processor contains dedicated storage controllers and an 80 Gigabit network switch. An x86-based system can be outfitted with these features, but at an added cost in both price and power consumption. Calxeda’s design solves both issues by integrating features like 10Gb Ethernet connectivity directly into the SoC, eliminating expensive networking hardware and tangles of cabling. The net result is a well-balanced processor coupled with an abundance of storage and network bandwidth for these emerging scale-out architectures.

The bottom line is that scale-out architectures, anchored by densely packed, power-efficient processors, help companies solve Big Data challenges by delivering an optimal architecture for today’s distributed applications like Hadoop and by helping datacenters meet demands for ever-higher throughput.
