February 27, 2012

InfiniBand to Shoulder Next Generation Big Data Burden

Gilad Shainer

With the increasing use of social media, geolocation services, web communication, and the cloud for content storage and delivery, there is a real need for a scalable, higher-bandwidth, faster-responding network fabric in these websites' data centers.

For server and storage communications, InfiniBand offers the highest bandwidth (56Gb/s) and the lowest network latency (below 0.7 microseconds) for these applications, while maintaining the lowest CPU overhead. Moreover, scale-out InfiniBand fabrics are the most cost-effective option compared to Ethernet or proprietary networks, lowering both capital expenses and total cost of ownership.

Several large-scale Web 2.0 providers have decided to adopt InfiniBand solutions instead of more expensive Ethernet options for their next-generation data centers. Microsoft Bing Maps is one such example, in which InfiniBand dramatically reduced Microsoft's interconnect CAPEX and OPEX.

Web 2.0 is the era in which the software itself matters little, and the services delivered over the web matter a great deal. It is the era in which the network becomes the computer. Just as the PC replaced the mainframe in the 1980s, today the network is replacing the PC. Last year alone, 175,000 books were published and more than 30,000 music albums were released in the US; at the same time, 14,000,000 blogs were launched worldwide, and all of these numbers are increasing each year. In the near future, every person will write a song or author a book, create a blog, make a video, or code a program.

But the real transformation is still ahead of us. More content will be created, far more than can be consumed, and the actual creation of content will become part of its consumption. Web 2.0 can be viewed as the operating system of a mega-computer that encompasses the Internet, all its services, all peripheral and affiliated devices from scanners to satellites, and the billions of humans entangled in this global network.

Today the Web 2.0 mega-computer processes 1 million emails and 1 million web searches each second. Every second, 10 terabits are sent through its backbone, and each year it generates nearly 20 exabytes of data. Its distributed network spans 1 billion active PCs, which is approximately the number of transistors in one PC. This mega-computer is housed in multiple physical data centers, forming a giant compute cloud for its users. As Sun's John Gage said in 1988, "the network is the computer," referring to thin clients replacing the mainframe. Today, the network is Web 2.0 – both for the creation and consumption of data, and for the connections between and within the data centers forming the backbone of the mega-computer.

Scalability is the New Game

Web 2.0 applications depend on managing, understanding, and responding to massive amounts of user-generated data in real time. The Web 2.0 data center systems are no more than giant data systems, and connecting them all together is the essence of Web 2.0. With more users feeding more applications and platforms, the data is no longer growing arithmetically – it is growing exponentially. And when the data grows, the data centers need to grow as well, both in data capacity and in the speed at which the data can be accessed and analyzed, and new parallel concepts need to be adopted.

The Web 2.0 data centers today consist of parallel infrastructures, both in the hardware configuration (i.e., clusters of compute and storage) and in the software configuration (for example, Hadoop), that are growing rapidly and therefore require the most scalable, energy-efficient, high-performing interconnect infrastructure.

The network is the Web 2.0. This is a true statement for the collection of contributions that create the Web 2.0 database (Amazon, Google, eBay, Flickr, Wikipedia, Twitter, YouTube, Facebook, Blogger, Craigslist, etc.), and a true statement for the actual interconnect connecting all of the servers and storage elements in the data centers. The fast scaling up of the data centers requires the interconnect to scale just as fast: the ability to build giant flat networks, the ability to protect data and deliver it as quickly as possible, to minimize overhead, and of course to be very low power and very cost effective.

World Wide Web 2.0

The Web 2.0 Data Center Network

The two options for data center connectivity are Ethernet and InfiniBand. Fibre Channel is a storage-only option and therefore has no play in the next-generation Web 2.0 mega-centers, where the main focus is on consolidation, energy efficiency, and scalability. PCI Express switching is not actually a network and does not scale, and is therefore not a viable option for scalable, easy-to-manage, efficient data centers.

Ethernet’s main usage is in the traditional data center, where complex management and layered connectivity are not a bad thing. Ethernet must maintain backward compatibility with decades-old standards, and its architecture is layered – a top-of-rack, aggregation, and core switch fabric. While this is an acceptable match for a dedicated data center (virtual or not), for a fast-growing, scalable Web 2.0 infrastructure it is more of a challenge.

InfiniBand started in the high-performance computing arena due to its performance and agility. It is not only the latency, the throughput, or the transport (which does not require much CPU power), but the ability to build flat networks of unlimited size from the same switch components, the capability to ensure lossless and reliable delivery of data, and its congestion management – and therefore its support for shallow buffers – that have brought InfiniBand to the front line of high-performance networks and made it a good candidate for Web 2.0 data centers.

InfiniBand Network Topology

The InfiniBand Architecture (IBA) is an industry-standard architecture for server and storage I/O and inter-server communication. It was developed by the InfiniBand Trade Association (IBTA) to provide the levels of reliability, availability, performance, and scalability necessary for present and future server systems – levels significantly better than can be achieved with other I/O structures. InfiniBand fabrics are created with host channel adapters (HCAs, the InfiniBand equivalent of NICs) that fit into servers and storage nodes and are interconnected by switches that tie all nodes together over a high-performance network fabric, the same as any other interconnect technology.

The InfiniBand Architecture is a fabric designed to meet the following needs: high bandwidth and low latency; computing, storage, and management over a single fabric; cost-effective silicon and system implementations with an architecture that scales easily from generation to generation; high reliability and availability, with scalability to tens and hundreds of thousands of nodes; and exceptionally efficient utilization of compute processing resources.

InfiniBand drivers are part of every standard OS distribution. Both InfiniBand and Ethernet serve the same application interfaces (InfiniBand actually serves more types of applications), and managing an InfiniBand fabric is similar to managing an Ethernet fabric. InfiniBand, like Ethernet, enables I/O consolidation, meaning compute, storage, and management traffic run on the same fabric.

The basic building blocks of the InfiniBand network are the switches (ranging from 36 ports to 648 ports in a single enclosure) and the gateways from InfiniBand to Ethernet (10GbE or 40GbE). The gateway allows efficient connectivity of the data center to the Web 2.0 user community. The InfiniBand switch fabric runs at 56Gb/s, allowing flexible configurations and oversubscription in cases where the throughput to the server can be lower. An InfiniBand fabric is managed, and applications run on top of an InfiniBand adapter, in much the same way that an Ethernet fabric is managed and applications run on top of an Ethernet NIC.
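To illustrate how the same switch element scales into larger fabrics, consider the standard two-tier fat-tree construction: each leaf switch splits its ports evenly between servers and spine switches, and each spine connects once to every leaf. A radix-36 switch then yields exactly the 648-port figure quoted above. A minimal sketch (the function name is ours, for illustration only):

```python
def fat_tree_endpoints(radix):
    """Maximum nonblocking endpoints of a two-tier fat tree built
    from identical switches with `radix` ports each."""
    down_per_leaf = radix // 2   # ports facing servers on each leaf
    max_leaves = radix           # a spine switch can reach `radix` leaves
    return max_leaves * down_per_leaf

# A 36-port building block scales to a 648-endpoint nonblocking fabric.
print(fat_tree_endpoints(36))  # 648
```

The same arithmetic explains why no special aggregation-layer switch is needed: larger fabrics are built simply by adding more of the same element.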

InfiniBand is a lossless fabric that does not suffer from Ethernet's spanning tree problems. Scaling is made easy by the ability to add simple switch elements and grow the network to 40,000 server and storage endpoints in a single subnet, and to 2^128 (~3.4e+38) endpoints in a full fabric. InfiniBand adapters consume extremely low power – less than 0.1 watt per gigabit – and InfiniBand switches less than 0.03 watts per gigabit. As InfiniBand competes with Ethernet, InfiniBand pricing is competitive with Ethernet, and the higher throughput enables the lowest cost per endpoint.
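Taking the per-gigabit figures above at face value, a rough per-endpoint power budget can be computed directly; this is a back-of-the-envelope upper bound under the stated figures, not a measured value:

```python
def interconnect_power_per_endpoint(gbps, adapter_w_per_gb=0.1, switch_w_per_gb=0.03):
    """Upper-bound interconnect power per endpoint, combining the
    adapter and switch per-gigabit figures quoted in the article."""
    return gbps * (adapter_w_per_gb + switch_w_per_gb)

# At 56Gb/s this comes to roughly 7.3 W per endpoint.
print(interconnect_power_per_endpoint(56))
```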

InfiniBand is the Web 2.0 Network at Bing Maps

On the grounds of a Microsoft facility in Boulder, Colorado lies the new Bing Maps data center. This Web 2.0 data center supports the street-side, bird's-eye, aerial, and satellite image types provided by Bing Maps. The Bing Imagery Technologies team takes all of this data and stitches it together to form the visual mosaics of Bing Maps. The data grows rapidly as more detail is added and more information is requested or provided by users.

Microsoft Bing Maps Data Center

Given its compute- and data-intensive mission, and the large scale of its operations, Bing Maps requires large amounts of processing power, data storage capacity, and memory. It measures its storage needs in petabytes, its memory needs in terabytes, and its processing needs in tens of thousands of cores. Bing Maps explored the different interconnect options for the new data center and chose InfiniBand.

The basic building block of the Bing Maps interconnect architecture is a set of racks of servers from Dell, each connected to a Mellanox QDR InfiniBand switch fabric. As the network delivers 40Gb/s of throughput, an oversubscribed topology was used to provide nearly 10Gb/s of throughput to each server. The ease and flexibility of the InfiniBand fabric allows any oversubscription ratio, achieved simply by using more ports per switch for server connectivity and fewer for the switch fabric. At the edge of the fabric, several InfiniBand-to-Ethernet gateways were placed, serving a dual function: network switching, and Ethernet connectivity to the outside of the data center.
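The oversubscription arithmetic is straightforward: the uplink capacity of a leaf switch is shared across its server-facing ports. A small sketch makes the 40Gb/s-to-10Gb/s figure concrete; the 28/7 port split is our illustrative assumption, not Bing Maps' actual configuration:

```python
def per_server_bandwidth(link_gbps, server_ports, uplink_ports):
    """Worst-case per-server throughput into the fabric on an
    oversubscribed leaf switch: total uplink capacity divided
    across all server-facing ports."""
    return link_gbps * uplink_ports / server_ports

# Hypothetical split of a 36-port QDR (40Gb/s) leaf switch:
# 28 ports to servers, 7 uplinks -> 4:1 oversubscription.
print(per_server_bandwidth(40, 28, 7))  # 10.0
```

Changing the ratio requires no different hardware, only a different allocation of ports on the same switch.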

At the time of installation, the cost difference between the InfiniBand solution and the potential Ethernet option was 2X – the Ethernet option would have cost twice as much. Pricing is a dynamic element, and figures are updated from time to time. Nevertheless, the main expense in the interconnect topology is the switch fabric, and the higher the throughput, the more economical the switch fabric becomes. For InfiniBand in particular, the ability to use the same switch elements with no need for special aggregation layers makes the big difference.

Microsoft Bing Maps Data Center Architecture

Summary

Microsoft Bing is one example of the new Web 2.0 data centers built on an InfiniBand fabric. The technology, the pricing advantage, and the low power per gigabit of data transfer enable next-generation Web 2.0 infrastructures to be more cost effective, consume less power, and scale quickly.

In the next few years, the Web 2.0 mega-computer will process and move data in quantities never seen before. In 10 years, the Web 2.0 mega-computer will contain hundreds of millions of content-creating elements – environmental sensors, satellite cameras, self-guiding cars, and every living person. We will live inside this thing, and this thing must not consume our world's resources. The interconnect technology used for the largest-scale supercomputers will be the one used for the Web 2.0 mega-computer.

Editor’s Note: Additional authors on this article include Eyal Gutkind, Eli Karpilovski, Motti Beck, Brian Sparks, all from Mellanox Technologies.
