November 8, 2011

Mellanox Addresses Big Data Acceleration

Datanami Staff

This week, high-performance networking giant Mellanox announced the second version of its Unstructured Data Accelerator (UDA 2.0), designed to speed up big data analytics systems. The company claims that users could realize a twofold performance boost, enabling them to respond to the real-time demands that many businesses are built on.

In essence, the goal is to improve node efficiency on Hadoop clusters by moving data with RDMA and refining the merge-sort algorithm. The company claims that this will lower per-node execution time while maintaining the scalability that big data analytics users need. Mellanox claims to be the first to produce a software-based network solution that tackles some of the major performance bottlenecks in Hadoop clusters, namely node bandwidth and compute efficiency.
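For context, in Hadoop MapReduce each reducer fetches key-sorted map-output segments from many nodes during the shuffle and merges them into a single key-ordered stream before reducing. The announcement does not say how Mellanox refined this step, so the following is only a generic Python sketch of a k-way merge of sorted segments, not UDA's algorithm. The function names and the word-count style data are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type: (key, value) pairs, as emitted by map tasks.
Record = Tuple[str, int]


def merge_map_outputs(segments: List[Iterable[Record]]) -> Iterator[Record]:
    """Lazily k-way merge segments that are each already sorted by key."""
    # heapq.merge keeps only one head record per segment in a min-heap,
    # so memory grows with the number of segments, not the number of records.
    return heapq.merge(*segments, key=itemgetter(0))


def reduce_sum(merged: Iterable[Record]) -> List[Record]:
    """Group consecutive records with equal keys and sum their values."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(merged, key=itemgetter(0))]


if __name__ == "__main__":
    # Three sorted segments, as if fetched from three different mapper nodes.
    segments = [
        [("apple", 1), ("cherry", 2)],
        [("apple", 3), ("banana", 1)],
        [("banana", 4), ("cherry", 1), ("date", 5)],
    ]
    print(reduce_sum(merge_map_outputs(segments)))
    # → [('apple', 4), ('banana', 5), ('cherry', 3), ('date', 5)]
```

Because the merge is lazy, the reducer can start consuming records before every segment has been read in full, which is why the efficiency of this merge, and of the data movement that feeds it, matters for per-node execution time.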

The Unstructured Data Accelerator will not require any mucking around inside the application or usage models. The acceleration is handled by a middleware interface that connects directly to the company’s own InfiniBand or Ethernet products. Mellanox points to the solution's ease of use, noting that no special tweaking is required; cluster admins need to deal with it only during initial configuration.

The company says that this offering expands “existing I/O capabilities of low latency, high-throughput, low CPU overhead and Remote Direct Memory Access (RDMA) into the big data software infrastructure for optimizing big data applications efficiently.”

Because Mellanox has its roots in the high-performance computing market, the verticals it is going after here are likely already familiar with its Ethernet and InfiniBand solutions. Many of these same users are turning to emerging frameworks like Hadoop to manage increasingly complex workflows, but Mellanox says that traditional Ethernet can no longer keep up with the performance demands of many Hadoop clusters. Most of these clusters run TCP/IP over one or more GbE network cards connected through a GbE fabric, which the company says gives users only 125 MB/s of bandwidth per port. With multicore processing technologies on the way, this will become an even more pressing issue.

According to a detailed statement about UDA, “To achieve highest efficiency with these servers there must be enough bandwidth available for each with CPU offloads that prevent data movement from overwhelming the server’s CPU. High bandwidth technologies InfiniBand and Ethernet deliver up to 40 Gb/s of bandwidth, and each have RDMA (Remote Direct Memory Access) capabilities to offload data movement. However, to utilize RDMA Hadoop needs a special interface to the network card driver.”
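The 125 MB/s figure follows from a simple unit conversion: a 1 Gb/s port carries at most 1,000 megabits per second, which is 125 megabytes per second at 8 bits per byte. The Python sketch below applies the same arithmetic to the 10 Gb/s and 40 Gb/s rates mentioned in the announcement. It uses nominal line rates and ignores encoding and protocol overhead, so real throughput will be lower. The 100 GB transfer size is an arbitrary illustrative value, and the function names are hypothetical.

```python
# Nominal link rates in gigabits per second (Gb/s). These are signaling
# rates only; encoding and protocol overhead are deliberately ignored.
LINK_RATES_GBPS = {"1GbE": 1, "10GbE": 10, "40 Gb/s link": 40}

BITS_PER_BYTE = 8


def per_port_mb_per_s(rate_gbps: float) -> float:
    """Convert a link rate in Gb/s to MB/s (decimal units: 1 Gb = 1000 Mb)."""
    return rate_gbps * 1000 / BITS_PER_BYTE


def seconds_to_transfer(data_gb: float, rate_gbps: float) -> float:
    """Seconds to move data_gb gigabytes over one port at full line rate."""
    return data_gb * 1000 / per_port_mb_per_s(rate_gbps)


if __name__ == "__main__":
    for name, rate in LINK_RATES_GBPS.items():
        mb_s = per_port_mb_per_s(rate)
        secs = seconds_to_transfer(100, rate)
        print(f"{name}: {mb_s:.0f} MB/s, 100 GB in {secs:.0f} s")
    # → 1GbE: 125 MB/s, 100 GB in 800 s
    # → 10GbE: 1250 MB/s, 100 GB in 80 s
    # → 40 Gb/s link: 5000 MB/s, 100 GB in 20 s
```

Even this idealized comparison shows why per-port bandwidth becomes the bottleneck when many cores on one node compete to shuffle data over a single GbE link.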

UDA is meant to speed up the Hadoop network, allowing Hadoop clusters to scale more efficiently as they crunch through data-intensive applications. The company says its secret lies in “a novel data moving protocol which uses RDMA in combination with an efficient merge-sort algorithm enables Hadoop clusters based on Mellanox InfiniBand and 10GbE RoCE (RDMA over Converged Ethernet) adapter cards to efficiently move data between servers accelerating the Hadoop framework.”

The company has provided some additional details in its solution brief on the UDA technology.