December 21, 2011

GPUs Push Envelope on BI Performance

Nicole Hemsoth

Two trends in enterprise IT, each emerging from its own distinct pocket of development, are taking firmer shape and converging.

On one side are new generations of business intelligence applications whose algorithms are unparalleled in complexity and that require near-real-time processing across datasets reaching into the petabyte range.

On the other side (the hardware end, to be exact), processing techniques that exploit massively parallel data handling, with an eye on both performance and efficiency, are cropping up: namely, accelerators and hybrid architectures, including GPUs.

It’s hard to argue that a GPU revolution isn’t sweeping the world of high performance computing, especially over the last few years, which have brought a parade of Top500 supercomputers loaded to the gills with graphics coprocessors. Despite the growing presence of GPU computing in HPC, the question of how a broader, more mainstream set of applications might benefit from GPU acceleration is sometimes overlooked.

This year in Beijing, at GTC Asia, NVIDIA invited Ren Wu from HP Labs to present on this “other world” of applications that is making GPU computing appealing to a larger set of enterprise customers: business intelligence and enterprise analytics. According to Wu, this wide pool of applications can see notable speedups in big data BI operations on heterogeneous (GPU and CPU) systems, even with datasets too large to fit into the GPU’s on-board memory.

As Wu and his colleagues stated in the original research paper, which appeared back in 2009 and has since become a staple for the “big data” camp at GTC events: “While massively parallel data management systems have been used to scale the data management capacity, BI analytics and data integration analytics have increasingly become a bottleneck, and can only get worse. High performance support for these analytics is both a great challenge and a great opportunity.”

Wu’s presentation was based on research he conducted with other HP Labs team members into what benefits GPUs might hold for BI and big data. They note that while similar applications had been benchmarked before, previous work showed GPUs delivering “respectable” performance gains on only a handful of general-purpose applications.

The research team used the K-Means clustering algorithm to test how well GPUs could accelerate analysis of large datasets that still fit within the GPU’s on-board memory. For these, they said, “the GPU-accelerated version is 6-12x faster than our highly optimized CPU-only version running on an 8-core workstation, or 200-400x faster than the popular benchmark program, MineBench running on a single core.”
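For readers unfamiliar with K-Means, the sketch below shows the standard Lloyd's-style iteration in plain Python. It is not the HP Labs code, and the function names are hypothetical. It illustrates why the algorithm suits GPUs: the assignment step computes each point's nearest centroid independently, so a GPU version can typically spread that work across many threads, leaving only a small per-cluster reduction.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_centroid(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for j, c in enumerate(centroids):
        d = sum((a - b) ** 2 for a, b in zip(p, c))
        if d < best_dist:
            best_idx, best_dist = j, d
    return best_idx


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           iters: int = 20) -> List[Point]:
    """Hypothetical minimal K-Means (Lloyd's algorithm) on in-memory data."""
    cents = [tuple(c) for c in centroids]
    dim = len(cents[0])
    for _ in range(iters):
        # Assignment step: independent per point, the data-parallel part
        # a GPU implementation would usually map to one thread per point.
        labels = [nearest_centroid(p, cents) for p in points]
        # Update step: per-cluster coordinate sums and counts, then means.
        sums = [[0.0] * dim for _ in cents]
        counts = [0] * len(cents)
        for p, k in zip(points, labels):
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += p[d]
        # Empty clusters keep their previous centroid.
        new = [tuple(s / counts[k] for s in sums[k]) if counts[k] else cents[k]
               for k in range(len(cents))]
        if new == cents:
            break  # converged: no centroid moved
        cents = new
    return cents
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], [(0, 0), (10, 10)])` settles on one centroid near (0.33, 0.33) and one near (10.33, 10.33) after two iterations.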

The datasets too big to fit on board also produced some rather striking results. The researchers report that by allowing computation on the CPU and GPU, “as well as data transfers between them, to proceed in parallel, the GPU accelerated version can still offer a dramatic performance boost.” To put this in context, for a dataset with 100 million 2-d data points and 2,000 clusters, the GPU-accelerated version took about 6 minutes, while the CPU-only version running on an 8-core workstation took about 58 minutes, a speedup of roughly 10x.
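The article does not detail how the out-of-core version is organized, so the following is only a sketch under stated assumptions of one common pattern: stream the data in chunks, compute per-chunk cluster sums and counts, and prefetch the next chunk while the current one is processed (double buffering). Here a background thread stands in for the host-to-device copy, and `load_chunk`, `partial_stats` and `streamed_kmeans_step` are hypothetical names. On a real GPU the overlap would come from asynchronous copies; in this Python model it only helps when loading is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def partial_stats(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Per-cluster coordinate sums and point counts for one chunk."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        # Nearest centroid by squared Euclidean distance.
        k = min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def streamed_kmeans_step(load_chunk: Callable[[int], List[Point]],
                         n_chunks: int,
                         centroids: Sequence[Point]) -> List[Point]:
    """One K-Means iteration over data too large to hold in device memory.

    load_chunk(i) is a hypothetical stand-in for reading chunk i and copying
    it to the accelerator. Chunk i+1 is requested before chunk i is processed,
    so loading and computation can overlap.
    """
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_chunk, 0)
        for i in range(n_chunks):
            chunk = pending.result()
            if i + 1 < n_chunks:
                pending = pool.submit(load_chunk, i + 1)  # prefetch next chunk
            sums, counts = partial_stats(chunk, centroids)
            # Only the small k x dim partial results need to be combined.
            for k in range(len(centroids)):
                total_counts[k] += counts[k]
                for d in range(dim):
                    total_sums[k][d] += sums[k][d]
    return [tuple(s / total_counts[k] for s in total_sums[k]) if total_counts[k]
            else tuple(centroids[k]) for k in range(len(centroids))]
```

The design choice worth noting is that each chunk contributes only per-cluster sums and counts, so the combined result matches a single pass over the whole dataset regardless of how the data is split.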

The verdict? As Wu and colleagues state: “compared to other approaches, GPU-accelerated implementations of analytics potentially provide better raw performance, better cost-performance ratios and better energy performance ratios.”

You can learn more about the benchmarking and research effort in the original paper that inspired Wu’s talk at GTC Asia, or watch a full recording of similar material presented in 2010.

For those with an eye on speeding up enterprise analytics applications, several other recorded presentations discuss the role of GPUs in big data:

Integrating CUDA into a Large-Scale Commercial Database Management System

Accelerating Business Intelligence Applications with Fast Multidimensional Aggregation

Speculative Query Processing

Large-Scale Text Mining on the GPU


Datanami