April 4, 2014

How to Speed Your Data Warehouse by 148x

Alex Woodie

Data analytic applications built on a combination of IBM’s DB2 BLU column-oriented data store and servers equipped with Intel’s latest Xeon E7 v2 processors can run up to 148 times faster than if they were running on the standard row-oriented DB2 engine and first-gen Xeon E7 processors, the companies announced last month.

The big data buddies clocked the colossal speedup in January during an internal test session that used the Proof of Performance and Scalability (POPS) benchmark running on a 10TB database built on a star schema, a design often used in traditional data warehouses.

IBM and Intel broke the 148x figure into constituent parts in a white paper released in mid-March. Switching from the standard DB2 10.1 database to DB2 with BLU Acceleration, both running on Xeon E7 processors, was responsible for a 77x boost in performance. Then, moving up to Xeon E7 v2 processors provided a 1.9x performance pop. The two gains multiply, which gives us roughly the final 148x figure.
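As a quick sanity check, the two factors compound by multiplication rather than addition. The sketch below uses only the rounded figures quoted above, and its variable names are illustrative. The product lands near the headline number but not exactly on it; presumably the published factors were rounded, though that is an assumption rather than something the white paper is quoted as saying here.

```python
# Rounded figures quoted in the article; the variable names are illustrative.
blu_speedup = 77.0   # DB2 with BLU Acceleration vs. standard DB2 10.1
cpu_speedup = 1.9    # Xeon E7 v2 vs. first-gen Xeon E7

# Independent speedups compound multiplicatively.
combined = blu_speedup * cpu_speedup
print(f"combined speedup ~ {combined:.1f}x")
```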

IBM and Intel worked together to ensure that the BLU engine can take advantage of the latest parallelized goodies in the multi-core Xeon E7 v2 processors, including Intel Advanced Vector Extensions (AVX) and Streaming SIMD Extensions (SSE) instructions. Jantz Tran, an Intel performance application engineer who has an office at IBM’s labs in Silicon Valley, explained the significance of this approach in a recent blog post on the Intel website.

“Packing columnar data into SSE registers allows you to use memory pools much more efficiently than row-based stores because you can run queries and evaluate data while it is still compressed,” Tran says. “In fact, data compression with columnar store is so much more efficient it requires a lot less memory to run the same data set. So you can house a much larger columnar database on a much smaller memory footprint.”
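To make the idea of evaluating data while it is still compressed more concrete, here is a minimal sketch in Python. It is not IBM's implementation: it assumes a simple order-preserving dictionary encoding, one common columnar compression technique, and the function names and sample data are hypothetical. A real engine would pack many such codes into SIMD registers and compare them in parallel, which plain Python cannot show.

```python
from bisect import bisect_left

def build_dictionary(values):
    """Map each distinct value to a small integer code, preserving sort order."""
    distinct = sorted(set(values))
    codes = {v: i for i, v in enumerate(distinct)}
    return distinct, [codes[v] for v in values]

def count_greater_equal(distinct, encoded, threshold):
    """Count rows with value >= threshold by comparing codes only."""
    # Translate the threshold into code space once...
    threshold_code = bisect_left(distinct, threshold)
    # ...then scan the compact integer codes without decoding any row.
    return sum(1 for code in encoded if code >= threshold_code)

# Hypothetical column of sale amounts.
sales = [120, 45, 120, 300, 45, 45, 300, 980]
distinct, encoded = build_dictionary(sales)
print(encoded)
print(count_greater_equal(distinct, encoded, 120))
```

Because the codes keep the same ordering as the original values, a range predicate on the values becomes a range predicate on the codes, so the column never has to be decompressed to answer the query.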

The actionable compression capability, in fact, enabled the 10TB data warehouse used in the POPS test to be 4.55 times smaller than if it were using standard static compression methods. “So if you have 10 TB of raw data and 2 TB of memory, you can run it as an in-memory database using DB2 with BLU Acceleration and Intel Xeon E7 v2 processors,” Tran says.

In other words, you can have your cake (faster query times) and eat it too (less hardware required). “The bottom line: These technologies allow you to run large primary databases directly in-memory at orders-of-magnitude improved performance,” Tran says.

IBM unveiled its BLU Acceleration database option exactly a year ago, and shipped it first for AIX on its own Power processors and later for Linux on Intel processors. The column-oriented data store employs a combination of techniques, including in-memory caching, compression, and data skipping, to dramatically accelerate some types of queries. It’s a different approach than the one Microsoft took with the Hekaton in-memory feature it recently unveiled with SQL Server 2014. BLU is a separate column store that uses in-memory caching while most of the data remains on disk, whereas Hekaton is enabled in the database proper and allows data to be stored either in memory or on disk.
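Data skipping is the least self-explanatory of those techniques, so a rough illustration may help. The sketch below assumes a simple scheme in which the store keeps the minimum and maximum value for each block of a column and skips any block whose range cannot contain the value being searched for. The function names, block size, and sample data are all hypothetical, and this is not a description of how BLU itself stores its metadata.

```python
def build_synopsis(column, block_size):
    """Record (start, end, min, max) for each fixed-size block of a column."""
    synopsis = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        synopsis.append((start, start + len(block), min(block), max(block)))
    return synopsis

def scan_equal(column, synopsis, target):
    """Return matching row indexes and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for start, end, lo, hi in synopsis:
        if target < lo or target > hi:
            continue  # block cannot contain the target, so skip it entirely
        blocks_read += 1
        matches.extend(i for i in range(start, end) if column[i] == target)
    return matches, blocks_read

# Hypothetical column that happens to be clustered by year.
order_year = [2010, 2010, 2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014]
synopsis = build_synopsis(order_year, block_size=2)
rows, blocks_read = scan_equal(order_year, synopsis, 2013)
print(rows, blocks_read, len(synopsis))
```

In this toy example only one of the five blocks is read. The benefit depends heavily on how the data is ordered: a column with values scattered at random would give wide min/max ranges and little to skip.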

The Xeon E7 v2 processors can support up to 1.5 TB of memory per socket, which is three times the amount of memory supported on the first-gen Xeon E7 processors. That gives an eight-socket E7 v2 server the capability to support up to 12 TB of memory. As in-memory systems become more affordable and generally accepted in the industry, IBM is poised to capture a share of the market for both transactional and analytical systems.
