April 22, 2015

A Shoebox-Size Data Warehouse Powered by GPUs

When it comes to big data, the size of your computer definitely matters. Running SQL queries on 100 TB of data or joining billions of records, after all, requires horsepower. But organizations with big data aspirations are increasingly able to satisfy their data processing needs with much smaller computers, thanks to the dense power boost provided by GPUs.

Graphics processing units were initially adopted by gamers to drive the visuals in video games (hence the name). As game developers delivered ever-richer and more life-like displays, they increasingly expected their customers to have multi-thousand-core GPUs from the likes of Nvidia to power the math behind the stunning graphics. Without a screaming-fast GPU loaded into your gaming PC, the games were slow, pixelated, and totally lame.

At some point, the HPC community dove headlong into the GPU world and began leveraging accelerator chips, first from Nvidia and later from Intel, as processing multipliers to boost the throughput and capacity of supercomputers. The transformation was so complete that many of today’s most powerful supercomputers have as many accelerators as traditional CPUs, and that balance is shifting toward even more Nvidia Teslas and Intel Xeon Phis loaded into next-gen supers.

The big data analytics world hasn’t been as quick as the HPC world to latch onto exotic hardware such as GPUs and field programmable gate arrays (FPGAs), with a few exceptions. Indeed, most of today’s big data software, from Hadoop to the various NoSQL databases, has gravitated to the standard stack of Linux running on Intel x86 processors. (At least today’s Hadoop does run on modern 64-bit processors.)

But as big data workloads increasingly come up against the processing wall of traditional CPU architectures, there are rumblings of change within the community, and some entrepreneurs are looking to GPUs (and FPGAs) for solutions.

One of those startups leveraging GPUs for big data analytics is SQream Technologies. Based in Israel, SQream developed a column-oriented database designed to harness the power of Nvidia GPUs to run data warehousing workloads on a fraction of the computing hardware that would traditionally be required.

The SQream DB architecture.

According to SQream CEO and founder Ami Gal, customers can get the same analytical performance out of a $35,000 Dell server equipped with GPUs and SQream DB as they can with a million-dollar server running an analytical database from Teradata, Oracle, or IBM. “You don’t need to do data massaging, create cubes, run indexes, or play with in-memory stuff,” Gal tells Datanami. “What you get is a standard database that can scale to very large numbers on a very small hardware footprint. What you get is a data warehouse in a shoebox.”

Best of all, Gal says, SQream DB uses the same old ANSI SQL that millions of analysts already know, and the same SQL that’s generated by standard business intelligence tools from TIBCO Spotfire, QlikTech, and Tableau Software. “It’s pretty clear to us that most of the big data projects are still done with SQL,” Gal says. “Even if they do it on a Hadoop cluster, they’re trying to find ways to run SQL layers on top of it. For me it’s very clear that SQL is still around, big time. It is in every corner you look.”

The core intellectual property in SQream lies in how the developers leverage the GPUs. The company used Nvidia’s CUDA programming platform to decompose a single SQL query into thousands of smaller operations that run across the GPU’s cores. After those operations complete on the roughly 6,000 cores (for a single GPU, or 12,000 for a dual-GPU setup), SQream DB aggregates the partial results and presents the answer back to the user.
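To make that pattern concrete, here is a minimal CUDA sketch, not SQream’s actual code (its internals aren’t public), of the general approach described above: a single SQL aggregation such as SELECT SUM(amount) FROM sales WHERE region_id = 3 is split so that each GPU thread evaluates the WHERE clause on one row, threads within a block combine their values with a tree reduction, and per-block partial sums are merged into the final answer. The table, column names, and reduction scheme are illustrative assumptions.

```cuda
// Illustrative sketch only -- not SQream's implementation. Each GPU thread
// handles one row of a column-store table; a block-level tree reduction
// collapses thousands of per-row results into per-block partial sums.
#include <cstdio>
#include <cuda_runtime.h>

// Evaluates: SELECT SUM(amount) FROM sales WHERE region_id = wanted
__global__ void filteredSum(const int *region_id, const float *amount,
                            int n, int wanted, float *block_sums) {
    extern __shared__ float partial[];           // one slot per thread
    int tid = threadIdx.x;
    int row = blockIdx.x * blockDim.x + tid;

    // Each thread applies the WHERE predicate to its own row.
    partial[tid] = (row < n && region_id[row] == wanted) ? amount[row] : 0.0f;
    __syncthreads();

    // Tree reduction within the block (blockDim.x must be a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) block_sums[blockIdx.x] = partial[0];
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Toy column-oriented data: 1M rows, 8 regions, amount = 1.0 per row.
    int   *h_region = new int[n];
    float *h_amount = new float[n];
    for (int i = 0; i < n; ++i) { h_region[i] = i % 8; h_amount[i] = 1.0f; }

    int *d_region; float *d_amount, *d_blocks;
    cudaMalloc(&d_region, n * sizeof(int));
    cudaMalloc(&d_amount, n * sizeof(float));
    cudaMalloc(&d_blocks, blocks * sizeof(float));
    cudaMemcpy(d_region, h_region, n * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(d_amount, h_amount, n * sizeof(float), cudaMemcpyHostToDevice);

    filteredSum<<<blocks, threads, threads * sizeof(float)>>>(
        d_region, d_amount, n, 3, d_blocks);

    // Final step: merge the per-block partial sums into one answer.
    float *h_blocks = new float[blocks], total = 0.0f;
    cudaMemcpy(h_blocks, d_blocks, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    for (int b = 0; b < blocks; ++b) total += h_blocks[b];
    printf("SUM(amount) WHERE region_id = 3 -> %.0f\n", total);   // 131072

    cudaFree(d_region); cudaFree(d_amount); cudaFree(d_blocks);
    delete[] h_region; delete[] h_amount; delete[] h_blocks;
    return 0;
}
```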

“Our special sauce is how we do the pre-processing, the decompiling,” Gal says. “We actually wrote a complete compiler to do that. We built the database from the ground up specifically for GPUs, for a highly parallelized, single-instruction, multiple-processor architecture. It was pretty challenging, but this is how we do it.”
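As a rough illustration of what compiling SQL for that kind of architecture can mean (the details of SQream’s compiler aren’t public), a clause such as WHERE price > 100 AND qty < 5 can be lowered to a branchless kernel that computes a 0/1 selection mask, so every thread in a warp executes the same instructions regardless of the data it sees. The column names and the mask representation are assumptions for the sketch; launching it follows the same host-side pattern as the aggregation example above.

```cuda
// Hypothetical output of compiling "WHERE price > 100 AND qty < 5" for a
// SIMT architecture: comparisons become arithmetic on a 0/1 mask, so no
// thread in a warp takes a divergent branch on data values.
__global__ void wherePredicate(const float *price, const int *qty,
                               int n, unsigned char *mask) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;
    // Branchless predicate: logical AND of two comparisons per row.
    mask[row] = (unsigned char)((price[row] > 100.0f) & (qty[row] < 5));
}
```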

Because SQream can bring so many cores to bear on the problem, it doesn’t pay the usual performance price when heavy compression is used, Gal says. Huge joins that would typically require 70 to 80 TB of storage take less than a tenth of that. Each 2U server running SQream DB can handle about 100 TB of raw data while using only about 10 TB of actual storage.
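The article doesn’t say which compression scheme SQream uses, but a simple run-length encoding example shows why decompression can be nearly free when thousands of cores are available: each compressed run expands independently, so one thread can decode one run. The scheme, names, and host-side prefix sum below are illustrative assumptions, not SQream’s design.

```cuda
// Illustrative sketch of on-GPU decompression. Run-length encoding (RLE)
// is just one common columnar technique; one thread expands one run, so
// decompression parallelizes across all available cores.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void rleDecode(const int *values, const int *lengths,
                          const int *offsets, int num_runs, int *out) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= num_runs) return;
    int start = offsets[r];            // where this run lands in the output
    for (int j = 0; j < lengths[r]; ++j)
        out[start + j] = values[r];
}

int main() {
    // Compressed column: 3 runs decode to 7 7 7 7 42 9 9 9 (8 rows).
    std::vector<int> values  = {7, 42, 9};
    std::vector<int> lengths = {4, 1, 3};
    int num_runs = (int)values.size();

    // Exclusive prefix sum of run lengths gives each run's output offset.
    std::vector<int> offsets(num_runs);
    int total = 0;
    for (int r = 0; r < num_runs; ++r) { offsets[r] = total; total += lengths[r]; }

    int *d_vals, *d_lens, *d_offs, *d_out;
    cudaMalloc(&d_vals, num_runs * sizeof(int));
    cudaMalloc(&d_lens, num_runs * sizeof(int));
    cudaMalloc(&d_offs, num_runs * sizeof(int));
    cudaMalloc(&d_out,  total * sizeof(int));
    cudaMemcpy(d_vals, values.data(),  num_runs * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_lens, lengths.data(), num_runs * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offs, offsets.data(), num_runs * sizeof(int), cudaMemcpyHostToDevice);

    rleDecode<<<(num_runs + 255) / 256, 256>>>(d_vals, d_lens, d_offs,
                                               num_runs, d_out);

    std::vector<int> out(total);
    cudaMemcpy(out.data(), d_out, total * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < total; ++i) printf("%d ", out[i]);   // 7 7 7 7 42 9 9 9
    printf("\n");

    cudaFree(d_vals); cudaFree(d_lens); cudaFree(d_offs); cudaFree(d_out);
    return 0;
}
```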

The company claims its GPU-powered database can run SQL queries up to 100 times faster than traditional data warehouses. “We’re enabling smaller companies, or budget-limited companies, to actually solve big data problems with less money,” Gal says. “And it usually runs much faster than the competition. We’re not in the market for doing stuff in nanoseconds or milliseconds. But if you’re doing a join between 300 billion and 30 million records, it takes you seconds, as opposed to the minutes and hours you get from the competitors.”

Since SQream DB began shipping in late 2014, it’s been adopted by companies in financial services, cybersecurity, telecommunications, and genomics. Yesterday the company unveiled a new product designed specifically for companies in the genomics field. Called GenomeStack, the software features SQream DB under the covers and is designed to help researchers manipulate and query files stored in the BAM format.

GPUs aren’t turning the big data world on its head, at least not yet. But if solutions such as SQream’s find traction, you can bet that more customers will be looking to GPUs to accelerate their big data workloads.

Related Items:

FPGA System Smokes Spark on Streaming Analytics

GPU-Powered Terrorist Hunter Eyes Commercial Big Data Role

MIT Spinout Exploits GPU Memory for Vast Visualization
