January 29, 2014

MIT Spinout Exploits GPU Memory for Vast Visualization

Alex Woodie

An MIT research project turned open source project dubbed the Massively Parallel Database (Map-D) is turning heads for its capability to generate visualizations on the fly from billions of data points. The software, an SQL-based, column-oriented database that runs in the memory of GPUs, can deliver interactive analysis of 10TB datasets with millisecond latencies. For this reason, its creator feels comfortable calling it “the fastest database in the world.”

Map-D is the brainchild of Todd Mostak, who created the software while taking a class in database development at MIT. By optimizing the database to run in the memory of off-the-shelf graphics processing units (GPUs), Mostak found that he could create a mini supercomputer cluster that offered an order of magnitude better performance than a database running on regular CPUs.

“Map-D is an in-memory column store coded into the onboard memory of GPUs and CPUs,” Mostak said today during a webinar on Map-D. “It’s really designed from the ground up to maximize whatever hardware it’s using, whether it’s running on Intel CPU or Nvidia GPU. It’s optimized to maximize the throughput, meaning if a GPU has this much memory bandwidth, what we really try to do is make sure we’re hitting that memory bandwidth.”

During the webinar, Mostak and Tom Graham, his fellow co-founder of the startup Map-D, demonstrated the technology’s capability to interactively analyze datasets composed of a billion individual records, constituting more than 1TB of data. The demo included a heat map of Twitter posts made from 2010 to the present. Map-D’s “TweetMap” (which the company also demonstrated at the recent SC 2013 conference) runs on eight K40 Tesla GPUs, each with 12 GB of memory, in a single node configuration.
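A rough back-of-envelope calculation suggests why bandwidth-bound scans over a billion records can finish in milliseconds. The sketch below assumes a peak memory bandwidth of 288 GB/s per Tesla K40 (Nvidia's published figure, not a number given in the webinar), a hypothetical 4-byte column, and perfect scaling across the eight GPUs. Real queries would land somewhere below this ceiling.

```python
# Back-of-envelope estimate: how long does a full scan of one column take
# if the scan is limited only by GPU memory bandwidth?
# Everything except NUM_GPUS and NUM_ROWS is an assumption for illustration.

NUM_GPUS = 8                  # from the TweetMap demo configuration
NUM_ROWS = 1_000_000_000      # "a billion individual records"
BYTES_PER_VALUE = 4           # assumed: a single 32-bit column
PEAK_BANDWIDTH_GB_S = 288     # assumed: published peak for one Tesla K40


def scan_time_ms(rows, bytes_per_value, bandwidth_gb_s, gpus):
    """Return the ideal scan time in milliseconds, assuming the column is
    split evenly across GPUs and each GPU reads at peak bandwidth."""
    total_bytes = rows * bytes_per_value
    bytes_per_gpu = total_bytes / gpus
    seconds = bytes_per_gpu / (bandwidth_gb_s * 1e9)
    return seconds * 1000


if __name__ == "__main__":
    one_gpu = scan_time_ms(NUM_ROWS, BYTES_PER_VALUE, PEAK_BANDWIDTH_GB_S, 1)
    all_gpus = scan_time_ms(NUM_ROWS, BYTES_PER_VALUE, PEAK_BANDWIDTH_GB_S, NUM_GPUS)
    print(f"1 GPU:  {one_gpu:.1f} ms")
    print(f"{NUM_GPUS} GPUs: {all_gpus:.1f} ms")
```

Under these assumptions, scanning one such column takes roughly 14 ms on a single GPU and under 2 ms when spread across all eight.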

Mostak searched the database of tweets for terms such as “flu.” The results were overlaid on a map of the United States, and then played out over time. Flu-related search hits started in the South (where the flu made its entry during the 2012 flu season) and progressed into the Northeast. He did the same for tweets related to “snow,” and the hits matched the march of storms across the U.S. “It’s not terribly useful,” he admits, “but it demonstrates the power of the system.”
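The kind of query behind such a heat map can be thought of as a text filter followed by a group-by over time and location. The Python below is a minimal illustration, not Map-D's implementation: the `Tweet` record, the `heatmap_counts` function, and the one-degree grid cells are all hypothetical, and a real system would run the equivalent scan in parallel on the GPU.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass
class Tweet:
    """Hypothetical record layout; not Map-D's actual schema."""
    text: str
    lat: float
    lon: float
    week: int  # weeks since the start of the dataset


def heatmap_counts(tweets, term, cell_deg=1.0):
    """Count tweets containing `term`, grouped by (week, grid cell).

    Each cell is a cell_deg x cell_deg square, identified by the integer
    indices of its south-west corner.
    """
    counts = Counter()
    needle = term.lower()
    for t in tweets:
        if needle in t.text.lower():
            cell = (math.floor(t.lat / cell_deg), math.floor(t.lon / cell_deg))
            counts[(t.week, cell)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows, purely for demonstration.
    sample = [
        Tweet("Home with the flu", 29.76, -95.37, 0),
        Tweet("flu season is here", 29.95, -95.10, 0),
        Tweet("Snow day!", 42.36, -71.06, 0),
        Tweet("Caught the FLU", 42.36, -71.06, 3),
    ]
    for (week, cell), n in sorted(heatmap_counts(sample, "flu").items()):
        print(week, cell, n)
```

Playing the result "over time" then amounts to drawing the cell counts for one week after another.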

Graham and Mostak, who previously worked at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), are in the process of developing Map-D into a full-fledged business, based on the commercial open source model.


A Map-D heatmap of political campaign donations

The company, which was founded in 2013, is currently working with several clients, including NASA, which is seeking a better visualization engine for analyzing historical ice flows; PayPal, which needs a real-time visualization platform to monitor the 3 million-plus data points it generates per second; pharmaceutical giant Novartis, which is looking to speed up interactive pattern matching for drug R&D; Major League Baseball, which is looking for an interactive platform to analyze every pitch made since 1980 for in-game broadcasts and its website; and the U.S. government, which is exploring Map-D for various military applications of GIS and Web mapping platforms.

“We like to say it allows the science of the process to occur at the speed of thought,” Mostak says during the webinar. “If you have a hypothesis, with a normal system you’d have to test the hypothesis by making a query, wait two hours, make coffee, take a nap, come back. But what we’re doing is you can immediately test your hypothesis, iterate, and basically refine that hypothesis, and test again. So you’re doing scientific process at the speed of thought.”

The product has been tested against Nvidia’s GPUs, and the company is currently working with Intel to get it running reliably against the Intel Xeon Phi platform. Map-D is also working on getting the database to run on mobile chips, such as Nvidia Tegra, and on ramping up the scalability. It is currently working on a four-node cluster with 32 GPUs.

Map-D is hoping to capitalize on the need for real-time, interactive analytics platforms that deliver low latency and allow users to act upon the data as it arrives. This pits the software against the analytics elephant in the room–Hadoop. But Mostak has his own thoughts on how analytics can best be delivered.


Map-D creator Todd Mostak

“One thing Hadoop won’t allow you to do is interactive analysis–scanning a billion tweets or scanning political donation records or cell phone records in milliseconds, and being able to visualize it and see patterns and changes,” he says. “A lot of times, when people talk about big data, they think, ‘Oh it has to be petabytes and it has to be running on Hadoop.’ But really what oftentimes big data is, is pushing the limits of what you can do given the size of the data set.”

Map-D also does without indexing. “Basically it’s relying on the raw power of graphics processors to do everything in real time, so you’re not limited to what the person who made the database schema decided to pre-compute or indexed,” Mostak continues. “Map-D doesn’t require indexing. Since it’s doing raw scans, you’re going to get great performance out of the box.”
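To illustrate what an index-free query over a column store looks like, here is a minimal Python sketch. It is a toy built on stated assumptions, not Map-D code: the `ColumnTable` class, its `scan_sum` method, and the column names are hypothetical, and the sequential loop stands in for what would be a massively parallel GPU kernel. The point is only that a predicate on any column can be answered by scanning the raw arrays, with nothing pre-computed.

```python
from array import array


class ColumnTable:
    """Toy column store: each column is a contiguous typed array.
    Hypothetical structure, for illustration only."""

    def __init__(self, **columns):
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns

    def scan_sum(self, value_col, filter_col, predicate):
        """Sum value_col over rows where predicate(filter_col[i]) is true.

        Only the two referenced columns are read; no index is consulted.
        """
        values = self.columns[value_col]
        filters = self.columns[filter_col]
        return sum(v for v, f in zip(values, filters) if predicate(f))


if __name__ == "__main__":
    # Made-up donation records, stored column by column.
    donations = ColumnTable(
        amount=array("d", [250.0, 1000.0, 50.0, 500.0]),
        year=array("i", [2008, 2012, 2012, 2010]),
        region=array("i", [1, 2, 1, 3]),
    )
    # Ad hoc predicates on different columns, both answered by a raw scan.
    print(donations.scan_sum("amount", "year", lambda y: y == 2012))
    print(donations.scan_sum("amount", "region", lambda r: r == 1))
```

Because each query touches only the columns it names, the cost of a scan grows with the width of those columns rather than the width of the whole record, which is one reason column layouts suit bandwidth-bound hardware.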

Regarding the performance claims, Mostak and Graham defend calling Map-D the “fastest database in the world.” “While we think that’s a bit of a big claim, we believe we can back that up by showing you that we’ve been working with the world’s fastest technology, namely Nvidia GPUs,” Graham says at the beginning of the webinar.

Says Mostak: “We can easily claim to be the fastest database in the world, because we’re running the most optimized system on the fastest hardware out there, which is currently graphics processor units.”

And that performance will only increase with the coming advances in GPU architectures. “We’re working with scientists at MIT and Nvidia to optimize the database, optimize the GPU kernels,” Mostak says. “We have time on our side. The power of GPUs, the memory bandwidth, the parallelism–it’s all getting much, much faster. In fact GPUs are getting more powerful relative to CPUs. In two to three years, I think Map-D will be even better positioned.”

The roadmap calls for further tweaking Map-D to support enterprise SQL functions and to support the database running against datasets in the 100 TB range. The product already sports a JSON API, making it useful for sharing information over the Web. The company is also working on machine learning, neural nets, and SVMs (support vector machines), “which all run really well on GPUs,” Mostak says.

