September 15, 2016

MIT Programmers Attack Big Data Memory Gap

George Leopold

Among the computing challenges presented by big data is the scattering of unstructured items across huge datasets. Pulling together that data from arbitrary locations in main memory is therefore emerging as a major performance bottleneck in CPUs.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory have proposed a solution to the memory “locality” problem with a new programming language called “Milk.” The approach is designed to allow application developers to more efficiently manage memory when crunching scattered data in ever-larger datasets.

The MIT researchers reported at a computing conference this week that common algorithms written in the new programming language ran up to four times as fast as those written in existing languages. They predict larger performance gains as the language is refined to orchestrate data locations and determine the relevance of data stored at particular locations.

Along with the scattering of big data in memory, the MIT researchers are also tackling a problem they refer to as “sparse” data. In other words, the scale of big data solutions does not always grow in proportion to the big data problem to be solved.

The MIT programming language is predicated on the fact that today’s CPUs are not optimized for “sparse” data. A CPU core is designed to fetch data from main memory sequentially, grabbing a whole block of data based on where the requested item is located. The university researchers concluded that going to main memory for a single data point at a time is woefully inefficient in the age of big data.
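The cost of block-based fetching on scattered data can be illustrated with some simple arithmetic. The sketch below is not from the MIT work: it assumes a 64-byte memory block and 8-byte data items (both figures are assumptions, as are the function names) and counts how many blocks must be fetched to read a given set of item positions.

```python
LINE_BYTES = 64  # assumed block (cache line) size; not stated in the article
ITEM_BYTES = 8   # assumed size of one data item; not stated in the article


def lines_touched(indices, line_bytes=LINE_BYTES, item_bytes=ITEM_BYTES):
    """Count the distinct memory blocks needed to read the given item indices."""
    items_per_line = line_bytes // item_bytes
    # Two items share a block when their indices fall in the same group.
    return len({i // items_per_line for i in indices})


def useful_fraction(indices, line_bytes=LINE_BYTES, item_bytes=ITEM_BYTES):
    """Fraction of the fetched items that were actually requested."""
    items_per_line = line_bytes // item_bytes
    fetched_items = lines_touched(indices, line_bytes, item_bytes) * items_per_line
    return len(set(indices)) / fetched_items


sequential = list(range(64))            # 64 neighbouring items
scattered = list(range(0, 64000, 1000)) # 64 items spread far apart

print(lines_touched(sequential), useful_fraction(sequential))
print(lines_touched(scattered), useful_fraction(scattered))
```

Under these assumptions, the 64 neighbouring items fit in 8 blocks and every fetched byte is used, while the 64 scattered items need 64 separate blocks and only one-eighth of what is fetched is useful.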

“It’s as if, every time you want a spoonful of cereal, you open the fridge, open the milk carton, pour a spoonful of milk, close the carton and put it back in the fridge,” explained Vladimir Kiriansky, an MIT doctoral student in electrical engineering and computer science and lead researcher.

(The analogy also explains the name of the new programming language.)

Milk adds several commands to OpenMP, or Open Multi-Processing, an extension to C and other programming languages geared to multicore processors. Milk allows programmers to add a few lines of code around instructions that iterate through large datasets looking for “sparse” data. The compiler then manages memory accordingly, the researchers said.

Each core compiles a list of the data addresses it needs, and addresses that sit near each other in memory are grouped together. As a result, each core requests only the data it needs, and that data can be retrieved efficiently, thereby boosting overall performance.
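The grouping idea can be sketched in a few lines. The example below is only an illustration of the general technique of deferring scattered accesses and batching them by location. It is not Milk’s syntax or implementation, the function names and the `bucket_size` parameter are hypothetical, and Python will not show the speedup that a compiled language would.

```python
from collections import defaultdict


def count_direct(index, n_slots):
    """Update counts in the order the indices arrive (scattered accesses)."""
    counts = [0] * n_slots
    for i in index:
        counts[i] += 1
    return counts


def batched_order(index, bucket_size=4):
    """Return the indices reordered so that nearby slots are visited together."""
    buckets = defaultdict(list)
    # Phase 1: record each requested slot in the bucket for its neighbourhood.
    for i in index:
        buckets[i // bucket_size].append(i)
    # Phase 2: walk the buckets in address order, one neighbourhood at a time.
    return [i for b in sorted(buckets) for i in buckets[b]]


def count_batched(index, n_slots, bucket_size=4):
    """Same result as count_direct, but slots are touched bucket by bucket."""
    counts = [0] * n_slots
    for i in batched_order(index, bucket_size):
        counts[i] += 1
    return counts


index = [9, 0, 5, 1, 8, 4, 0, 9]
print(batched_order(index))  # [0, 1, 0, 5, 4, 9, 8, 9]
print(count_direct(index, 10) == count_batched(index, 10))  # True
```

The final counts are identical either way. What changes is the order in which memory is touched: after batching, consecutive updates land in the same small region, which is the property that lets data be retrieved in efficient blocks.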

The next step in boosting performance will be tailoring the Milk compiler to keep track not only of the list of memory addresses but also of the data stored at those addresses. The compiler would then decide which addresses to retain for future reference and which to discard.

“Many important applications today are data-intensive, but unfortunately, the growing gap in performance between memory and CPU means they do not fully utilize current hardware,” noted Matei Zaharia, an assistant professor of computer science at Stanford University. “Milk helps to address this gap by optimizing memory access in common programming constructs.”
