September 18, 2012

CSAIL Rides Big Data Visualization Wave


“The hour of big data has arrived in Massachusetts,” Massachusetts Governor Deval Patrick said recently.

When he made this statement in a speech on MIT’s campus, Patrick was referring to none other than MIT’s CSAIL, or Computer Science and Artificial Intelligence Laboratory, the leading big data research institution in Massachusetts and one of the biggest in the country.

“(We’re trying to) build a set of software tools,” said Sam Madden, CSAIL’s director, “that allow people to take all these datasets, combine them together, ask questions and run algorithms on top of them that allow them to extract insight.” According to Madden, CSAIL is the biggest interdepartmental laboratory in the country, with 100 professors and professional researchers and 900 students. As Madden explained, a significant portion of that research effort goes toward solving big data problems.

Visualization is a major focus at CSAIL. It is an important aspect of big data analytics because it lets people make better assessments than they could by looking at raw numbers alone. However, producing a graph or chart can take an especially long time when the dataset is in the terabyte range, and the resulting graph may be cluttered or unusable.

“I’m working with a dataset that was derived by NASA,” said research assistant Leilani Battle. “The purpose of my work is to take datasets within databases and instead of querying them for table results, querying them for visualizations. So instead of looking at large sets of numbers and tables and what not, you get a picture.”
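To make the idea of asking for a picture rather than a table concrete, here is a minimal sketch in Python. It is not Battle's system, whose details the article does not give. It only illustrates one common way to keep a chart readable and quick to draw: reduce the raw points to a fixed grid of counts before plotting. The function name bin_counts and all of its parameters are hypothetical.

```python
def bin_counts(points, x_min, x_max, y_min, y_max, nx, ny):
    """Reduce (x, y) points to an ny-by-nx grid of counts.

    Hypothetical helper for illustration: the plot is then drawn from
    nx * ny cells, no matter how many raw points were scanned.
    """
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Skip points that fall outside the plotted window.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Map each coordinate to a cell index; clamp the upper edge.
        col = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        row = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        grid[row][col] += 1
    return grid


# Made-up sample points, purely illustrative.
sample = [(0.1, 0.1), (0.9, 0.9), (0.95, 0.95), (2.0, 2.0)]
grid = bin_counts(sample, 0.0, 1.0, 0.0, 1.0, 2, 2)
```

In a real database setting the same aggregation would more likely be pushed into the query itself, so that only the grid, and not the raw rows, ever leaves the database.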

Among MIT’s big data projects is work on what researchers call “sublinear time algorithms,” which produce an answer after examining only a small portion of the input. Frequently, a set of data can be expressed as a single point if that set has linear or similarly easily mapped properties. MIT’s algorithms could hypothetically determine whether certain data has these properties and then compress it.
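As a rough illustration of the sublinear idea, the Python sketch below guesses whether a long sequence follows a straight line by inspecting only a fixed number of randomly chosen positions. If the check passes, it replaces the sequence with three numbers. This is a simplified stand-in, not one of MIT's algorithms. The function names, the sample count and the tolerance are all assumptions made for the example.

```python
import random


def looks_linear(values, samples=200, tolerance=1e-6, seed=0):
    """Probabilistic check that values[i] is close to slope * i + intercept.

    Reads at most samples + 2 entries regardless of len(values), so the
    work does not grow with the input size. It can be fooled by a
    sequence that deviates from the line in only a few places.
    """
    n = len(values)
    if n < 3:
        return True
    # Candidate line through the two endpoints.
    slope = (values[n - 1] - values[0]) / (n - 1)
    intercept = values[0]
    rng = random.Random(seed)
    for _ in range(samples):
        i = rng.randrange(n)
        if abs(values[i] - (slope * i + intercept)) > tolerance:
            return False
    return True


def compress_if_linear(values):
    """Return (slope, intercept, length) if the sample check passes, else None."""
    if len(values) < 2 or not looks_linear(values):
        return None
    slope = (values[-1] - values[0]) / (len(values) - 1)
    return (slope, values[0], len(values))


line = [3 * i + 1 for i in range(100_000)]
curve = [i * i for i in range(100_000)]
line_summary = compress_if_linear(line)
curve_summary = compress_if_linear(curve)
```

The trade-off is typical of this family of methods: the answer arrives after a few hundred lookups rather than a full scan, but it comes with a small chance of error instead of a guarantee.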

A practical, business-friendly example of MIT’s research involves credit card companies and how they monitor delinquent accounts. Considerable resources are wasted pursuing customers who pay late but do so consistently. However, much of the relevant information is held in relational databases and can therefore be easily accessed and analyzed, so long as one knows how to analyze it.

“We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk…we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies and defaults, with linear regression R²’s of forecasted/realized delinquencies of 85%.”
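For readers unfamiliar with the R² figure quoted above, the Python sketch below shows how such a number can be computed. It assumes a simple one-variable linear regression of realized values on forecasted values, in which case R² equals the squared correlation between the two series. The function name and the sample numbers are invented for illustration and are not the researchers' data.

```python
def r_squared(forecast, realized):
    """R² of a simple linear regression of realized on forecast.

    For a one-variable regression with an intercept, this equals the
    squared Pearson correlation between the two series.
    """
    n = len(forecast)
    if n != len(realized) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_f = sum(forecast) / n
    mean_r = sum(realized) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(forecast, realized))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_r = sum((r - mean_r) ** 2 for r in realized)
    if var_f == 0 or var_r == 0:
        raise ValueError("R² is undefined when a series is constant")
    return (cov * cov) / (var_f * var_r)


# Invented numbers, purely illustrative.
forecasted = [1.0, 2.0, 3.0, 4.0]
realized = [1.0, 3.0, 2.0, 4.0]
score = r_squared(forecasted, realized)
```

A value of 1 would mean the realized figures fall exactly on a straight line through the forecasts, and a value near 0 would mean the forecasts explain almost none of the variation.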

It takes a lot of very smart people to solve problems such as those posed by big data. MIT has a relatively high concentration of very smart people and, as a result, is one of the leaders in big data research.
