“The hour of big data has arrived in Massachusetts,” said Massachusetts governor Deval Patrick recently.
When he made this statement in a speech on MIT’s campus, Patrick was referring to none other than MIT’s CSAIL, or Computer Science and Artificial Intelligence Laboratory, the leading big data research institution in Massachusetts and one of the biggest in the country.
“(We’re trying to) build a set of software tools,” said Sam Madden, CSAIL’s director, “that allow people to take all these datasets, combine them together, ask questions and run algorithms on top of them that allow them to extract insight.” According to Madden, CSAIL is the biggest interdepartmental laboratory in the country, with 100 professors and professional researchers and 900 students working with it. As Madden explained, a significant portion of that research effort is devoted to solving big data problems.
Visualization is a major focus at CSAIL. It is an important part of big data analytics because it lets people make better assessments than they could by looking at numbers alone. However, producing a graph or chart can take an especially long time when the dataset is in the terabyte range, and the resulting graph may be too cluttered to use.
“I’m working with a dataset that was derived by NASA,” said research assistant Leilani Battle. “The purpose of my work is to take datasets within databases and instead of querying them for table results, querying them for visualizations. So instead of looking at large sets of numbers and tables and what not, you get a picture.”
Among MIT’s big data projects are what researchers call “sublinear time algorithms,” which examine only a small portion of a dataset instead of reading all of it. Frequently, a set of data can be expressed far more compactly, even as a single point, if it has linear or similarly easily mapped properties. MIT’s algorithms could hypothetically determine whether certain data has these properties and then compress it.
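To make the idea concrete, here is a minimal Python sketch of a sampling-based check, assuming the property of interest is that the values lie on a straight line. It is an illustration only, not MIT’s method: the function name looks_linear and its parameters (samples, tolerance, seed) are hypothetical. The check reads a fixed number of entries no matter how large the dataset is, which is what makes its running time sublinear.

```python
import random


def looks_linear(values, samples=50, tolerance=1e-6, seed=0):
    """Sample-based check that values[i] lies on a straight line in i.

    Reads at most samples + 2 entries, regardless of len(values).
    """
    n = len(values)
    if n < 3:
        return True  # any two points lie on a line

    # Candidate line through the first and last points.
    slope = (values[-1] - values[0]) / (n - 1)

    rng = random.Random(seed)
    for _ in range(samples):
        i = rng.randrange(n)
        expected = values[0] + slope * i
        if abs(values[i] - expected) > tolerance:
            return False  # found a sampled point off the line
    return True


if __name__ == "__main__":
    print(looks_linear([3 * i + 2 for i in range(100_000)]))
    print(looks_linear([i * i for i in range(1_000)]))
```

The trade-off is that the answer is probabilistic: a dataset with only a handful of off-line values may still pass, because the sample can miss them. A dataset that does pass could then be stored as just a starting value and a slope.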
A practical and business-friendly example of MIT’s research deals with credit card companies and their monitoring of delinquent accounts. A lot of time and resources are lost in pursuing customers who make late payments but are consistent with their late payments. However, a great deal of that information is held in relational databases and can therefore be easily accessed and analyzed, so long as one knows how to analyze it.
“We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk…we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies and defaults, with linear regression R²’s of forecasted/realized delinquencies of 85%.”
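For readers unfamiliar with the statistic in that quote, the following Python sketch shows one way an R² between forecasted and realized values can be computed. It assumes a simple one-variable linear regression, for which R² equals the squared correlation between the two series. The function name r_squared and the numbers are made up for illustration and are not taken from the research.

```python
def r_squared(forecast, realized):
    """R² of a simple linear regression of realized values on forecasts."""
    n = len(forecast)
    mean_x = sum(forecast) / n
    mean_y = sum(realized) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in forecast)
    syy = sum((y - mean_y) ** 2 for y in realized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(forecast, realized))

    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Toy numbers, not real delinquency data.
    print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

An R² of 85% would mean that the forecasts account for 85% of the variation in the realized delinquencies.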
It takes a lot of very smart people to solve problems such as those posed by big data. MIT has a relatively high concentration of very smart people, and as a result is one of the leaders in big data research.