August 14, 2020

MIT Is Developing a Tool for Machine Learning-Powered Data Retrieval

Oliver Peckham

With the global deluge of data, the opportunities are endless – but so are the challenges. Within five years, the world’s data is estimated to reach 175 zettabytes, or 175 billion terabytes: enough to fill more than 20 one-terabyte hard drives for every single person alive. In such a data-driven world, managing and sorting through that data gets harder by the day, with database and query managers struggling to keep up. Now, researchers from MIT are developing a tool to intelligently assist users of large databases.

MIT Professor Tim Kraska and his colleagues at the institute’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are debuting a design for what they call “instance-optimized systems”: database systems that optimize and reorganize themselves in response to the data types and workloads at hand. “It’s like building a database system for every application from scratch, which is not economically feasible with traditional system designs,” Kraska explained in an interview with MIT’s Adam Conner-Simons.

MIT’s instance-optimized system will be the child of two parents: the “Tsunami” and “Bao” tools. Using machine learning, Tsunami (a successor to “Flood”) interprets user queries to reorganize the layouts of databases. Bao, meanwhile, uses machine learning to intelligently pick the appropriate plan for completing a given query. Tested separately, Tsunami improved query speed by up to tenfold, while Bao-created query plans ran up to 50% faster. Combined, the two tools are meant to form the instance-optimized system.
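To give a rough sense of what learning to pick a query plan can look like, here is a minimal Python sketch. It is not Bao's actual method: it assumes a simple epsilon-greedy strategy that tracks the average latency of each candidate plan, and the class `PlanSelector`, the function `simulate`, the plan names and the latency figures are all hypothetical, chosen only for illustration.

```python
import random


class PlanSelector:
    """Toy plan picker (illustrative only): epsilon-greedy over candidate plans."""

    def __init__(self, plans, epsilon=0.1, seed=0):
        self.plans = list(plans)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.total_latency = {p: 0.0 for p in self.plans}
        self.runs = {p: 0 for p in self.plans}

    def mean_latency(self, plan):
        return self.total_latency[plan] / self.runs[plan]

    def choose(self):
        # Try every plan once before trusting any estimate.
        untried = [p for p in self.plans if self.runs[p] == 0]
        if untried:
            return untried[0]
        # Occasionally explore, so an early unlucky plan can still recover.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.plans)
        # Otherwise exploit the plan with the lowest mean observed latency.
        return min(self.plans, key=self.mean_latency)

    def record(self, plan, latency_ms):
        # Learn from the outcome of the plan that was just executed.
        self.total_latency[plan] += latency_ms
        self.runs[plan] += 1

    def best(self):
        tried = [p for p in self.plans if self.runs[p] > 0]
        return min(tried, key=self.mean_latency)


def simulate(true_latency_ms, rounds=200, seed=0):
    """Run the selector against hypothetical plans with noisy latencies."""
    rng = random.Random(seed)
    selector = PlanSelector(true_latency_ms, seed=seed)
    for _ in range(rounds):
        plan = selector.choose()
        # Assumed measurement noise of +/-20% around each plan's true latency.
        observed = true_latency_ms[plan] * rng.uniform(0.8, 1.2)
        selector.record(plan, observed)
    return selector


if __name__ == "__main__":
    # Hypothetical plans and latencies (milliseconds), not measured results.
    plans = {"hash_join": 40.0, "nested_loop": 120.0, "merge_join": 65.0}
    result = simulate(plans)
    print("preferred plan:", result.best())
    print("times each plan was run:", result.runs)
```

The point of the sketch is the feedback loop: each executed plan's latency is recorded, so slow plans are chosen less and less often. A real system would also have to account for the query itself and the state of the database, which this toy version ignores.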

“Query optimizers have been around for years, but they often make mistakes, and usually they don’t learn from them. That’s where we feel that our system can make key breakthroughs, as it can quickly learn for the given data and workload what query plans to use and which ones to avoid,” Kraska said. “Our hope is that a system like this will enable much faster query times, and that people will be able to answer questions they hadn’t been able to answer before.”

The team is still working to integrate the two tools, but is already having success training Bao, which has outperformed commercial tools with as little as one hour of training. The researchers hope to bring this success, and others, to resource-limited systems like cloud environments, where query optimization could have a particularly large impact.

“I think this line of work is a paradigm shift that’s going to impact system design long-term,” said Stratos Idreos, a computer scientist at Harvard University. “I expect approaches based on models will be one of the core components at the heart of a new wave of adaptive systems.”

To read more, check out the paper here or the MIT news story here.