March 13, 2023

MIT Researchers Use Machine Learning to Speed Up Data Retrieval Hashing

A multi-institutional team of researchers led by MIT has found a new way to speed up data retrieval in large databases using machine learning.

The researchers used machine learning to build better hash functions. Hashing is a core operation in online databases: a hash function generates a code that identifies where a piece of data is stored, which accelerates retrieval.

A problem with traditional hash functions is that they generate codes at random, so two pieces of data are sometimes hashed to the same value. This is called a collision, and collisions lead to less efficient searches. While there are specific kinds of hash functions designed to lessen collisions, they must be specially constructed and take more time to compute.
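To make the idea of a collision concrete, here is a minimal Python sketch. It is not taken from the research: the helper names (`traditional_hash`, `count_collisions`), the choice of BLAKE2 as the scrambling function, and the made-up keys are all assumptions for illustration.

```python
import hashlib
import random
from collections import Counter

def traditional_hash(key: int, num_buckets: int) -> int:
    """Scramble the key with a general-purpose hash, then pick a bucket.

    Illustrative stand-in for a traditional hash function (an assumption).
    """
    digest = hashlib.blake2b(key.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets

def count_collisions(keys, num_buckets, hash_fn=traditional_hash):
    """Count keys that land in a bucket another key already occupies."""
    per_bucket = Counter(hash_fn(k, num_buckets) for k in keys)
    return sum(n - 1 for n in per_bucket.values())

rng = random.Random(0)                  # fixed seed so the run is repeatable
keys = rng.sample(range(10**9), 1000)   # 1,000 distinct, made-up keys
print(count_collisions(keys, num_buckets=1000))
```

Even with as many buckets as keys, a hash that scatters keys at random leaves some buckets empty and puts several keys in others, so the count printed here is typically well above zero.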

To reduce collisions in certain cases, the research team used learned models, which are created by running a machine learning algorithm on a dataset to capture specific characteristics of the data, according to an article from MIT News. The team found that these models were more computationally efficient than the hash functions designed to lessen collisions.
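One way to picture a learned hash function is as a small model that estimates where a key falls within the overall distribution of keys and scales that estimate to a storage slot. The Python sketch below follows that idea under stated assumptions: the "model" is just a sorted sample with linear interpolation, the keys are a toy, evenly spaced dataset, and every name (`fit_model`, `learned_bucket`, `scrambled_bucket`) is hypothetical. It is not the researchers' implementation.

```python
import bisect
import hashlib
from collections import Counter

def fit_model(sample_keys):
    """'Train' a tiny model: a sorted sample approximating the key distribution."""
    return sorted(sample_keys)

def learned_bucket(model, key, num_buckets):
    """Estimate the key's relative position (0.0 to 1.0), then scale to a bucket."""
    i = bisect.bisect_right(model, key)
    if i == 0:
        position = 0.0                   # key is below every sampled key
    elif i == len(model):
        position = 1.0                   # key is at or above the largest sampled key
    else:
        lo, hi = model[i - 1], model[i]  # lo <= key < hi, so hi - lo is never zero
        position = (i - 1 + (key - lo) / (hi - lo)) / (len(model) - 1)
    return min(int(position * num_buckets), num_buckets - 1)

def scrambled_bucket(key, num_buckets):
    """Baseline: a general-purpose hash that ignores how the keys are distributed."""
    digest = hashlib.blake2b(key.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets

def collisions(buckets):
    """Count keys that share a bucket with an earlier key."""
    return sum(n - 1 for n in Counter(buckets).values())

keys = [1000 + 7 * i for i in range(1000)]   # toy, evenly spaced keys (assumption)
model = fit_model(keys[::10] + [keys[-1]])   # learn from roughly 10% of the keys
learned = collisions(learned_bucket(model, k, 1000) for k in keys)
baseline = collisions(scrambled_bucket(k, 1000) for k in keys)
print(f"learned: {learned} collisions, baseline: {baseline} collisions")
```

On keys this regular, the learned mapping should collide less often than the scrambling baseline. That advantage depends on the data following a pattern the model can capture, and the numbers from this toy run say nothing about the results reported in the paper.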

“What we found in this work is that in some situations we can come up with a better tradeoff between the computation of the hash function and the collisions we will face. In these situations, the computation time for the hash function can be increased a bit, but at the same time its collisions can be reduced very significantly,” said Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL), in an MIT News article.

The research team says it wants to use machine learning models to design hash functions for other types of data and plans to explore learned hashing for databases in which data can be inserted or deleted, says MIT News.

“We want to encourage the community to use machine learning inside more fundamental data structures and algorithms. Any kind of core data structure presents us with an opportunity to use machine learning to capture data properties and get better performance. There is still a lot we can explore,” Sabek said.

For the technical specifics of this new hashing method, see Adam Zewe’s coverage for MIT News and the team’s scientific paper.

Datanami