August 31, 2017

‘Database Learning’ Aims to Speed Queries


University researchers have developed a lightweight software tool that enables existing databases to learn from user queries, pinpointing requested data without starting from scratch on every query.

A University of Michigan team said theirs might be the first working prototype of an emerging “database learning” approach designed to accelerate enterprise and scientific applications currently “mired in a worldwide data glut.”

Targeting big data bottlenecks, the researchers claim the software tool dubbed “Verdict” can deliver answers to database queries up to 200 times faster than traditional databases while maintaining 99 percent accuracy.

The database-learning framework seeks to reduce the time and power wasted on repetitive database tasks, updating the compute-intensive paradigm underpinning current database technology. Explained Barzan Mozafari, a University of Michigan computer science professor: “You submit a query, it does some work and provides an answer. When a new query comes in, it starts over. All the work from previous queries is wasted.”

Verdict uses advanced statistical principles that leverage earlier pairs of questions and answers to infer the likely answer to future queries.

In a paper published earlier this year, the researchers said their aim was to change the current database paradigm through the use of “approximate query processing.” The alternative approach uses SQL functions to deliver results that differ only slightly from the exact result.

Database giants such as Oracle (NYSE: ORCL) also tout the approach as delivering “acceptable” query results while saving processing resources.
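
To make the idea concrete, here is a minimal sketch of approximate query processing in general, not of Verdict's or Oracle's implementation. It assumes the simplest possible technique, a uniform random sample, and estimates an average with a 95 percent error margin instead of scanning every row. The function name, table contents and sample size are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95 percent confidence interval (normal approximation).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * std_error


# Hypothetical "table": 200,000 synthetic measurements.
data_rng = random.Random(42)
table = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

exact = statistics.fmean(table)                    # reads every row
estimate, margin = approximate_mean(table, 2_000)  # reads 1 percent of rows

print(f"exact={exact:.2f} approx={estimate:.2f} +/- {margin:.2f}")
```

The trade-off is the one the article describes: the approximate answer touches a small fraction of the data and comes with a stated error bound rather than an exact value.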

“The answer to each query reveals some degree of knowledge about the answer to another query because their answers stem from the same underlying distribution that has produced the entire dataset,” the university researchers noted. “Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data.”

As more queries are processed, a better understanding of the underlying distribution is obtained, they added, yielding “increasingly faster response times for future queries.”
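
The intuition that each answer sharpens the next can be illustrated with a deliberately simplified sketch. It assumes every past answer is a noisy estimate of the same quantity and combines them by inverse-variance weighting, which is a standard Gaussian update; Verdict's actual model, which relates answers to different queries, is more involved and is not reproduced here. All names and numbers below are hypothetical.

```python
def combine(prior_mean, prior_var, obs_mean, obs_var):
    """Merge a prior belief with a new noisy estimate (Gaussian update).

    The more uncertain the prior is relative to the observation, the more
    the result moves toward the observation. The returned variance is
    always smaller than either input variance.
    """
    weight = prior_var / (prior_var + obs_var)
    mean = prior_mean + weight * (obs_mean - prior_mean)
    var = prior_var * obs_var / (prior_var + obs_var)
    return mean, var


# Start with a vague belief, then fold in three approximate answers,
# each given as (estimate, variance). Values are made up for illustration.
belief = (100.0, 25.0)
history = [belief]
for obs_mean, obs_var in [(104.0, 4.0), (102.5, 4.0), (103.2, 4.0)]:
    belief = combine(*belief, obs_mean, obs_var)
    history.append(belief)

for mean, var in history:
    print(f"mean={mean:.2f} variance={var:.2f}")
```

In this toy setting the variance shrinks with every answer folded in, so a later query needs less fresh data to reach the same error bound, which is one way to read the claim of increasingly faster response times.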

Verdict software is described as a “thin layer” of code that could run on top of existing databases. Once installed, it compiles the queries sent to a database and the answers returned into a “query synopsis.” Once compiled, individual queries are parsed and the resulting “snippets” are used to build a mathematical model of questions and answers. That model is then used to point the database to relevant subsets of data based on new queries.

The researchers claim that in some cases a query can be answered using only the model.

The researchers also aim to address resource allocation issues as data volumes soar, noting that the small amount of code uses “minimal computing resources” without sacrificing performance. They added that the speed and accuracy requirements could be fine-tuned to specific applications.

A commercial product is likely a few years away, but Mozafari asserts that Verdict promises to transform database mechanics. “Instead of just additional work, each query is now an opportunity to learn and make the database work better,” he added.
