How IBM Is Turning Db2 into an ‘AI Database’
What’s an AI database? If you were to ask IBM executives, they would point you to Db2, the 36-year-old database that’s used by tens of thousands of companies around the world to power transactional and analytical workloads alike. Big Blue now senses an opportunity to remake Db2 into an AI database, and is taking steps in that direction with new features unveiled today in version 11.5.
There are three main ways that Db2 is becoming an AI database, both with 11.5 and future releases, according to two IBMers who briefed Datanami on the news: Matthias Funke, director of offering management for IBM’s hybrid data cloud business, and Pandit Prasad, who works in offering management for IBM data and AI.
“We think of AI as a means to deliver better quality of service and a better experience with the database engine to the people who are using it,” Funke says. “That’s not just the developer, but also the DBA.”
Built-In Data Science
On the development front, Db2 11.5 will feature better hooks for the languages and development tools that data scientists work with, which will simplify the development and (in the future) the deployment of machine learning models.
IBM is now providing drivers for a variety of languages and frameworks, including Python, Go, Ruby, PHP, Java, Node.js, and Sequelize, and is integrating the Microsoft Visual Studio and Jupyter notebook development environments directly within Db2. That will make life easier for data scientists and simplify access to data that’s stored in the Db2 database, Prasad says.
“If you don’t have the integration, you have to call ODBC and JDBC,” he says. “You have to manage each and every data field and record, and work your way through it. If you have Python [integration], you tell Python ‘create record’ or ‘get record’ and ‘next record.’ So you just use Python library the way you normally work with Python and start manipulating the data that is inside Db2.”
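The record-by-record style Prasad describes resembles Python's standard DB-API cursor pattern. The sketch below is only an illustration of that pattern: it uses the standard library's sqlite3 module as a stand-in so it can run anywhere, and the table, column, and function names are hypothetical. Against Db2 itself, one would presumably open the connection through IBM's Python driver instead, and the SQL dialect might differ.

```python
import sqlite3

# Stand-in connection: sqlite3 is used here only so the sketch is runnable
# without a Db2 server. The "customers" table is a made-up example.
conn = sqlite3.connect(":memory:")


def load_customers(connection):
    """Create a small example table and insert a few records."""
    cur = connection.cursor()
    cur.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    cur.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "New York")],
    )
    connection.commit()


def fetch_names(connection, city):
    """Walk the result set one record at a time ('get record', 'next record')."""
    cur = connection.cursor()
    cur.execute("SELECT name FROM customers WHERE city = ?", (city,))
    names = []
    row = cur.fetchone()          # get the first record
    while row is not None:
        names.append(row[0])
        row = cur.fetchone()      # advance to the next record
    return names


load_customers(conn)
print(fetch_names(conn, "London"))
```

The application code never touches ODBC or JDBC handles directly; it works with ordinary Python objects and lets the driver handle field and record management.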
IBM is also planning to support the deployment of machine learning models created in those libraries directly within the Db2 database, but it’s not yet ready.
“Once you create the model, you create the runtime,” Prasad says. “This will be the step two, which…is not there today, which is run it on Db2. You score the model, and you run the model in Db2 itself, rather than have to run it in some other engine like Spark.”
Automating Mundane Tasks
Secondly, Db2 is getting more functions that will automate routine tasks that today are performed by the DBA.
For example, DBAs today often spend their time creating indexes that accelerate the look-up of information in the database. DBAs also must check to ensure that storage space for logs has not filled up. These are tasks that will be performed autonomously by the database itself, Prasad says.
“We’re picking up speed as people want to automate things,” he says. “You can create more sophisticated operational excellence with Db2. That will make the life of DBAs and administrators much easier and more efficient, performance-wise, as well.”
Lastly, IBM is using machine learning techniques to improve core functions in the database, in particular the query optimizer. This is not a feature that’s available now, but it’s on the roadmap.
“It’s not about moving away but complementing the cost-based optimizer that has been the gold standard in the industry for the last 40 years, to basically help the DBA make sure that the [database] executes with an optimal plan,” Funke says.
In particular, Funke points to the new Cognitive Query function that IBM is developing (it is not included in Db2 11.5). The Cognitive Query function will utilize machine learning models under the covers to deliver results that may be more accurate than using traditional query approaches, the IBMer says.
“Whenever you are looking for affinity between objects, a restriction by predicate might come with the risk that you’re missing out on a certain object that you’re looking for,” Funke says. “This can help you build an application that always gets a complete result set, that has less risk that you are missing out on the object you’re looking for.”
For example, if the police were looking to identify a suspect based on a set of characteristics from eyewitness reports, the Cognitive Query function in a future release of Db2 could effectively use a machine learning clustering approach to help fill in the missing details.
Think of Cognitive Query as using a fuzzy-math approach instead of strict SQL logic.
“In order to find that suspect, you are using witness information or information from different witnesses that specify certain characteristics of the suspect that were seen on the crime scene. It could be height, weight, or age,” Funke says. “If you are off with one of those properties with your query, our cognitive query still delivers you the right tuple with a high confidence, because the other properties that are in scope, they are still matching up.”
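The difference between a strict predicate and a similarity-based lookup can be sketched in a few lines of Python. This is not IBM's Cognitive Query implementation, whose internals the article does not describe; it is a simple nearest-match ranking over scaled attribute differences, and the records, scale factors, and function names are all invented for illustration.

```python
from math import sqrt

# Hypothetical records; none of this data comes from the article.
PEOPLE = [
    {"name": "A", "height_cm": 180, "weight_kg": 82, "age": 35},
    {"name": "B", "height_cm": 165, "weight_kg": 60, "age": 52},
    {"name": "C", "height_cm": 192, "weight_kg": 101, "age": 24},
]

# Assumed scale per attribute, so differences are comparable across units.
SCALES = {"height_cm": 10.0, "weight_kg": 10.0, "age": 10.0}


def strict_match(people, report, tolerance=0.5):
    """Predicate-style filter: every attribute must fall within tolerance."""
    return [
        p for p in people
        if all(abs(p[k] - v) <= tolerance * SCALES[k] for k, v in report.items())
    ]


def ranked_match(people, report):
    """Rank everyone by a similarity score instead of filtering anyone out."""
    def distance(p):
        return sqrt(sum(((p[k] - v) / SCALES[k]) ** 2 for k, v in report.items()))

    scored = [(p["name"], 1.0 / (1.0 + distance(p))) for p in people]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A witness report in which the age estimate is well off for person A.
report = {"height_cm": 181, "weight_kg": 80, "age": 48}

print(strict_match(PEOPLE, report))   # the strict filter finds nobody
print(ranked_match(PEOPLE, report))   # person A still ranks first
```

With this sample data, the strict filter returns an empty list because one attribute misses its tolerance, while the ranking still places person A at the top because the other attributes line up.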
Virtual and Visual
Db2 11.5 also gets the Augmented Data Explorer, a new natural language querying feature that allows users to interact with the database as if it were a traditional search engine. The software also includes a data visualization component that can help users explore datasets.
IBM is also talking up its data virtualization features in Db2. IBM’s database can access data stored in other databases, like SQL Server, Oracle, Postgres, Teradata, and MongoDB, in a federated manner. IBM is making the data virtualization functions that are already available with IBM Cloud Private for Data available in Db2. It’s also adding support for blockchain.
All these features, except the Cognitive Query function, are contained in the common SQL engine in Db2 version 11.5 that’s delivered across all three versions of the database: Db2, Db2 Standard, and Db2 Advanced. The common SQL engine is also used across other members of the Db2 family, including Db2 Big SQL (HDFS support), OLAP and BLU Acceleration, and the in-memory Db2 Event Store.