March 26, 2024

Kinetica Elevates RAG with Fast Access to Real-Time Data


Kinetica got its start building a GPU-powered database to serve fast SQL queries and visualizations for US government and military clients. But with a pair of announcements at Nvidia’s GTC show last week, the company is showing it’s prepared for the coming wave of generative AI applications, particularly those utilizing retrieval augmented generation (RAG) techniques to tap unique data sources.

Companies today are hunting for ways to leverage the power of large language models (LLMs) with their own proprietary data. Some companies are sending their data to OpenAI’s cloud or other cloud-based AI providers, while others are building their own LLMs.

However, many more companies are adopting the RAG approach, which has surfaced as perhaps the best middle ground: it doesn’t require building your own model (time-consuming and expensive) or sending your data to the cloud (a poor choice for privacy and security).

With RAG, relevant data is injected directly into the context window of the prompt before it is sent to the LLM, which adds personalization and context to the LLM’s response. Along with prompt engineering, RAG has emerged as a low-risk and fruitful method for juicing GenAI returns.
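To make the idea concrete, here is a minimal Python sketch of the prompt-assembly step in a RAG workflow. It is an illustration only: the function name `build_rag_prompt`, the character budget `max_chars`, and the sample passages are all hypothetical, and the retrieval step itself is assumed to have already happened.

```python
def build_rag_prompt(question, retrieved_passages, max_chars=2000):
    """Assemble a prompt that places retrieved passages ahead of the question."""
    context_lines = []
    used = 0
    for passage in retrieved_passages:
        # Stop adding passages once the (hypothetical) context budget is spent.
        if used + len(passage) > max_chars:
            break
        context_lines.append(f"- {passage}")
        used += len(passage)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Made-up passages standing in for whatever a retriever returned.
passages = [
    "Truck 17 reported engine temperature 112 C at 09:42.",
    "Truck 17 is assigned to route 5.",
]
prompt = build_rag_prompt("Which truck is overheating?", passages)
print(prompt)
```

The resulting string is what would be sent to the LLM; the model itself is unchanged, and only the prompt carries the proprietary data.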

The VRAM boost in Nvidia’s Blackwell GPU will help Kinetica keep the processor fed with data, Negahban said 

Kinetica is also now getting into the RAG game with its database by essentially turning it into a vector database that can store and serve vector embeddings to LLMs, as well as by performing vector similarity search to optimize the data it sends to the LLM.
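Vector similarity search, in its simplest form, ranks stored embeddings by how close they are to the embedding of the query. The Python sketch below is a brute-force CPU illustration of that idea using cosine similarity. It is not Kinetica’s implementation and does not use RAPIDS RAFT; the three-dimensional vectors and document labels are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the k texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy embeddings; real ones come from an embedding model and have
# hundreds or thousands of dimensions.
store = [
    ("sensor outage report", [0.9, 0.1, 0.0]),
    ("quarterly sales summary", [0.0, 0.2, 0.9]),
    ("network latency alert", [0.8, 0.3, 0.1]),
]
query_embedding = [1.0, 0.2, 0.0]
results = top_k(query_embedding, store, k=2)
print(results)
```

A scan like this touches every stored vector, which is why production systems lean on approximate indexes or, as in Kinetica’s case, GPU acceleration.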

According to its announcement last week, Kinetica is able to serve vector embeddings 5x faster than other databases, a number it claims came from the VectorDBBench benchmark. The company claims it’s able to achieve that speed by leveraging Nvidia’s RAPIDS RAFT technology.

That GPU-based speed advantage will help Kinetica customers by enabling them to scan more of their data, including real-time data that has just been added to the database, without doing a lot of extra work, said Nima Negahban, co-founder and CEO of Kinetica.

“It’s hard for an LLM or a traditional RAG stack to be able to answer a question about something that’s happening right now, unless they’ve done a lot of pre-planning for specific data types,” Negahban told Datanami at the GTC conference last week, “whereas with Kinetica, we’ll be able to help you by looking at all the relational data, generate the SQL on the fly, and ultimately what we put just back in the context for the LLM is a simple text payload that the LLM will be able to understand to use to give the answer to the question.”

This essentially gives users the capability to talk to their complete corpus of relational enterprise data, without doing any preplanning.

“That’s the big advantage,” he continued, “because the traditional RAG pipelines right now, that part of it still requires a good amount of work as far as you have to have the right embedding model, you have to test it, you have to make sure it’s working for your use case.”
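The SQL-based flow Negahban describes (generate a query, run it against relational data, and hand the result back to the LLM as plain text) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: an in-memory SQLite database stands in for Kinetica, the `telemetry` table and its rows are invented, and the SQL string is hard-coded where a real system would have an LLM generate it.

```python
import sqlite3


def rows_to_text(cursor):
    """Flatten a query result into a small text payload for the prompt."""
    columns = [col[0] for col in cursor.description]
    lines = [", ".join(columns)]
    for row in cursor.fetchall():
        lines.append(", ".join(str(value) for value in row))
    return "\n".join(lines)


# Stand-in database with made-up telemetry rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry (device TEXT, ts TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?)",
    [
        ("pump-1", "2024-03-26T09:00", 71.5),
        ("pump-2", "2024-03-26T09:00", 98.2),
        ("pump-1", "2024-03-26T09:01", 72.0),
    ],
)

# Assumption: in a real pipeline this SQL would be produced by an LLM
# from the user's question plus table metadata.
generated_sql = (
    "SELECT device, MAX(temp_c) AS max_temp "
    "FROM telemetry GROUP BY device ORDER BY max_temp DESC"
)
payload = rows_to_text(conn.execute(generated_sql))
prompt = f"Context:\n{payload}\n\nQuestion: Which device is running hottest?"
print(prompt)
```

Because the query runs at question time, rows inserted a moment earlier show up in the payload without any embedding or indexing step.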

Kinetica can also talk to other databases and function as a generative federated query engine, as well as do the traditional vectorization of data that customers put inside of Kinetica, Negahban said. The database is designed to be used for operational data, such as time-series, telemetry, or telco data. Thanks to the support for NVIDIA NeMo Retriever microservices, the company is able to position that data in a RAG workflow.

But for Kinetica, it all comes back to the GPU. Without the extreme computational power of the GPU, the company would have just another RAG offering.

“Basically you need that GPU-accelerated engine to make it all work at the end of the day, because it’s got to have the speed,” said Negahban, a 2018 Datanami Person to Watch. “And we then put all that orchestration on top of it as far as being able to have the metadata necessary, being able to connect to other databases, having all that to make it easy for the end user, so basically they can start taking advantage of all that relational enterprise data in their LLM interaction.”
