September 30, 2019

Kafka Spawns Open-Source KarelDB

Apache Kafka and its accompanying key-value store are being used to provide persistent storage for a growing list of relational databases, most of which use a key-value store as their foundation.

Among the latest to emerge is KarelDB, a relational database built almost entirely on open source components, including Apache Calcite for the SQL engine along with Apache Omid for transactions and control features. The open-source database so far only supports a single node, but database watchers consider it sufficiently promising to track for future scaling.

KarelDB and other emerging databases are built on KCache, an embedded key-value store and in-memory cache backed by Kafka. By default, the new relational database uses KCache configured as a RocksDB cache backed by the ubiquitous Kafka stream processing software.

“This allows KarelDB to support larger datasets and faster startup times,” noted Robert Yokota of Kafka-based streaming platform vendor Confluent. “KCache can also be configured to use an in-memory cache instead of RocksDB if desired,” Yokota added in a recent blog post introducing KarelDB.

Unlike Confluent’s Kafka-based platform, KarelDB is not a streaming database. Yokota nevertheless flagged the relational database largely because it’s based on open-source components backed by Kafka. Hence, he reckons there’s a chance it could take off.

Those open source components include Calcite, an SQL framework that pushes relational queries to the data store, an approach seen as providing more efficient processing. Yokota noted that KarelDB would “automatically benefit” from upcoming Calcite optimizations.

Other open source projects, such as the Apache Flink stream processing engine, have also leveraged Calcite, including for an SQL API. Calcite also includes an SQL parser.

Meanwhile, the Apache Omid framework is being used with KarelDB to support transactions on a key-value store. Omid, originally designed as a transaction manager for the HBase NoSQL database, has been found to mesh readily with KCache since it uses an existing key-value store to maintain transaction metadata.

Yokota noted that KarelDB stacks those and other features on top of KCache to manage transactions. Omid also uses multi-version concurrency control, a technique found in other relational databases, to implement “snapshot isolation.”
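To make the idea concrete, here is a toy Python sketch of multi-version concurrency control providing snapshot isolation over a plain key-value store. It is not Omid's or KarelDB's code: the names `VersionedStore`, `Transaction` and `ConflictError` are hypothetical, and a simple counter stands in for a real timestamp service.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a concurrent transaction already committed the same key."""


@dataclass
class VersionedStore:
    # key -> list of (commit_ts, value), appended in commit order
    versions: dict = field(default_factory=dict)
    clock: int = 0  # hypothetical logical clock, not a real timestamp oracle

    def begin(self):
        # A transaction's snapshot is everything committed up to now.
        return Transaction(self, start_ts=self.clock)

    def read_at(self, key, ts):
        # Return the newest version committed at or before ts.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def last_commit_ts(self, key):
        history = self.versions.get(key)
        return history[-1][0] if history else 0


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: abort if any written key changed after our snapshot.
        for key in self.writes:
            if self.store.last_commit_ts(key) > self.start_ts:
                raise ConflictError(key)
        self.store.clock += 1
        commit_ts = self.store.clock
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts
```

In this sketch, a reader that began before a later commit keeps seeing the older version of a key, while a writer working from a stale snapshot is rejected at commit time.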

KarelDB also is touted for its ability to run either as an embedded database or as a server. In the latter case, it uses Apache Avatica to support the Remote Procedure Call wire protocol.

Among the advantages of running these and other open-source components with Kafka is the ability of multiple servers to “tail” the same set of topics, Yokota noted. “This allows multiple KarelDB servers to run as a cluster, with no single-point of failure,” he added.
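The “tailing” pattern can be illustrated with a short Python sketch in which several replicas replay one append-only log into their own local caches. This is only an illustration under stated assumptions: `TopicLog` and `Replica` are hypothetical stand-ins, not Kafka's or KCache's actual API, and a `None` value is treated as a delete marker.

```python
class TopicLog:
    """In-memory append-only log standing in for a Kafka topic (hypothetical)."""

    def __init__(self):
        self.records = []

    def append(self, key, value):
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the new record


class Replica:
    """A server that rebuilds its local cache by tailing the shared log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # next log position to read
        self.cache = {}

    def poll(self):
        # Apply every record appended since the last poll, in log order.
        new_records = self.log.records[self.offset:]
        for key, value in new_records:
            if value is None:
                self.cache.pop(key, None)  # None marks a delete in this sketch
            else:
                self.cache[key] = value
        self.offset += len(new_records)
        return len(new_records)
```

Because every replica applies the same records in the same order, a server that joins late or restarts can catch up by replaying the log from the beginning, so no single server holds the only copy of the data.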

KarelDB is named after Karel Capek, a Czech science fiction author who is credited with inventing the word “robot”. A programming language is also named for him.
