September 9, 2024

Apache Cassandra 5.0 Brings Major Updates with Enhanced Indexing and AI Capabilities

The Apache Cassandra Community has announced the general availability of Apache Cassandra 5.0, offering better data efficiency, integration of GenAI functionality, and improved performance. 

Apache Cassandra is a distributed, open-source NoSQL database built to manage large volumes of data across multiple servers without a single point of failure. Known for its high availability and fault tolerance, the database enables organizations to have multiple nodes in different locations while keeping them synchronized.

With Cassandra 5.0, the database gets a new indexing approach through the Storage Attached Indexes (SAI) feature. Previously, teams had to design the data model up front around the queries it would serve. With the new release, developers are no longer bound by strict data models. The update allows for more efficient queries on non-primary key columns and simplifies the use of secondary indexes with reduced overhead.
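To illustrate why an index on a non-primary key column matters, here is a minimal conceptual sketch in Python. It is not how SAI is implemented; the table, column names, and helper functions are hypothetical and exist only to contrast an indexed lookup with a full scan.

```python
from collections import defaultdict

# Hypothetical rows keyed by primary key; "city" is a non-primary-key column.
rows = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Linus", "city": "Helsinki"},
    3: {"name": "Grace", "city": "London"},
}


def build_index(table, column):
    """Map each value of `column` to the primary keys of the rows holding it."""
    index = defaultdict(list)
    for pk, row in table.items():
        index[row[column]].append(pk)
    return index


def scan(table, column, value):
    """Without an index: inspect every row and keep the matches."""
    return [pk for pk, row in table.items() if row[column] == value]


city_index = build_index(rows, "city")
print(city_index["London"])          # answered directly from the index
print(scan(rows, "city", "London"))  # same answer, but every row is read
```

Both calls return the same primary keys; the difference is that the indexed lookup does not have to read rows that do not match.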

The Apache Cassandra community is also expanding the database’s capabilities to include Vector Search and a new vector data type, which are crucial for AI and machine learning (ML) projects. By storing and retrieving embedding vectors, these features enable effective similarity comparisons and improve functionality for applications such as recommendation engines, fraud detection, image recognition, and AI chatbots.
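As a rough illustration of what a similarity comparison over embeddings involves, the sketch below ranks stored vectors by cosine similarity to a query vector. It is an exact, brute-force search in plain Python, not Cassandra's Vector Search implementation, and the three-dimensional embeddings and their labels are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k):
    """Return the k labels whose vectors are most similar to `query`."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Hypothetical embeddings; real ones typically have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}

print(top_k([1.0, 0.0, 0.0], embeddings, k=2))
```

A brute-force scan like this compares the query against every stored vector, which is why databases that serve such queries at scale rely on a dedicated index rather than an exhaustive comparison.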

The update also features a unified compaction strategy that increases data density per node. Instead of the previous limit of four terabytes per node, Cassandra 5.0 offers 10 or more terabytes per node. This increase enables enterprise users to reduce the number of nodes needed for large-scale deployments and also helps lower operational costs. 
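The effect of higher density on cluster size is simple arithmetic. The sketch below assumes a hypothetical total on-disk footprint of 100 TB (replication already included) and uses the per-node figures quoted above; real sizing would also account for headroom, throughput, and failure tolerance.

```python
import math


def nodes_needed(total_tb, tb_per_node):
    """Smallest number of nodes that can hold `total_tb` at the given density."""
    return math.ceil(total_tb / tb_per_node)


total_tb = 100  # hypothetical footprint, chosen only for illustration

before = nodes_needed(total_tb, 4)   # previous limit cited in the article
after = nodes_needed(total_tb, 10)   # density cited for Cassandra 5.0

print(before, after)
```

Under these assumptions the same data fits on 10 nodes instead of 25.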

Additionally, Cassandra 5.0 introduces a pair of new data structures known as trie memtables and trie SSTables, which align data structures from user input to disk storage. This enhancement reduces unnecessary processing and conversion time, making data retrieval from memory or disk faster and more efficient. 
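For readers unfamiliar with the underlying data structure, here is a minimal, generic trie in Python. It is a textbook sketch, not Cassandra's memtable or SSTable format; the class names and sample keys are hypothetical. It shows how keys that share a prefix share a path, so lookups walk the key one character at a time.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # next character -> child node
        self.value = None
        self.has_value = False  # distinguishes stored keys from mere prefixes


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def put(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value
        node.has_value = True

    def get(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value if node.has_value else None

    def keys_with_prefix(self, prefix):
        """Return all stored keys starting with `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        self._collect(node, prefix, found)
        return found

    def _collect(self, node, path, found):
        if node.has_value:
            found.append(path)
        for ch in sorted(node.children):
            self._collect(node.children[ch], path + ch, found)


trie = Trie()
trie.put("user:1", "Ada")
trie.put("user:2", "Grace")
trie.put("order:9", "pending")

print(trie.get("user:2"))
print(trie.keys_with_prefix("user:"))
```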

“Typically, Cassandra is used for storing structured and semi-structured data, making it ideal for applications like time series data, IoT, and social media platforms. However, Artificial Intelligence (AI) transforms how we interact with data,” according to Cassandra in a recent blog post. 

“While Cassandra has become a go-to choice for many AI applications, such as Netflix and Uber, the introduction of generative AI and large language models (LLMs) has sparked a need for new query capabilities.”

The Cassandra community claims that new support for Java Development Kit (JDK) 17 brings performance improvements of up to 20% as a result of enhanced memory management capabilities.

The highly anticipated release of Apache Cassandra 5.0 marks the first major upgrade since version 4.0 was launched in 2021. The 4.0 version introduced faster scaling with “zero-copy streaming,” improved audit logging, finer data access controls, and selective system metric exposure. In 2022, the minor release Apache Cassandra 4.1 introduced new scalability features.


Since the last update, the Apache Cassandra community has focused on version 5.0, introducing enhancements and new features to improve its functionality and performance.

The release heralds a new phase of scalability and performance. The new version not only delivers substantial performance improvements but also makes significant advances in AI and data efficiency.

Users can upgrade from version 4.x to 5.0 through an online upgrade, minimizing downtime for applications. With the release of Cassandra 5.0, the community announced the end of life for the 3.x series, urging users to plan their upgrade strategy to ensure continued support and access to security updates and bug fixes.

With Apache Cassandra 5.0 now generally available, the focus is shifting to future developments, including Cassandra 5.1, which has been in progress since November 2023. The upcoming release is reportedly implementing full ACID (Atomicity, Consistency, Isolation, Durability) transactions to expand the applicability of the database to new use cases.
