Apache Cassandra 5.0 Is Coming: Here’s Why the People Who Built It Are Fired Up

by DataStax, July 31st, 2023

Too Long; Didn't Read

From dynamic data masking to ACID transactions, Apache Cassandra committers discuss some of the big and exciting changes in the upcoming 5.0 release.


Ask eight different people their opinion and you’ll get eight different answers. The Apache Cassandra open-source project is built and maintained by a collection of individuals who all arrive with their own motivations. Some love new features. Some love squeezing all the performance they can out of the system. Some want to make operators’ lives easier. What ties them all together? They’re working as a distributed team toward a single goal: an amazing database that just keeps getting better.


Cassandra is a collaborative effort of engineers from different parts of the world who share a common goal of creating the best product possible. They tackle problems for their employers while contributing to the open-source code for the project. Those who earn the trust of the community and can make changes to the base code are called “committers.” Becoming a committer requires dedication and passion for the project. Recently, the project held an event called Cassandra Forward, where some of the committers shared their insights on the upcoming release of Cassandra 5.0. Here’s what they had to say.


Jon Haddad: Java 17 support and garbage collectors

Haddad tells us he’s looking forward to support for Java 17 and its low-latency garbage collectors like ZGC in Cassandra 5.0. The former Netflix and Apple developer, who’s been a Cassandra committer since 2017, says these collectors will provide sub-millisecond pause times and a "set and forget" model, making memory management less overwhelming for Cassandra users. As the project matures and memory management gets even better, there will be improvements in the duration and frequency of GC pauses, making it easier to run denser nodes, which will save money for users.


“That means we'll see less-frequent GC pauses — and when they happen, they'll take less time. This will make it easier to run denser nodes, meaning your cluster will be less expensive to run. I love the idea of saving money just by doing an upgrade.”


Andrés de la Peña: Dynamic data masking

De la Peña, a DataStax software engineer and a Cassandra committer since 2016, is enthusiastic about the dynamic data masking feature in Cassandra 5.0, which obscures sensitive information while still allowing access to the masked columns. The feature replaces the real values of columns with generic data using a set of regular CQL functions that transform the cell values. Administrators can attach these masking functions to the columns of the table schema, so unprivileged users will always see masked data, even if they don't specify the functions in the query. The set of available masking functions is relatively small at the moment, but users can supply their own user-defined functions, making it easy to add custom types of masking.


“It is a security anonymization feature that is available in many databases out there and is long overdue in Cassandra.”
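To make the idea concrete, here is a minimal Python sketch of column-level masking. It is not Cassandra code: `mask_inner`, `mask_default`, `COLUMN_MASKS` and `read_row` are hypothetical names made up for this illustration, and the sketch models only the behavior described above, namely a masking function attached to a column and applied automatically for readers without permission to see the real value.

```python
from typing import Callable, Dict


def mask_inner(value: str, keep_start: int, keep_end: int, pad: str = "*") -> str:
    """Hide everything except the first keep_start and last keep_end characters."""
    if keep_start + keep_end >= len(value):
        return value  # nothing left to hide
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + pad * hidden + value[len(value) - keep_end:]


def mask_default(value: str) -> str:
    """Replace the whole value with a fixed placeholder."""
    return "****"


# Hypothetical "schema": masking functions attached to individual columns.
COLUMN_MASKS: Dict[str, Callable[[str], str]] = {
    "email": mask_default,
    "card_number": lambda v: mask_inner(v, 0, 4),
}


def read_row(row: Dict[str, str], can_unmask: bool) -> Dict[str, str]:
    """Return the row as a given reader would see it."""
    if can_unmask:
        return dict(row)
    # Unprivileged readers get masked values without asking for them.
    return {col: COLUMN_MASKS.get(col, lambda v: v)(val) for col, val in row.items()}


row = {"name": "Ada", "email": "ada@example.com", "card_number": "4111111111111111"}
masked = read_row(row, can_unmask=False)
print(masked)
```

The point of the design is that the mask lives with the column rather than with the query, so the reader's code does not change depending on who is asking.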


Vinay Chella: Guardrails

Chella, a senior engineering leader at Netflix and a committer since 2019, is excited about the new features in Cassandra 5.0 that provide more guardrails for developers, improve stability and enhance the operating experience. The introduction of guardrails in Cassandra 4.1 allowed for soft and hard limits on user actions, and Cassandra 5.0 adds several new guardrails to improve reliability, availability and user experience. These guardrails codify best practices and help avoid catastrophic mistakes, such as dropping production-critical keyspaces or losing data.


“These guardrails certainly help prevent lots of these ‘oops’ moments.”
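The soft-limit and hard-limit idea can be sketched in a few lines of Python. This is an illustration only, not Cassandra's implementation: the `Threshold` class, the `GuardrailViolation` exception and the numeric limits are all assumptions chosen for the example. A value above the soft limit produces a warning, and a value above the hard limit is rejected.

```python
import warnings
from dataclasses import dataclass


class GuardrailViolation(Exception):
    """Raised when an action exceeds a hard limit."""


@dataclass(frozen=True)
class Threshold:
    name: str
    warn_at: int  # soft limit: allow the action but warn
    fail_at: int  # hard limit: reject the action

    def check(self, value: int) -> None:
        if value > self.fail_at:
            raise GuardrailViolation(
                f"{self.name}: {value} exceeds hard limit {self.fail_at}"
            )
        if value > self.warn_at:
            warnings.warn(f"{self.name}: {value} exceeds soft limit {self.warn_at}")


# Illustrative limits, not recommended values.
tables_guardrail = Threshold("tables per keyspace", warn_at=100, fail_at=150)

tables_guardrail.check(50)   # fine, no output
tables_guardrail.check(120)  # allowed, but emits a warning
try:
    tables_guardrail.check(200)
except GuardrailViolation as exc:
    print(f"rejected: {exc}")
```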


Mick Semb Wever: Community

Semb Wever, a Cassandra committer since 2016 and a principal architect at DataStax, appreciates how Cassandra 5.0 embodies “real open source,” with contributors drawn from multiple vendors and companies. This creates a diverse development community with a rich set of features and applications, and emphasizes the importance of engineering hygiene: building QA and CI to improve trust and enable radical features. He says these principles and practices will lead to greater longevity, sustainability and modernization of the technology, and that they encourage diversity and collaboration in the community.


“It’s what's enabling some of the radical features that are coming in 5.0 — stuff like Accord — that we can't get over the finish line if we're not all working together as a team.”


Jordan West: More sleep!

West, a senior Netflix software engineer and a Cassandra committer since 2020, is excited about how improvements in Cassandra 5.0 will lead to better reliability and performance, which will result in more sleep for him as an on-call engineer. He highlights the new transactional metadata feature and the improved memtables, which will handle more writes, faster. He also describes how the new virtual tables, diagnostics and metrics will provide more insight into Cassandra and help resolve incidents faster.


“I know with Cassandra 5.0 [that] when I go to bed, I'm less likely to get woken up — and when I do I'm going to solve our problems faster and get back to bed faster.”


Ekaterina Dimitrova: Accord and ACID transactions

A DataStax engineer who has been a committer since 2020, Dimitrova is eagerly anticipating the community's implementation of the Accord protocol. This protocol will enable global consensus and allow ACID transactions to be carried out at scale, making developers more efficient without compromising on performance or scalability. Global consensus is crucial in things like bank transfers, where concurrency guarantees ensure that only one process can make changes at a time. The new syntax being created for developers will include begin transaction and commit transaction statements, so that all operations between them are fully ACID compliant.
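The bank transfer case is easy to sketch on a single machine. The Python below is a toy, not Accord: it uses an in-process lock as a stand-in for consensus, whereas Accord has to provide the same guarantee across distributed replicas. The `Ledger` class and the account names are hypothetical. What it shows is the property that matters: each transfer either debits and credits together or does nothing, and concurrent transfers never overdraw an account.

```python
import threading


class InsufficientFunds(Exception):
    """Raised when a transfer would overdraw the source account."""


class Ledger:
    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()  # single-process stand-in for consensus

    def transfer(self, src, dst, amount):
        # Check, debit and credit happen as one indivisible step.
        with self._lock:
            if self._balances[src] < amount:
                raise InsufficientFunds(f"{src} has less than {amount}")
            self._balances[src] -= amount
            self._balances[dst] += amount

    def snapshot(self):
        with self._lock:
            return dict(self._balances)


def worker(ledger, attempts):
    for _ in range(attempts):
        try:
            ledger.transfer("alice", "bob", 1)
        except InsufficientFunds:
            pass  # the transfer was rejected as a whole; nothing changed


ledger = Ledger({"alice": 500, "bob": 0})
threads = [threading.Thread(target=worker, args=(ledger, 100)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = ledger.snapshot()
print(final)
```

Eight workers attempt 800 one-unit transfers against a balance of 500, so exactly 500 succeed and the total across both accounts is unchanged.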


Lorina Poland: Unified Compaction Strategy

Poland, a DataStax tech lead who became a committer in 2021, likes the benefits of Cassandra 5.0's Unified Compaction Strategy (UCS), which combines legacy compaction strategies like CT, size-tiered and leveled compaction. UCS is a significantly faster compaction strategy that has reduced space overhead and allows for parallelism. The strategy also has a scaling factor that can be tuned to specific workloads, whether they are read-heavy, write-heavy or a mix of both. There’s no need to know how the legacy strategies work, and there is zero overhead for migrating to UCS.


“If you need it to be write-heavy,  you can tune it to that; if you need it to be read-heavy, you can tune to that; and if you just want something in-between, it works well for whatever your workload is.”
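For a rough sense of how one tuning knob can span both styles, here is a small Python sketch. It follows my reading of the UCS design notes, so treat the formula as an assumption rather than a specification: a single integer scaling parameter `w` yields a fan factor of `2 + |w|`, with negative values behaving like leveled compaction (favoring reads) and positive values like tiered compaction (favoring writes). The function name `ucs_shape` is made up for this example.

```python
from typing import Tuple


def ucs_shape(w: int) -> Tuple[str, int, int]:
    """Map a scaling parameter to (mode, fan factor, compaction threshold).

    Assumed convention: w < 0 is leveled, w > 0 is tiered, w == 0 sits in between.
    """
    fan = 2 + abs(w)
    # Tiered: wait for `fan` SSTables before compacting; otherwise compact at 2.
    threshold = fan if w > 0 else 2
    if w > 0:
        mode = "tiered"
    elif w < 0:
        mode = "leveled"
    else:
        mode = "balanced"
    return mode, fan, threshold


for w in (-8, 0, 2):
    print(w, ucs_shape(w))
```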


Benjamin Lerer: Storage-Attached Indexing

Lerer became a committer eight years ago. The DataStax tech lead notes that the SSTable-attached secondary index (SASI) was added in 2016, but was not invested in enough and had to be marked as experimental in Cassandra 4.0 because it didn’t meet the desired standards. Storage-attached indexing (SAI) has been built atop SASI and has its own set of innovations, including the ability to index multiple columns without scalability issues, and optimizations for space usage and numeric range queries.


“SAI will enable a new set of query capabilities, without the drawbacks that secondary indexing or SASI had.”


Branimir Lambov: Pluggability

Lambov, a DataStax engineer who’s been a Cassandra committer since 2015, is excited about the local storage pluggability in Cassandra 5.0. The change centers around the memtable, which is a temporary storage area in the computer's memory where data is stored before being written to more permanent storage. The goal of the new implementation is to make it easier to use different types of memtables and select the best one for each specific use case. One of the new implementations is based on a trie data structure, which provides a much more efficient way of storing data. It also allows memory to be used off the main Java heap, so storage operations do not trigger garbage collection. These improvements can double the write throughput of the database. It will be exciting to see where the community takes this flexible storage interface next.
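If tries are unfamiliar, a toy version helps. The Python below is a minimal sketch of a trie used as an in-memory key-value buffer. It is not Cassandra's memtable: `TrieMemtable` and its methods are hypothetical, everything lives on the ordinary Python heap, and the off-heap layout mentioned above is not modeled. It shows only the core idea, that keys sharing a prefix share nodes and that a walk over the trie returns keys in sorted order without a separate sort of the data.

```python
from typing import Dict, Iterator, Optional, Tuple


class TrieNode:
    __slots__ = ("children", "value", "has_value")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.value: Optional[str] = None
        self.has_value = False


class TrieMemtable:
    def __init__(self) -> None:
        self._root = TrieNode()

    def put(self, key: str, value: str) -> None:
        node = self._root
        for ch in key:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.value = value
        node.has_value = True

    def get(self, key: str) -> Optional[str]:
        node = self._root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value if node.has_value else None

    def items(self) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) pairs in lexicographic key order."""
        def walk(node: TrieNode, prefix: str) -> Iterator[Tuple[str, str]]:
            if node.has_value:
                yield prefix, node.value
            for ch in sorted(node.children):
                yield from walk(node.children[ch], prefix + ch)

        yield from walk(self._root, "")


memtable = TrieMemtable()
memtable.put("user:2", "bob")
memtable.put("user:10", "carol")
memtable.put("user:1", "alice")
print(list(memtable.items()))
```

All three keys share the nodes for the `user:` prefix, which is where the space savings of a trie come from when keys have a lot in common.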


In OSS, people make all the difference

Exploring a successful open-source project is a captivating journey, from both the human and technological viewpoints. While technology may be the initial focus of a software project, it's the people involved that make it truly fascinating. Each person brings their unique emotions and desires to the table, which can result in positive or negative outcomes. In an open-source project, individuals' desires to improve something are laid bare and open to criticism. However, it's through the determination to work together and move forward that the true magic of the project occurs.


What features are you looking forward to in Cassandra 5.0? Personally, I'm excited for the developer improvements that will be game-changing, such as ACID transactions, new indexing schemes and new syntax like the NOT operator. As a Cassandra committer myself, I enjoy watching developers use these new features and create amazing things. If you haven't checked out Cassandra in a while, now is a good time to do so. Join the rest of the user community over at Planet Cassandra and share your thoughts on what excites you about Cassandra 5.0.

