Published Sep 3, 2019

Episode 179: Cassandra with Jonathan Ellis

Explore the capabilities of Cassandra with Jonathan Ellis, as he delves into its innovative data modeling, scalable architecture, and efficient read/write operations, showcasing why it's an ideal choice for distributed systems over traditional databases.
Software Engineering Radio - the podcast for professional software developers

Episode Highlights

  • Write Path

    Cassandra's write path is designed for durability and efficiency. Jonathan Ellis explains that before data is stored in the Memtable, it is appended to a commit log, the same strategy traditional databases use to ensure durability [1]. A tunable parameter determines how frequently the commit log is synchronized to disk, and because the log is only ever appended to, writing it involves little disk head movement. When the host asks at what point a write is complete, Jonathan clarifies that a write is considered successful before its data is written out to disk from the Memtable, because the commit log already records it [2].

    The tunable parameter here is how often do we fsync that commit log? In other words, how often do we tell the operating system to actually send that data in the commit log to disk?

    ---

    This approach allows Cassandra to handle writes efficiently while maintaining data integrity.
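
    The write path described above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the class name `WritePathSketch`, the `key=value` log format, and the `sync_interval_s` parameter are all hypothetical names chosen for illustration. It only shows the ordering (commit log first, then Memtable) and the idea of fsyncing on a tunable interval rather than on every write.

```python
import os
import tempfile
import time


class WritePathSketch:
    """Toy model of a write path: commit log append, then memtable update."""

    def __init__(self, log_path, sync_interval_s=1.0, clock=time.monotonic):
        self.log = open(log_path, "ab")
        self.memtable = {}
        # Hypothetical tunable: how often the commit log is fsynced.
        self.sync_interval_s = sync_interval_s
        self.clock = clock
        self.last_sync = clock()
        self.fsync_count = 0

    def write(self, key, value):
        # 1. Append the mutation to the commit log (sequential I/O only).
        self.log.write(f"{key}={value}\n".encode("utf-8"))
        # 2. Apply the mutation to the in-memory memtable.
        self.memtable[key] = value
        # 3. Ask the OS to push the log to disk only if the interval elapsed.
        if self.clock() - self.last_sync >= self.sync_interval_s:
            self.sync()
        # The write is acknowledged here, possibly before any fsync happened.
        return True

    def sync(self):
        self.log.flush()             # move Python's buffer to the OS
        os.fsync(self.log.fileno())  # ask the OS to write it to the device
        self.last_sync = self.clock()
        self.fsync_count += 1

    def close(self):
        self.sync()
        self.log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commitlog.bin")
    db = WritePathSketch(path, sync_interval_s=0.5)
    db.write("user:1", "alice")
    print("fsyncs so far:", db.fsync_count)
    db.close()
```

    A larger `sync_interval_s` means fewer fsync calls and higher throughput, at the cost of a wider window in which acknowledged writes exist only in operating system buffers.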

       

  • Read Optimization

    Cassandra's read path balances efficiency and accuracy. Jonathan describes how updates are first stored in a Memtable and written to disk only once the Memtable is full, so a read merges the newer values in the Memtable with the existing data on disk to return up-to-date results [3]. On the replication side, a failure detector uses a probabilistic algorithm to judge which nodes are currently reachable [4].
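
    As a rough illustration of merging Memtable values with data already on disk, the sketch below picks the newest version of a key across both. The function name `read_latest`, the dictionary layout, and the use of an integer timestamp per value are assumptions made for this example, not details from the episode.

```python
def read_latest(key, memtable, disk_tables):
    """Return the newest value for `key`, or None if no store has it.

    Assumed layout: every store maps key -> (timestamp, value), and the
    version with the highest timestamp wins.
    """
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in disk_tables:
        if key in table:
            candidates.append(table[key])
    if not candidates:
        return None
    # Pick the tuple with the largest timestamp and return its value.
    newest = max(candidates, key=lambda version: version[0])
    return newest[1]


if __name__ == "__main__":
    memtable = {"user:1": (20, "alice@new.example")}
    disk_tables = [{"user:1": (10, "alice@old.example")}]
    print(read_latest("user:1", memtable, disk_tables))
```

    Because the merge happens at read time, an update never has to rewrite the older on-disk copy in place.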

    A read gets a little bit more complicated. So at the high level, it's like a write only the other direction.

    ---

    Reads also minimize network traffic by using digests to verify that replicas agree, requesting full data only when necessary.
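
    The digest idea can be sketched as follows: a coordinator asks one replica for the full row and the others only for a hash of their copy, and falls back to full reads only on a mismatch. Everything named here (`Replica`, `coordinator_read`, the SHA-256 hash, the timestamp tie-break) is a hypothetical simplification for illustration and assumes the key exists on every replica.

```python
import hashlib


def row_digest(row):
    """Hash a (timestamp, value) row; SHA-256 is an arbitrary choice here."""
    timestamp, value = row
    return hashlib.sha256(f"{timestamp}:{value}".encode("utf-8")).hexdigest()


class Replica:
    def __init__(self, rows):
        self.rows = rows      # key -> (timestamp, value)
        self.full_reads = 0   # counts how often full data was shipped

    def read_full(self, key):
        self.full_reads += 1
        return self.rows[key]

    def read_digest(self, key):
        # Only a short hash crosses the network, not the row itself.
        return row_digest(self.rows[key])


def coordinator_read(key, replicas):
    first, others = replicas[0], replicas[1:]
    row = first.read_full(key)
    expected = row_digest(row)
    if all(replica.read_digest(key) == expected for replica in others):
        return row[1]  # every replica agrees; no further data needed
    # Mismatch: fetch full rows from the rest and keep the newest version.
    rows = [row] + [replica.read_full(key) for replica in others]
    return max(rows, key=lambda version: version[0])[1]


if __name__ == "__main__":
    a = Replica({"k": (1, "v1")})
    b = Replica({"k": (1, "v1")})
    print(coordinator_read("k", [a, b]), "full reads on b:", b.full_reads)
```

    In the common case where replicas agree, only one full row and a few fixed-size hashes are transferred.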
