Episode 179: Cassandra with Jonathan Ellis

Episode Highlights
Write Path
Cassandra's write path is designed for both durability and efficiency. Jonathan explains that before data is stored in the Memtable, it is appended to a commit log, a strategy similar to the one traditional databases use to ensure durability [1]. A tunable parameter determines how frequently the commit log is synchronized to disk; syncing less often improves performance by minimizing disk head movement. The host asks when a write counts as complete, and Jonathan clarifies that a write is considered successful before it is written to disk, thanks to the commit log's role in data reliability [2].
The tunable parameter here is how often do we fsync that commit log? In other words, how often do we tell the operating system to actually send that data in the commit log to disk?
---
This approach allows Cassandra to handle writes efficiently while maintaining data integrity.
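The write path described above can be sketched in a few lines of Python. This is a toy illustration, not Cassandra's implementation: the class name `WritePathSketch`, the `sync_interval_s` parameter, and the `key=value` log format are all assumptions made for the example. It shows the order of operations (append to the commit log, update the Memtable, acknowledge) and how a periodic fsync decouples acknowledgement from the physical disk sync.

```python
import os
import time


class WritePathSketch:
    """Toy write path: append to a commit log, then update an in-memory table."""

    def __init__(self, log_path, sync_interval_s=10.0):
        self.log = open(log_path, "ab")
        # Hypothetical tunable: minimum seconds between fsync calls.
        self.sync_interval_s = sync_interval_s
        self.last_sync = time.monotonic()
        self.memtable = {}
        self.fsync_count = 0

    def write(self, key, value):
        # 1. Append the mutation to the commit log (sequential, append-only).
        self.log.write(f"{key}={value}\n".encode("utf-8"))
        # 2. Apply the mutation to the in-memory Memtable.
        self.memtable[key] = value
        # 3. Sync to disk only if the configured interval has elapsed.
        if time.monotonic() - self.last_sync >= self.sync_interval_s:
            self.sync()
        # The write is acknowledged here, possibly before any fsync.
        return True

    def sync(self):
        # Flush Python's buffer, then ask the OS to push the data to disk.
        self.log.flush()
        os.fsync(self.log.fileno())
        self.last_sync = time.monotonic()
        self.fsync_count += 1

    def close(self):
        self.sync()
        self.log.close()
```

With a long interval, many writes share one fsync; with an interval of zero, every write pays for its own sync. That is the performance and durability trade-off the tunable parameter controls.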
Read Optimization
Cassandra's read path balances efficiency and accuracy. Jonathan describes how updates are initially stored in a Memtable and written to disk only once the Memtable is full, so a read must consult the Memtable to see the most recent data [3]. Cassandra merges new values from the Memtable with existing data on disk, providing up-to-date results without immediate disk writes. Additionally, the replication mechanism involves a failure detector that uses a probabilistic algorithm to decide whether a node is up, which helps keep data consistent across nodes [4].
A read gets a little bit more complicated. So at the high level, it's like a write, only the other direction.
---
On reads, Cassandra minimizes network traffic by using digests to verify data consistency across replicas, requesting the full data only when necessary.
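A minimal sketch of the read path, under stated assumptions, might look like the following. The function names (`local_read`, `digest`, `coordinator_read`), the representation of each replica as a `(memtable, sstable)` pair of dictionaries, and the newest-timestamp-wins merge rule are illustrative choices, not details confirmed in the episode. The sketch shows a replica merging its Memtable with on-disk data, and a coordinator comparing small digests before falling back to full reads.

```python
import hashlib


def local_read(key, memtable, sstable):
    """Merge memory and disk: return the newest (timestamp, value), or None."""
    candidates = [table[key] for table in (memtable, sstable) if key in table]
    return max(candidates, key=lambda tv: tv[0]) if candidates else None


def digest(result):
    """Hash a local read result; far smaller on the wire than the data itself."""
    return hashlib.sha256(repr(result).encode("utf-8")).hexdigest()


def coordinator_read(key, replicas):
    """replicas: list of (memtable, sstable) pairs.

    The first replica returns full data; the others return only digests.
    """
    data = local_read(key, *replicas[0])
    expected = digest(data)
    digests = [digest(local_read(key, m, s)) for m, s in replicas[1:]]
    if all(d == expected for d in digests):
        # All replicas agree, so the single full response is enough.
        return data
    # Mismatch: request full data from every replica and keep the newest.
    full = [local_read(key, m, s) for m, s in replicas]
    full = [r for r in full if r is not None]
    return max(full, key=lambda tv: tv[0]) if full else None
```

For example, if one replica holds only `(1, "old")` on disk while another has `(2, "new")` in its Memtable, the digests differ and the coordinator falls back to full reads, returning `(2, "new")`.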
Related Episodes


Episode 413: Spencer Kimball on CockroachDB
SE-Radio Episode 243: RethinkDB with Slava Akhmechet
Episode 194: Michael Hunger on Graph Databases
Episode 209: Josiah Carlson on Redis
Episode 393: Jay Kreps on Enterprise Integration Architecture with a Kafka Event Log
Episode 189: Eric Lubow on Polyglot Persistence
Episode 417: Alex Petrov on Database Storage Engines
Episode 44: Interview Brian Goetz and David Holmes
SE Radio 560: Sugu Sougoumarane on Distributed SQL Databases
Episode 29: Concurrency Pt.3
Episode 34: Enterprise Architecture
Episode 171: Scala Update with Martin Odersky
Episode 133: Continuous Integration with Chris Read
364: Peter Zaitsev on Choosing the Right Open Source Database
Episode 22: Feedback