Episode 222: Nathan Marz on Real-Time Processing with Apache Storm

Episode Highlights
Tuple Processing
Apache Storm's tuple processing distributes work across a cluster. Nathan Marz explains that a spout emits tuples, which are then processed by bolts, creating a tree of computation across the cluster 1. Storm guarantees that every message is fully processed by tracking this tuple tree, and the tracking needs only about 20 bytes of memory per tree, even for a tree with a billion pending messages 2.
Regardless of how big the tuple tree gets, a tuple tree could have a billion pending messages, and it still only needs 20 bytes to track the tuple tree.
---
This tracking relies on a probabilistic algorithm with a very small chance of error, which makes Storm highly reliable for real-time processing 2.
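The summary does not spell out the algorithm, but Storm's documentation describes an XOR-based scheme over random 64-bit tuple ids, which is one way a tree of any size can be tracked in constant space. The following is a minimal sketch of that idea, not Storm's actual code: the class `AckTracker`, its methods, and the `id_source` parameter are hypothetical names chosen for illustration.

```python
import secrets


class AckTracker:
    """Tracks one tuple tree with a single 64-bit value (hypothetical sketch)."""

    def __init__(self, id_source=None):
        # id_source is an assumption for illustration: any callable that
        # returns a random 64-bit integer. Defaults to a secure random source.
        self._next_id = id_source or (lambda: secrets.randbits(64))
        self.ack_val = 0  # the only state kept, however large the tree gets

    def emit(self):
        """Register a newly created tuple by XOR-ing its id into the state."""
        tuple_id = self._next_id()
        self.ack_val ^= tuple_id
        return tuple_id

    def ack(self, tuple_id):
        """Acknowledge a tuple: XOR-ing the same id again cancels it out."""
        self.ack_val ^= tuple_id

    def is_complete(self):
        """The tree is treated as fully processed once every id has cancelled."""
        return self.ack_val == 0


def demo():
    """Emit a root tuple and two children, then ack them one by one."""
    tracker = AckTracker()
    ids = [tracker.emit() for _ in range(3)]
    for tuple_id in ids:
        tracker.ack(tuple_id)
    return tracker.is_complete()
```

Because each id is XOR-ed in once on emit and once on ack, the value returns to zero exactly when every emitted tuple has been acked. The probabilistic part is that a partially acked tree could also reach zero by coincidence, which is extremely unlikely with random 64-bit ids.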
Fault Tolerance
Storm's architecture is designed for robust fault tolerance, ensuring continuous operation even when processes fail. Marz highlights that Storm's process fault tolerance allows processes to be restarted without disrupting the running application, a crucial feature for maintaining uptime 3. The architecture includes Nimbus, Zookeeper, and the supervisor daemons, which coordinate to keep the system running smoothly even during failures 4.
You can kill -9 Nimbus or the supervisors and nothing will happen to running topologies.
---
This design ensures that even if a node fails, Storm can reassign tasks to other nodes, maintaining the integrity and progress of the data processing 5.
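To make the reassignment idea concrete, here is a toy sketch of moving tasks off failed nodes onto the remaining live ones. It is not Storm's scheduler: the function `reassign_tasks`, the node names, and the round-robin policy are all assumptions made for illustration.

```python
from itertools import cycle


def reassign_tasks(assignments, failed_nodes):
    """Return a new node -> tasks mapping with failed nodes' tasks moved.

    Hypothetical helper: tasks from failed nodes are spread round-robin
    over the surviving nodes. The input mapping is left unmodified.
    """
    live = sorted(node for node in assignments if node not in failed_nodes)
    if not live:
        raise RuntimeError("no live nodes left to host tasks")

    # Surviving nodes keep the tasks they already had.
    new_assignments = {node: list(assignments[node]) for node in live}

    # Collect every task that was running on a failed node.
    orphaned = [
        task
        for node in sorted(failed_nodes)
        if node in assignments
        for task in assignments[node]
    ]

    # Hand the orphaned tasks out to live nodes in turn.
    for task, node in zip(orphaned, cycle(live)):
        new_assignments[node].append(task)
    return new_assignments
```

For example, with tasks 1 to 6 spread evenly over three nodes, failing the middle node leaves its two tasks split between the other two.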
Related Episodes


Episode 436: Apache Samza with Yi Pan

SE-Radio Episode 346: Stephan Ewen on Streaming Architecture

Episode 381: Josh Long on Spring Boot

SE-Radio Episode 272: Frances Perry on Apache Beam

SE-Radio Episode 235: Ben Hindman on Apache Mesos

Episode 223: Joram Barrez on the Activiti Business Process Management Platform

SE-Radio Episode 320: Nate Taggart on Serverless Paradigm

Episode 210: Stefan Tilkov on Architecture and Micro Services

Episode 193: Apache Mahout

Episode 216: Adrian Cockcroft on the Modern Cloud-based Platform

Episode 495: Vaughn Vernon on Strategic Monoliths and Microservices

Episode 433: Jay Kreps on ksqlDB

Episode 213: James Lewis on Microservices

Episode 17: Feedback and Roadmap

Episode 157: Hadoop with Philip Zeyliger














