Published Nov 24, 2020

Episode 436: Apache Samza with Yi Pan

Yi Pan, lead maintainer of Apache Samza, discusses the framework's robust API security, modular architecture, and seamless integration capabilities with tools like RocksDB and Kafka, enhancing developer productivity and system performance. Delve into the advanced auto-scaling features and the comparative strengths of Samza in the stream processing ecosystem.
Episode Highlights
Software Engineering Radio - the podcast for professional software developers

Popular Clips

  • Samza Features

    Apache Samza stands out in the stream processing landscape for several technical capabilities. Yi Pan highlights its integration with RocksDB, which provides sub-millisecond read-write latency and strong consistency, making it well suited to high-performance applications like sensor processing [1]. The platform also supports SQL and Beam APIs, enabling users to write stream processing pipelines without deep system knowledge [1]. Samza's modular architecture adds further flexibility, allowing it to run independently or alongside systems like Kubernetes [2].

    The biggest challenge, of course, in stream processing is maintaining a large number of stream processing pipelines with strict SLAs 24/7.

    ---

    These features make Samza a versatile choice for organizations looking to implement robust stream processing solutions.

       

  • Framework Comparisons

    Comparing Apache Samza with other frameworks like Flink, Storm, and Spark brings out its distinct advantages. Yi Pan notes that while Storm struggles with large topology management and lacks efficient state store support, Samza benefits from its native integration with RocksDB [3]. Unlike Spark, which traditionally uses micro-batch processing, Samza offers pure streaming, processing events one at a time without batch synchronization [4]. This makes Samza particularly suitable for real-time analytics and applications requiring low latency.

    Storm doesn't really come with a good large state store support with short latencies.

    ---

    These comparisons underscore Samza's strengths in handling complex streaming tasks efficiently.

       

  • Kafka Integration

    Apache Samza's integration with Kafka is a key component of its architecture, offering flexibility and scalability. Although Samza was developed alongside Kafka at LinkedIn, its system layer is pluggable, so it can also integrate with services like Azure Event Hubs and Amazon Kinesis [5]. Yi Pan explains that for smaller use cases, Kafka Streams might be sufficient, but larger applications benefit from the full integration of Kafka with Samza [6]. This setup allows organizations to leverage Kafka's robust streaming capabilities while utilizing Samza's processing power.

    Although Samza was co-developed, almost co-developed, with Kafka and leverages a lot of its architecture choices.

    ---

    This integration strategy ensures that Samza can adapt to various streaming needs, making it a versatile tool in the Apache ecosystem.
