Published Feb 28, 2024

SE Radio 605: Yingjun Wu on Streaming Databases

Yingjun Wu of RisingWave Labs discusses streaming databases: how their architecture differs from that of traditional databases, how they scale dynamically, and how they adapt to schema changes while delivering real-time insights. He covers the balance between cost efficiency and performance, and addresses data processing challenges such as out-of-order events with tools such as watermarks and parallel data consumption.

Episode Highlights

  • Computation

    Yingjun Wu, founder of RisingWave Labs, highlights a fundamental difference between streaming and traditional databases. In a streaming database, computation precedes storage, so data is processed as it arrives; a traditional database stores data first and computes on it afterward [1]. This lets a streaming database achieve low latency through incremental computation: it processes only new data rather than reprocessing the entire dataset [2].

    In the stream processing world, there's a concept called backlog, or probably back pressure.

    ---

    Because each update touches only new data, a streaming database can keep pace with incoming data and keep its results up to date with low latency.
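The contrast between incremental computation and full reprocessing can be sketched in a few lines of Python. This is only an illustration of the general idea, not RisingWave's implementation: the class `IncrementalAverage`, the function `full_recompute`, and the per-key average are all hypothetical choices made for the example.

```python
from collections import defaultdict


class IncrementalAverage:
    """Hypothetical view that keeps a per-key average up to date.

    Each new event updates a running count and sum, so the cost of an
    update does not depend on how many events were seen before.
    """

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def apply(self, key, value):
        # Compute first: fold the new event into the stored state.
        self._count[key] += 1
        self._total[key] += value

    def result(self):
        return {k: self._total[k] / self._count[k] for k in self._count}


def full_recompute(events):
    """Store-first baseline: rescan every stored event on each query."""
    totals, counts = {}, {}
    for key, value in events:
        totals[key] = totals.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}


events = [("a", 2.0), ("b", 4.0), ("a", 4.0)]
view = IncrementalAverage()
for key, value in events:
    view.apply(key, value)
```

Both approaches give the same answer for the same events. The difference is that `view.apply` does constant work per event, while `full_recompute` scans everything stored so far each time a result is needed.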

       

  • Scaling

    Dynamic scaling lets a streaming database manage fluctuating workloads efficiently. Yingjun explains that streaming databases like RisingWave are designed to scale resources with demand, keeping costs down by provisioning only the compute nodes that are needed [3]. This flexibility comes from strategies such as persisting data in remote storage, which allows horizontal scaling without downtime or data migration [4].

    In the stream processing world, we really want to achieve, let's say, so called a dynamic scaling.

    ---

    Such strategies let a streaming database adapt to varying traffic patterns while keeping operational costs low.
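As a rough illustration of provisioning only the necessary compute nodes, the sketch below sizes a cluster from the observed event rate. It is a toy policy, not RisingWave's scheduler: the function `target_nodes` and its parameters (`per_node_capacity`, `min_nodes`, `max_nodes`) are assumptions made for the example. It also assumes that state lives in remote storage, so changing the node count involves no data migration.

```python
def target_nodes(events_per_sec, per_node_capacity, min_nodes=1, max_nodes=16):
    """Return how many compute nodes a hypothetical autoscaler would request.

    events_per_sec and per_node_capacity are whole numbers of events per
    second. The result is the smallest node count that covers the demand,
    clamped to the range [min_nodes, max_nodes].
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-events_per_sec // per_node_capacity)
    return max(min_nodes, min(max_nodes, needed))


# Traffic over a day: quiet, busy, then a spike beyond the configured ceiling.
plan = [target_nodes(rate, 1000, max_nodes=8) for rate in (0, 2500, 50000)]
```

With these inputs the plan is one node when idle, three nodes at 2,500 events per second, and the configured maximum of eight during the spike.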

       

  • Schema Changes

    Handling schema changes is critical for streaming databases because data structures in upstream services can change frequently. Yingjun notes that RisingWave handles schema changes in the same way as PostgreSQL and adapts automatically to changes in upstream databases such as MySQL or PostgreSQL [5]. Data processing can therefore continue without manual intervention as upstream schemas evolve.

    RisingWave does the schema change in the same way as a postgres.

    ---

    This adaptability helps keep data consistent as upstream schemas change.
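One way to picture a PostgreSQL-style schema change flowing into a downstream table is the sketch below. It models only a single case, adding a column with no default, where existing rows read as NULL for the new column, as they do in PostgreSQL. The class `EvolvingTable` and its methods are hypothetical and do not reflect RisingWave's internals.

```python
class EvolvingTable:
    """Hypothetical downstream table that follows upstream column additions."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def insert(self, record):
        # Reject fields the schema does not know; fill missing ones with None.
        unknown = set(record) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        self.rows.append({c: record.get(c) for c in self.columns})

    def add_column(self, name):
        # Mirrors ADD COLUMN without a default: old rows get None (NULL).
        if name in self.columns:
            raise ValueError(f"column already exists: {name}")
        self.columns.append(name)
        for row in self.rows:
            row[name] = None


table = EvolvingTable(["id", "name"])
table.insert({"id": 1, "name": "a"})
table.add_column("email")  # the upstream schema gained a column
table.insert({"id": 2, "name": "b", "email": "b@example.com"})
```

Rows written before the change stay readable under the new schema, and ingestion continues with the extra field without a restart or a manual backfill.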

Related Episodes