Published Feb 28, 2024

SE Radio 605: Yingjun Wu on Streaming Databases

Yingjun Wu of RisingWave Labs discusses streaming databases: how their architecture differs from traditional databases, how they scale dynamically, and how they adapt to schema changes to deliver real-time insights. He also covers the balance between cost efficiency and performance, and explains how techniques such as watermarks and parallel data consumption address data processing challenges like out-of-order events.

Episode Highlights

Computation

Yingjun Wu, founder of RisingWave Labs, highlights the fundamental difference between streaming and traditional databases. A streaming database computes before it stores: it processes data as it arrives and stores the results, whereas a traditional database stores data first and computes over it only when a query is issued 1. This lets streaming databases achieve low latency through incremental computation, which processes only new data rather than reprocessing the entire dataset 2.
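To illustrate the difference, here is a minimal sketch of incremental computation for a running per-key sum, a simple stand-in for a continuously maintained query result. The event shape and the names `full_recompute` and `IncrementalSum` are hypothetical; this is not RisingWave code. The batch-style function rescans every event each time a result is needed, while the incremental version keeps the result as state and folds in each new event.

```python
from collections import defaultdict

# Hypothetical event shape: (key, amount). A streaming database would receive
# these from a source such as a message queue; here they are plain tuples.


def full_recompute(events):
    """Batch style: scan every event seen so far each time a result is needed."""
    totals = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


class IncrementalSum:
    """Streaming style: keep the current result as state and fold in new events."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, key, amount):
        # Only the affected key is updated; earlier events are never re-read.
        self.totals[key] += amount

    def result(self):
        return dict(self.totals)


events = [("a", 5), ("b", 3), ("a", 2), ("c", 7), ("b", 1)]
view = IncrementalSum()
for key, amount in events:
    view.on_event(key, amount)

print(view.result())                            # {'a': 7, 'b': 4, 'c': 7}
print(view.result() == full_recompute(events))  # True
```

Both approaches produce the same answer, but the incremental one does a constant amount of work per event instead of work proportional to everything seen so far.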

In the stream processing world, there's a concept called backlog, or probably back pressure.

---

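This incremental approach lets a streaming database keep its results up to date as data flows in, provided processing keeps pace with the input rate. When it falls behind, a backlog builds up, and back pressure slows the upstream sources until the system catches up.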
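The backlog and back pressure mentioned in the quote can be illustrated with a bounded buffer between a source and a downstream operator. This is a generic sketch, not RisingWave's implementation; the class `BoundedChannel` and its `offer`/`poll` methods are hypothetical names chosen for illustration.

```python
from collections import deque


class BoundedChannel:
    """A bounded buffer between an upstream source and a downstream operator.

    When the buffer is full, offer() refuses the event. That refusal is the
    back-pressure signal: the source must pause instead of letting an
    unbounded backlog grow in memory.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def offer(self, event):
        if len(self.buffer) >= self.capacity:
            return False  # full: signal back pressure to the producer
        self.buffer.append(event)
        return True

    def poll(self):
        return self.buffer.popleft() if self.buffer else None


channel = BoundedChannel(capacity=3)
accepted, paused_at = [], None
for i in range(5):
    if channel.offer(i):
        accepted.append(i)
    else:
        paused_at = i  # producer pauses; this event stays in the source's backlog
        break

print(accepted, paused_at)       # [0, 1, 2] 3
channel.poll()                   # the consumer catches up by one event
print(channel.offer(paused_at))  # True: the producer can resume
```

Keeping the buffer bounded means a slow consumer shows up as a paused producer (a backlog at the source) rather than as unbounded memory growth inside the database.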

Scaling

Dynamic scaling lets streaming databases handle fluctuating workloads efficiently. Yingjun explains that streaming databases such as RisingWave are designed to scale resources up and down with demand, improving cost efficiency by provisioning only the compute nodes that are actually needed 3. This flexibility comes from persisting data in remote storage: because the durable copy of the data lives in remote storage rather than on the compute nodes, nodes can be added or removed for horizontal scaling without downtime or data migration 4.
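To see why remote storage makes scaling cheaper, here is a simplified model under stated assumptions: state is split into a fixed number of virtual partitions, each partition's data lives in shared remote storage (a dict stands in for an object store), and compute nodes only hold ownership of partitions. The names `NUM_PARTITIONS`, `remote_store`, `assign`, and `update` are hypothetical, and this is not RisingWave's actual scheduling logic.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical fixed number of virtual partitions

# Stand-in for shared remote storage: partition -> {key: value}.
# Every compute node can read any partition from here.
remote_store = {p: {} for p in range(NUM_PARTITIONS)}


def partition_of(key):
    # Stable hash so the same key always maps to the same virtual partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


def assign(nodes):
    """Spread partitions round-robin over the current compute nodes (metadata only)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def update(assignment, key, delta):
    p = partition_of(key)
    owner = assignment[p]  # the node currently responsible for this partition
    state = remote_store[p]
    state[key] = state.get(key, 0) + delta
    return owner


# Start with two nodes and process some events.
assignment = assign(["node-1", "node-2"])
for key in ["alice", "bob", "carol", "alice"]:
    update(assignment, key, 1)

# Scale out to three nodes: only the ownership map changes.
# No state is copied, because it already lives in remote_store.
before = {p: dict(s) for p, s in remote_store.items()}
assignment = assign(["node-1", "node-2", "node-3"])
assert remote_store == before

owner = update(assignment, "alice", 1)
# The owner depends on the hash; the count printed is 3 either way.
print(owner, remote_store[partition_of("alice")]["alice"])
```

The design choice being illustrated: rescaling becomes a metadata change (who owns which partition) rather than a bulk copy of state between machines.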

In the stream processing world, we really want to achieve, let's say, so called a dynamic scaling.

---

Together, these strategies let streaming databases adapt to varying traffic patterns while keeping operational costs down.
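A rough illustration of why provisioning to demand saves cost: the hypothetical helper `nodes_needed` sizes the cluster from the current event rate, an assumed per-node capacity, and a headroom margin. The traffic samples and capacity figure are made up, and a real scaling policy would likely weigh other signals as well, such as processing lag.

```python
import math


def nodes_needed(events_per_sec, capacity_per_node, min_nodes=1, headroom=0.2):
    """Smallest node count that handles the load plus a safety margin.

    capacity_per_node and headroom are hypothetical tuning parameters.
    """
    required = events_per_sec * (1 + headroom) / capacity_per_node
    return max(min_nodes, math.ceil(required))


hourly_load = [2_000, 15_000, 48_000, 9_000]  # hypothetical traffic samples (events/s)
plan = [nodes_needed(load, capacity_per_node=10_000) for load in hourly_load]

print(plan)                              # [1, 2, 6, 2]
print(sum(plan), max(plan) * len(plan))  # 11 24: node-hours, scaled vs. fixed at peak
```

In this toy example, scaling with the load uses 11 node-hours over four hours, while provisioning statically for the peak would use 24.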

Schema Changes

Handling schema changes is a critical aspect of streaming databases, because the data structures of upstream services can change frequently. Yingjun notes that RisingWave handles schema changes the same way PostgreSQL does and automatically adapts to changes in upstream databases such as MySQL or PostgreSQL 5. As a result, data processing continues without manual intervention when upstream schemas evolve.
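A minimal sketch of how a downstream consumer might apply an upstream schema change in-stream, assuming a hypothetical change-event format (the `type`, `column`, `default`, and `row` fields are made up for illustration; real change-data-capture formats differ). It is not RisingWave's implementation; it only mirrors PostgreSQL's `ALTER TABLE ... ADD COLUMN` semantics, where existing rows read as NULL unless a default is given.

```python
class DownstreamTable:
    """Downstream copy of an upstream table, kept in sync from change events."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def apply(self, event):
        if event["type"] == "add_column":
            self.columns.append(event["column"])
            for row in self.rows:
                # Existing rows get the default, or None (NULL) if none is given.
                row[event["column"]] = event.get("default")
        elif event["type"] == "insert":
            unknown = set(event["row"]) - set(self.columns)
            if unknown:
                raise ValueError(f"row has columns not in schema: {unknown}")
            # Columns missing from the incoming row are filled with None (NULL).
            self.rows.append({c: event["row"].get(c) for c in self.columns})
        else:
            raise ValueError(f"unsupported event type: {event['type']}")


table = DownstreamTable(["id", "name"])
changes = [
    {"type": "insert", "row": {"id": 1, "name": "ada"}},
    # Upstream runs something like: ALTER TABLE users ADD COLUMN email TEXT;
    {"type": "add_column", "column": "email"},
    {"type": "insert", "row": {"id": 2, "name": "bob", "email": "bob@example.com"}},
]
for change in changes:
    table.apply(change)

print(table.columns)  # ['id', 'name', 'email']
print(table.rows)
```

Because the schema change travels in the same ordered stream as the data, rows inserted after the change are validated against the new schema without stopping the pipeline.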

RisingWave does the schema change in the same way as a postgres.

---

This adaptability helps keep downstream data consistent as upstream schemas change.

Related Episodes

SE Radio 560: Sugu Sougoumarane on Distributed SQL Databases
SE-Radio Episode 346: Stephan Ewen on Streaming Architecture
SE Radio 623: Mike Freedman on TimescaleDB
SE-Radio Episode 243: RethinkDB with Slava Akhmechet
SE Radio 592: Jaxon Repp on Distributed Data Infrastructure
SE Radio 561: Dan DeMers on Dataware
SE Radio 601: Han Yuan on Reorganizations
SE Radio 583: Lukas Fittl on Postgres Performance
SE-Radio Episode 353: Max Neunhoffer on Multi-model databases and ArangoDB
SE Radio 619: James Strong on Kubernetes Networking
364: Peter Zaitsev on Choosing the Right Open Source Database
Episode 417: Alex Petrov on Database Storage Engines
SE-Radio Episode 344: Pat Helland on Web Scale
Episode 194: Michael Hunger on Graph Databases
Episode 504: Frank McSherry on Materialize

Data Processing Challenges

Yingjun Wu discusses the complexities of handling out-of-order events and data ingestion in streaming databases. He highlights two techniques: watermarks, which track how far event time has progressed so that out-of-order events can be handled accurately, and parallel data consumption, which makes ingestion more efficient.
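The two ideas can be combined in a small sketch of an event-time tumbling-window count. Assumptions, all hypothetical: events carry an integer timestamp in seconds, each source partition is read by its own reader, a partition's watermark trails the largest timestamp it has seen by a fixed allowed lateness, and the overall watermark is the minimum across partitions (a common convention in stream processors, not necessarily RisingWave's exact rule). A window is finalized once the watermark passes its end, and events for finalized windows are dropped.

```python
from collections import defaultdict

WINDOW = 10           # tumbling window size in seconds (hypothetical)
ALLOWED_LATENESS = 5  # how far a partition's watermark trails its max event time (hypothetical)


class PartitionReader:
    """Consumes one source partition in parallel with the others."""

    def __init__(self):
        self.max_event_time = float("-inf")

    def read(self, event_time):
        self.max_event_time = max(self.max_event_time, event_time)

    def watermark(self):
        # Claim: this partition will send no events older than this timestamp.
        return self.max_event_time - ALLOWED_LATENESS


class WindowedCount:
    """Counts events per event-time window and finalizes windows using watermarks."""

    def __init__(self, num_partitions):
        self.readers = [PartitionReader() for _ in range(num_partitions)]
        self.open = defaultdict(int)  # window start -> running count
        self.closed = {}              # window start -> final count
        self.dropped = []             # events whose window was already finalized

    def watermark(self):
        # A slow partition may still deliver old events, so the overall
        # watermark is the minimum across all partition watermarks.
        return min(r.watermark() for r in self.readers)

    def on_event(self, partition, event_time):
        start = event_time - event_time % WINDOW
        if start + WINDOW <= self.watermark():
            self.dropped.append(event_time)  # too late: result already emitted
        else:
            self.open[start] += 1
        self.readers[partition].read(event_time)
        # Finalize every window that ends at or before the new watermark.
        wm = self.watermark()
        for s in sorted(self.open):
            if s + WINDOW <= wm:
                self.closed[s] = self.open.pop(s)


counter = WindowedCount(num_partitions=2)
# (partition, event_time) pairs; partition 1 lags behind partition 0.
for partition, ts in [(0, 1), (1, 2), (0, 12), (0, 18), (1, 4),
                      (1, 21), (0, 3), (0, 27), (1, 26)]:
    counter.on_event(partition, ts)

print(counter.closed)      # {0: 3, 10: 2}
print(dict(counter.open))  # {20: 3}
print(counter.dropped)     # [3]
```

Taking the minimum is what keeps this result correct: after the event at 18, partition 0's own watermark is 13, which alone would already have finalized the window starting at 0. Partition 1 had not caught up, and its out-of-order event at 4 still counted toward that window. The event at 3 arrives only after the overall watermark reaches 13, so it is dropped.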
