SE Radio 605: Yingjun Wu on Streaming Databases

Episode Highlights
Computation
Yingjun Wu, founder of RisingWave Labs, highlights the fundamental difference between streaming and traditional databases. In a streaming database, computation precedes storage: data is processed as it arrives. A traditional database stores data first and computes over it later, when a query runs [1]. This ordering lets streaming databases achieve low latency through incremental computation, which processes only newly arrived data instead of recomputing over the entire dataset [2].
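To make the incremental idea concrete, here is a minimal sketch of a `SUM(amount) GROUP BY key` result maintained over an append-only stream. It assumes inserts only (real streaming systems also handle updates and deletes), and the names `IncrementalSum` and `full_recompute` are hypothetical; this is not RisingWave's implementation. Each new event touches only its own group, while the batch-style function rescans every stored event to reach the same answer.

```python
from collections import defaultdict


class IncrementalSum:
    """Maintains SUM(amount) GROUP BY key, updated only from new events."""

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, key, amount):
        # Only the affected group changes; past events are never re-read.
        self.totals[key] += amount
        return self.totals[key]


def full_recompute(events):
    # Batch style: rescan every stored event to rebuild the whole result.
    totals = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


events = [("alice", 10), ("bob", 5), ("alice", 7)]
view = IncrementalSum()
for key, amount in events:
    view.apply(key, amount)

# Both approaches agree, but the incremental one did O(1) work per event.
assert dict(view.totals) == full_recompute(events)
print(dict(view.totals))  # {'alice': 17, 'bob': 5}
```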
In the stream processing world, there's a concept called backlog, or probably back pressure.
---
Processing data incrementally lets a streaming database keep pace with incoming data and keep its results up to date with minimal lag.
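The clip above mentions backlog and back pressure. As a generic illustration, not a description of RisingWave internals, back pressure can be modeled with a bounded buffer between a fast upstream producer and a slow downstream operator: when the buffer is full, the producer blocks until the consumer catches up, so the backlog never grows without limit. The buffer size and sleep time below are arbitrary assumptions.

```python
import queue
import threading
import time

# Bounded buffer between two operators: put() blocks once maxsize items
# are queued, which throttles the producer to the consumer's pace.
buffer = queue.Queue(maxsize=3)
consumed = []


def slow_consumer(n):
    for _ in range(n):
        item = buffer.get()
        time.sleep(0.01)  # simulate an expensive downstream operator
        consumed.append(item)
        buffer.task_done()


def fast_producer(n):
    peak_backlog = 0
    for i in range(n):
        buffer.put(i)  # blocks while the buffer is full (back pressure)
        peak_backlog = max(peak_backlog, buffer.qsize())
    return peak_backlog


N = 20
consumer = threading.Thread(target=slow_consumer, args=(N,))
consumer.start()
peak = fast_producer(N)
consumer.join()

print(f"peak backlog: {peak}, consumed: {len(consumed)}")
```

The peak backlog stays at or below the buffer size no matter how fast the producer runs; without the bound, a slow consumer would let the queue grow indefinitely.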
Scaling
Dynamic scaling is crucial for streaming databases to manage fluctuating workloads efficiently. Yingjun explains that streaming databases like RisingWave are designed to scale resources up and down with demand, reducing cost by provisioning only the compute nodes the current workload needs [3]. One strategy that enables this is persisting data in remote storage, which allows horizontal scaling without downtime or data migration [4].
In the stream processing world, we really want to achieve, let's say, so called a dynamic scaling.
---
Such strategies ensure that streaming databases can adapt to varying traffic patterns while minimizing operational costs.
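A toy model of why remote storage simplifies scaling: if all state lives in shared storage, compute nodes hold nothing durable, and changing the node count only changes which node handles which key partition. This sketch assumes a fixed set of key partitions and uses a Python dict as a stand-in for an object store; `NUM_PARTITIONS`, `assign`, and `ComputeNode` are hypothetical names, and RisingWave's actual mechanism differs in detail.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical fixed number of key partitions

# Stand-in for shared remote storage (e.g. an object store).
remote_state = {}


def partition_of(key):
    # Stable hash, so a key always lands in the same partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


def assign(num_nodes):
    # Round-robin partitions over nodes; only this mapping changes on scaling.
    return {p: p % num_nodes for p in range(NUM_PARTITIONS)}


class ComputeNode:
    """Stateless worker: all reads and writes go to remote storage."""

    def __init__(self, node_id):
        self.node_id = node_id

    def process(self, key, amount):
        slot = (partition_of(key), key)
        remote_state[slot] = remote_state.get(slot, 0) + amount


def run(events, num_nodes):
    nodes = [ComputeNode(i) for i in range(num_nodes)]
    owners = assign(num_nodes)
    for key, amount in events:
        nodes[owners[partition_of(key)]].process(key, amount)


run([("alice", 10), ("bob", 5)], num_nodes=2)   # low traffic: 2 nodes
run([("alice", 7), ("carol", 3)], num_nodes=4)  # scale out to 4 nodes
# No state was copied between nodes; new owners read the shared store.
print(sorted((k, v) for (_, k), v in remote_state.items()))
# [('alice', 17), ('bob', 5), ('carol', 3)]
```

The trade-off is that every state access goes to remote storage, which real systems typically offset with local caching.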
Schema Changes
Handling schema changes is a critical aspect of streaming databases, because data structures in upstream services change frequently. Yingjun notes that RisingWave handles schema changes the same way PostgreSQL does and automatically adapts to changes in upstream databases such as MySQL or PostgreSQL [5]. This capability keeps data processing running without manual intervention, so streaming databases stay responsive to evolving data environments.
RisingWave does the schema change in the same way as a postgres.
---
Such adaptability is essential for maintaining data integrity and consistency in dynamic data landscapes.
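As a rough illustration of automatic adaptation, the sketch below models a downstream table that receives change records from an upstream database and applies an `ADD COLUMN` when a record carries a field it has not seen before. As with `ALTER TABLE ... ADD COLUMN` in PostgreSQL without a `DEFAULT`, existing rows get NULL (`None`) for the new column. The `Table` class and the record format are hypothetical; this is not how RisingWave's CDC pipeline is implemented.

```python
class Table:
    """Toy downstream table that adapts to ADD COLUMN changes upstream."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def add_column(self, name):
        # Mirrors PostgreSQL ADD COLUMN without a DEFAULT: old rows get NULL.
        if name not in self.columns:
            self.columns.append(name)
            for row in self.rows:
                row[name] = None

    def apply_change(self, record):
        # Hypothetical change record (a dict); unknown fields trigger ADD COLUMN.
        for field in record:
            if field not in self.columns:
                self.add_column(field)
        # Missing fields are stored as None.
        self.rows.append({c: record.get(c) for c in self.columns})


t = Table(["id", "name"])
t.apply_change({"id": 1, "name": "a"})
t.apply_change({"id": 2, "name": "b", "email": "b@example.com"})
print(t.columns)  # ['id', 'name', 'email']
print(t.rows)
# [{'id': 1, 'name': 'a', 'email': None}, {'id': 2, 'name': 'b', 'email': 'b@example.com'}]
```

Only additive changes are handled here; dropped or retyped columns need more careful treatment.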
Related Episodes

SE Radio 560: Sugu Sougoumarane on Distributed SQL Databases

SE-Radio Episode 346: Stephan Ewen on Streaming Architecture

SE Radio 623: Mike Freedman on TimescaleDB

SE-Radio Episode 243: RethinkDB with Slava Akhmechet

SE Radio 592: Jaxon Repp on Distributed Data Infrastructure

SE Radio 561: Dan DeMers on Dataware

SE Radio 601: Han Yuan on Reorganizations

SE Radio 583: Lukas Fittl on Postgres Performance

SE-Radio Episode 353: Max Neunhoffer on Multi-model databases and ArangoDB

SE Radio 619: James Strong on Kubernetes Networking

364: Peter Zaitsev on Choosing the Right Open Source Database
Episode 417: Alex Petrov on Database Storage Engines

SE-Radio Episode 344: Pat Helland on Web Scale

Episode 194: Michael Hunger on Graph Databases

Episode 504: Frank McSherry on Materialize