669: Streaming, reactive, real-time machine learning — with Adrian Kosowski

Episode Highlights
Reactive Framework
Adrian Kosowski, Co-Founder and Chief Product Officer at Pathway, introduces the concept of reactive data processing, a framework designed to handle changing data efficiently. He explains that reactivity allows for minimal recomputation when data changes, contrasting with traditional methods that require full recomputation. This approach is akin to how spreadsheets update automatically when data changes, offering a scalable solution for real-time data processing.
"Reactivity is all about the art of dealing with changing data in such a way that you don't have to worry too much about the processing part when data changes."
Adrian emphasizes that this framework is particularly beneficial for machine learning engineers transitioning from batch prototypes to live streaming applications, reducing the effort required for real-time updates.
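The spreadsheet analogy can be made concrete with a minimal sketch in plain Python. It is not Pathway's API; the class name `ReactiveMean` and the helper `batch_mean` are hypothetical names chosen for illustration. The reactive version applies each change as a small delta, while the batch version rescans every row.

```python
from collections import defaultdict


class ReactiveMean:
    """Hypothetical illustration: per-key means kept current by applying deltas."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def insert(self, key, value):
        # A new row touches only the running totals for its own key.
        self._sum[key] += value
        self._count[key] += 1

    def retract(self, key, value):
        # Removing a row undoes its contribution without rescanning anything.
        self._sum[key] -= value
        self._count[key] -= 1

    def update(self, key, old, new):
        # A changed row is a retraction of the old value plus an insertion of the new one.
        self.retract(key, old)
        self.insert(key, new)

    def mean(self, key):
        if self._count[key] == 0:
            return None
        return self._sum[key] / self._count[key]


def batch_mean(rows, key):
    """Full recomputation: scans every row each time it is called."""
    values = [v for k, v in rows if k == key]
    return sum(values) / len(values) if values else None


reactive = ReactiveMean()
for k, v in [("a", 1.0), ("a", 3.0), ("b", 10.0)]:
    reactive.insert(k, v)

# One row changes from 3.0 to 5.0; only key "a" is touched.
reactive.update("a", 3.0, 5.0)
```

After the update, `reactive.mean("a")` agrees with `batch_mean` run over the corrected rows, but the work done is constant per change rather than proportional to the size of the data.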
Batch vs. Streaming
The discussion moves to the differences between batch and streaming data processing, highlighting their distinct characteristics and challenges. Adrian Kosowski notes that batch processing involves scheduled computations on static data, while streaming allows for real-time updates as new data arrives. This real-time capability reduces latency but introduces complexity in maintaining non-trivial logic, such as database joins.
"Batch is the concept that your computation is run and scheduled. I think batch, orchestration, scheduling: these are concepts that all go together."
He explains that while batch systems are familiar to those from static database backgrounds, streaming systems appeal to those in microservices and dynamic data environments.
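To see why a join is non-trivial to maintain on a stream, consider the following minimal sketch of a symmetric hash join in plain Python. It is an illustrative assumption, not how Pathway or any specific framework implements joins; `StreamingJoin`, `on_left`, and `on_right` are hypothetical names. In batch, both tables are complete before the join runs. On a stream, rows arrive on either side at any time, so each side must be remembered and probed against the other.

```python
from collections import defaultdict


class StreamingJoin:
    """Hypothetical illustration: inner join maintained as rows arrive on either side."""

    def __init__(self):
        # State for both inputs must be retained, since a match may arrive later.
        self._left = defaultdict(list)
        self._right = defaultdict(list)

    def on_left(self, key, row):
        # Store the new left row, then emit matches against right rows seen so far.
        self._left[key].append(row)
        return [(key, row, r) for r in self._right.get(key, [])]

    def on_right(self, key, row):
        # Symmetric case for a new right row.
        self._right[key].append(row)
        return [(key, l, row) for l in self._left.get(key, [])]


join = StreamingJoin()
first = join.on_left("u1", "click")    # no match yet: the user record has not arrived
second = join.on_right("u1", "Alice")  # the earlier click is now matched
third = join.on_left("u1", "buy")      # a later event matches immediately
```

Even this small version keeps state that grows with the input and does not handle deletions or late corrections, which hints at where the extra complexity of streaming comes from.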
Streaming Complexity
Implementing streaming systems presents significant challenges, yet offers substantial rewards. Adrian highlights that transitioning from batch to streaming can be ten times more complex, requiring extensive system deployment and maintenance efforts. However, the potential value gained from real-time applications can exceed this complexity, making it a worthwhile endeavor.
"Streaming can historically be ten times as complicated, but it can offer more than ten times the value once it's implemented in production and real time."
He shares insights from ongoing experiments comparing Pathway with other frameworks, illustrating how live dashboards can evolve with incoming data, smoothing out statistical noise over time.
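The idea of a live statistic settling down as data accumulates can be sketched with a running mean and its standard error, updated one observation at a time using Welford's method. This is a generic illustration in plain Python, not the experiment described in the episode; the class name `RunningStats` and the sample values are assumptions.

```python
import math


class RunningStats:
    """Hypothetical illustration: mean and standard error updated per observation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def push(self, x):
        # Welford's update: constant work per new value, no stored history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def stderr(self):
        # Standard error of the mean; undefined until two values have arrived.
        if self.n < 2:
            return float("inf")
        variance = self._m2 / (self.n - 1)
        return math.sqrt(variance / self.n)


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.push(value)
    # A dashboard would redraw stats.mean and stats.stderr() at this point.
```

For a fixed underlying variance, the standard error falls roughly with the square root of the number of observations, which is one way the noise in a live figure smooths out over time.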