Published Apr 11, 2023

669: Streaming, reactive, real-time machine learning — with Adrian Kosowski

Explore the future of machine learning with Adrian Kosowski as he delves into energy-efficient processes, the power of lightweight models, and the potential of reactive data processing for real-time applications, while also sharing valuable insights on product leadership and the role of empathy in tech development.

Super Data Science: ML & AI Podcast with Jon Krohn

Episode Highlights

  • Reactive Framework

    Adrian Kosowski, Co-Founder and Chief Product Officer at Pathway, introduces the concept of reactive data processing, a framework designed to handle changing data efficiently. He explains that reactivity allows for minimal recomputation when data changes, contrasting with traditional methods that require full recomputation. This approach is akin to how spreadsheets update automatically when data changes, offering a scalable solution for real-time data processing.

    "Reactivity is all about the art of dealing with changing data in such a way that you don't have to worry too much about the processing part when data changes."

    Adrian emphasizes that this framework is particularly beneficial for machine learning engineers transitioning from batch prototypes to live streaming applications, reducing the effort required for real-time updates [1, 2].
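
The spreadsheet analogy can be illustrated with a minimal Python sketch. It is not Pathway's API; the `ReactiveSum` class, the `batch_sum` helper, and the cell names are hypothetical, chosen only to contrast updating a result by its delta with recomputing it from scratch.

```python
class ReactiveSum:
    """Keeps a total over keyed values and updates it by deltas (illustrative only)."""

    def __init__(self):
        self.values = {}
        self.total = 0

    def upsert(self, key, value):
        # Apply only the difference between the new and the old value,
        # instead of re-summing every stored value.
        old = self.values.get(key, 0)
        self.values[key] = value
        self.total += value - old

    def delete(self, key):
        # Removing a key subtracts its last known value from the total.
        self.total -= self.values.pop(key, 0)


def batch_sum(values):
    # Batch-style equivalent: a full pass over all values on every change.
    return sum(values.values())


cells = ReactiveSum()
cells.upsert("A1", 10)
cells.upsert("A2", 5)
cells.upsert("A1", 7)  # a changed cell costs O(1), not a full recomputation
```

Both approaches yield the same total. The difference is that the reactive version does constant work per change, while `batch_sum` touches every value each time it runs.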

       

  • Batch vs. Streaming

    The discussion moves to the differences between batch and streaming data processing, highlighting their distinct characteristics and challenges. Adrian Kosowski notes that batch processing involves scheduled computations on static data, while streaming allows for real-time updates as new data arrives. This real-time capability reduces latency but introduces complexity in maintaining non-trivial logic, such as database joins.

    "Batch is the concept that your computation is run and scheduled. I think batch, orchestration, scheduling: these are concepts that all go together."

    He explains that while batch systems are familiar to those from static database backgrounds, streaming systems appeal to those in microservices and dynamic data environments [3, 4].
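
To show why a join is non-trivial to maintain on a stream, here is a rough Python sketch of an append-only incremental join. The `IncrementalJoin` class and the sample keys and rows are assumptions made for illustration. The sketch ignores deletions, late data, and state cleanup, all of which a production streaming system would need to handle.

```python
from collections import defaultdict


class IncrementalJoin:
    """Append-only inner join that emits only the matches created by each new row."""

    def __init__(self):
        # Both sides must be kept in state, indexed by join key,
        # because a future row on either side may match them.
        self.left = defaultdict(list)
        self.right = defaultdict(list)

    def add_left(self, key, row):
        self.left[key].append(row)
        # Pair the new left row with every right row seen so far for this key.
        return [(key, row, r) for r in self.right.get(key, [])]

    def add_right(self, key, row):
        self.right[key].append(row)
        # Pair the new right row with every left row seen so far for this key.
        return [(key, l, row) for l in self.left.get(key, [])]


join = IncrementalJoin()
out1 = join.add_left("u1", "order-1")   # no matching right row yet
out2 = join.add_right("u1", "Alice")    # matches the earlier order
out3 = join.add_left("u1", "order-2")   # matches the stored right row
```

A batch join would read both tables once and finish. The streaming version has to hold both sides in memory indefinitely, which is one source of the added complexity mentioned above.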

       

  • Streaming Complexity

    Implementing streaming systems presents significant challenges, yet offers substantial rewards. Adrian notes that streaming has historically been as much as ten times as complicated as batch, requiring extensive system deployment and maintenance effort. However, once a real-time application is running in production, it can deliver more than ten times the value, which makes the effort worthwhile.

    "Streaming can historically be ten times as complicated, but it can offer more than ten times the value once it's implemented in production and real time."

    He shares insights from ongoing experiments comparing Pathway with other frameworks, illustrating how live dashboards can evolve with incoming data, smoothing out statistical noise over time [5, 6].
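
One simple way a live metric can settle as data accumulates is an incrementally updated mean. The sketch below is an assumption made for illustration, not a description of the experiments or dashboards discussed in the episode; the `RunningMean` class and the sample readings are hypothetical.

```python
class RunningMean:
    """Maintains a mean that is updated in O(1) per new observation."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Shift the mean toward the new value by 1/count of the gap,
        # so each new point moves the estimate less than the one before.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


m = RunningMean()
# Each element is the value a dashboard would display after that reading arrives.
history = [m.update(x) for x in [12.0, 8.0, 11.0, 9.0]]
```

Early values in `history` swing with each reading, while later ones move less, which is the smoothing effect a live dashboard would show as more data arrives.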

Related Episodes