Published Aug 24, 2021

SDS 499: Data Meshes and Data Reliability — with Barr Moses

Join Jon Krohn as he engages with Barr Moses to uncover the essentials of data reliability, the innovative concept of data meshes fostering decentralized data management, and the art of building adaptive data science teams in dynamic startup environments.

Episode Highlights

  • Data Reliability

    Barr Moses explains data reliability as akin to software service uptime, emphasizing its critical role in modern data usage. She highlights the necessity of maintaining data pipelines to ensure continuous access and decision-making capabilities. The conversation underscores the importance of this concept, noting that "the stakes are higher now" as data becomes increasingly mission-critical 1. Moses's effective communication skills are praised, reflecting her ability to convey complex technical content clearly 2.

       

  • Observability Pillars

    To ensure data reliability, Moses introduces the five pillars of data observability: freshness, volume, schema, distribution, and lineage. These pillars provide a comprehensive view of data health, enabling organizations to maintain high data quality. She explains, "if you can automatically collect information, monitor those five pillars, you can actually have a holistic, unified view of the health of your data" 3. This approach draws from software engineering practices, leveraging tools like Snowflake and Databricks to enhance data processing and accuracy 4.
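To make the five pillars concrete, here is a minimal sketch of what automated checks might look like when comparing two metadata snapshots of one table. It is an illustration only, not the tooling discussed in the episode: the `TableSnapshot` fields, the `check_pillars` function, and every threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev


@dataclass
class TableSnapshot:
    """Hypothetical metadata collected about one table at one point in time."""
    last_updated: datetime
    row_count: int
    columns: dict   # column name -> type name
    values: list    # sample of one numeric column
    upstream: list  # names of the tables this one is built from


def check_pillars(current, previous, now, max_age=timedelta(hours=24),
                  volume_tolerance=0.5, z_threshold=3.0):
    """Return a dict mapping each pillar name to a list of detected issues.

    All thresholds are illustrative assumptions, not recommended values.
    """
    pillars = ("freshness", "volume", "schema", "distribution", "lineage")
    issues = {pillar: [] for pillar in pillars}

    # Freshness: has the table been updated recently enough?
    if now - current.last_updated > max_age:
        issues["freshness"].append("table not updated within max_age")

    # Volume: did the row count change by more than the tolerated fraction?
    if previous.row_count > 0:
        change = abs(current.row_count - previous.row_count) / previous.row_count
        if change > volume_tolerance:
            issues["volume"].append(f"row count changed by {change:.0%}")

    # Schema: were columns added, removed, or retyped?
    if current.columns != previous.columns:
        changed = set(current.columns.items()) ^ set(previous.columns.items())
        names = sorted({name for name, _ in changed})
        issues["schema"].append(f"columns changed: {names}")

    # Distribution: did the sample mean move far from the previous sample,
    # measured in units of the previous standard deviation?
    if previous.values and current.values:
        spread = pstdev(previous.values)
        if spread > 0:
            shift = abs(mean(current.values) - mean(previous.values)) / spread
            if shift > z_threshold:
                issues["distribution"].append(f"mean shifted by {shift:.1f} std devs")

    # Lineage: did the set of upstream dependencies change?
    if set(current.upstream) != set(previous.upstream):
        issues["lineage"].append("upstream dependencies changed")

    return issues
```

Calling `check_pillars(today, yesterday, datetime.now())` on two snapshots returns one list of issues per pillar, which is one simple way to get the single, unified health view Moses describes. A real system would collect these snapshots automatically rather than by hand.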

       

  • Accuracy Challenges

    Moses discusses the challenges of maintaining data accuracy in complex organizational environments. She notes that manual verification is no longer feasible given the vast number of data sources. "The question is, as an analogy, sort of call this kind of like data downtime," she explains, highlighting the need for rapid detection of data issues 5. She credits her background in the Israeli Air Force with instilling a strong sense of responsibility for data accuracy, and she stresses the importance of minimizing defects in data-driven decisions 6.
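The "data downtime" analogy can be made measurable in the same way service downtime is. The sketch below assumes each data incident is logged with three timestamps (when the data went bad, when someone noticed, when it was fixed) and that incidents do not overlap. The `Incident` record and `downtime_summary` function are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical record of one period in which data was wrong or missing."""
    started: datetime   # when the data first became unreliable
    detected: datetime  # when the problem was noticed
    resolved: datetime  # when the data was trustworthy again


def downtime_summary(incidents):
    """Return (total downtime, mean time to detection) for a list of incidents.

    Assumes incidents do not overlap; overlapping periods would be double counted.
    """
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    to_detect = sum((i.detected - i.started for i in incidents), timedelta())
    mean_detect = to_detect / len(incidents) if incidents else timedelta()
    return total, mean_detect
```

Tracking time to detection separately from total downtime reflects the point made above: when manual verification cannot keep up, the part of downtime most open to improvement is how quickly an issue is noticed.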
