Published Jun 3, 2020

Vicki Boykis — Machine Learning Across Industries

Vicki Boykis discusses the complexities of machine learning consulting and her path from economic consulting to big data work. She emphasizes robust data engineering, clear client communication, and the deliberate deployment of tools and models across industries.

Episode Highlights

  • Production Hurdles

    Vicki Boykis highlights the challenges of deploying machine learning models to production. She notes that the process is more complex than traditional software deployment because it also requires data management, planning for model drift, and service orchestration 1. Vicki explains that the prototyping and solidifying steps are crucial, since packaging a model often means exposing it through a REST endpoint or shipping it in a Docker container 2.

    Putting stuff in production is really hard. And so I would say that's the biggest thing.

    ---

    These challenges underscore the need to plan deployments carefully and to understand both the data and the model.
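
    To make the "REST endpoint" packaging step concrete, here is a minimal sketch using only the Python standard library. It is not the setup described in the episode: the linear scorer, the `/predict` path, and the names `predict`, `PredictHandler`, and `serve` are all assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 0.125


def predict(features):
    """Score one feature vector with the stand-in model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


class PredictHandler(BaseHTTPRequestHandler):
    """Accepts POST /predict with a JSON body like {"features": [..]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body, status = {"score": predict(payload["features"])}, 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, a missing key, or a wrong feature count.
            body, status = {"error": str(exc)}, 400
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # Silence per-request logging in this sketch.


def serve(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

    Calling `serve()` and posting `{"features": [2.0, 4.0]}` to `/predict` on the returned server's port would return a JSON score. A real deployment would still need the pieces the episode names as the hard part: data management, drift monitoring, and orchestration around the service.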

       

  • Metadata Management

    Managing metadata is a significant yet often overlooked aspect of data science projects. Vicki emphasizes that many companies struggle with metadata management, which is crucial for updating models and conducting analyses 3. She mentions that while open-source tools like Amundsen are emerging, there is still no single solution for comprehensive metadata management.

    People actually clamor for that, more so than even visibility into how to manage the model.

    ---

    Without standardized metadata, companies face issues like not knowing which data is proprietary or how to efficiently query data lakes.
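
    As a rough illustration of what standardized metadata could look like, the sketch below keeps one record per dataset and answers the two questions mentioned above: which data is proprietary, and where a given column lives. This is not Amundsen's API or any tool named in the episode; `DatasetRecord`, `Catalog`, and every field name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One catalog entry; the fields are illustrative assumptions."""
    name: str
    owner: str
    location: str       # e.g. a path or table name in the data lake
    proprietary: bool   # whether the data is company-owned
    columns: tuple = ()


class Catalog:
    """A tiny in-memory metadata catalog keyed by dataset name."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.name] = record

    def get(self, name):
        return self._records[name]

    def proprietary_datasets(self):
        """Names of all datasets flagged as proprietary, sorted."""
        return sorted(r.name for r in self._records.values() if r.proprietary)

    def find_by_column(self, column):
        """Names of all datasets that contain the given column, sorted."""
        return sorted(
            r.name for r in self._records.values() if column in r.columns
        )


catalog = Catalog()
catalog.register(DatasetRecord(
    "orders", "sales-team", "lake/orders", True, ("order_id", "customer_id")))
catalog.register(DatasetRecord(
    "census", "analytics", "lake/census", False, ("region", "population")))
catalog.register(DatasetRecord(
    "customers", "crm-team", "lake/customers", True, ("customer_id", "region")))
```

    With these example records, `catalog.proprietary_datasets()` lists `customers` and `orders`, and `catalog.find_by_column("region")` lists `census` and `customers`.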

       

  • Team Structures

    The structure of a data team significantly affects how effective it is. Vicki discusses the benefits of both centralized and embedded data science teams, noting that smaller companies may benefit from a centralized approach, while larger companies might find embedded teams more effective 4. She highlights the risk that siloed teams duplicate each other's work and stresses the importance of collaboration.

    I've seen it work well different ways in different companies.

    ---

    Ultimately, the choice between centralized and embedded teams depends on the company's size and specific needs.
