Published Jun 22, 2022

Machine learning in your database

Montana Low and Lev Kokotov describe how Instacart scaled its systems on Postgres, and how PostgresML and PgCat integrate machine learning capabilities directly within the database.

Episode Highlights

  • Capabilities

    PostgresML integrates machine learning capabilities directly within the Postgres database. Users can train and deploy models using SQL, which simplifies the process for those familiar with Postgres but not necessarily with machine learning frameworks like TensorFlow or PyTorch [1]. The platform supports vector operations, which are crucial for tasks like NLP, so models can be applied directly within the database [2]. The episode also stresses the importance of these features, noting that data scientists often deal with complex data transformations before model training [3].

    You don't really need to know what the difference between a support vector machine and a gradient boosted tree model is. You pick the one with the best score and you move on with whatever your business is.

    ---

    This integration allows for seamless machine learning operations, reducing the need for extensive data movement and external processing.

       

    Origins

    The creation of PostgresML stemmed from the founders' experiences at Instacart, where they faced challenges with scaling machine learning infrastructure. One founder shares how his journey began with transitioning Instacart's data systems to more scalable architectures, which laid the groundwork for PostgresML [4]. The account of building these systems emphasizes the need for efficient data handling and processing capabilities [5]. Their collaboration led to innovations that simplified complex data workflows, ultimately inspiring the development of PostgresML [6].

    We were getting large enough that we needed to move out of a monolithic Rails app into more of a distributed architecture that would be horizontally scalable.

    ---

    This evolution highlights the importance of adaptable data solutions in rapidly growing tech environments.

       

    Challenges

    Integrating machine learning with databases presents unique challenges, which PostgresML aims to address. One founder describes the difficulties faced during the pandemic, when existing systems struggled under increased load, prompting a shift to Postgres-based solutions [7]. This transition revealed inefficiencies in traditional data pipelines, highlighting the need for more robust and scalable infrastructure [8]. PostgresML facilitates essential data transformations, allowing users to clean and prepare data directly within the database [9].

    You got to have some kind of like, you know, mathematical operations on your data. Like you have to like, be able to transform things.

    ---

    These capabilities streamline the integration process, making machine learning more accessible and efficient.

       

    Vision

    Looking ahead, the creators of PostgresML envision a future where machine learning workflows are simpler and more accessible. Low and Kokotov both emphasize the importance of reducing complexity in ML processes, allowing smaller teams to maintain high-quality production standards [10]. They aim to create a system where machine learning engineers can focus on their core tasks without being bogged down by infrastructure concerns [11].

    I want machine learning engineers to do machine learning that they actually enjoy, as opposed to figuring out how to, like, how to load balance the service.

    ---

    This vision underscores their commitment to enhancing the usability and efficiency of machine learning tools.
