Published Jan 13, 2025

Evolving MLOps Platforms for Generative AI and Agents with Abhijit Bose - 714

Abhijit Bose of Capital One delves into the evolution of their Generative AI platform, spotlighting a platform-centric approach that marries centralized governance with cutting-edge AI tools and Kubernetes integration, enhancing flexibility and resource optimization for enterprise operations and customer service.
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Episode Highlights

  • Kubernetes

    Integrating Kubernetes has significantly advanced AI platform development at Capital One. Bose explains that their platform control plane, built on Kubernetes, gives them the flexibility to incorporate a range of tools and services, including those from AWS and open-source communities [1]. This flexibility has been crucial in extending their machine learning platform to support generative AI use cases and adapt quickly as the field changes. Bose also highlights that data annotation is more complex for generative AI than for traditional machine learning, which calls for more refined capabilities and tools [2].

       

  • Observability

    Stronger observability tooling is vital for managing the complexity of generative AI applications. Bose notes that traditional machine learning already requires solid monitoring of model drift and input features, while generative AI adds new challenges such as LLM hallucinations, which call for guardrails and more detailed logging [3]. These additions support proper governance and execution of agentic workflows, and they make observability harder as well as more important. Bose emphasizes reusing existing anomaly detection algorithms and extending them to handle new data types, so that monitoring stays comprehensive across platforms [4].
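As a rough illustration of extending an existing anomaly detection check to a new data type, the sketch below applies a plain z-score test to a signal derived from LLM output. The choice of signal (response length in tokens), the function name, the threshold, and the sample numbers are all assumptions made for this example; the episode does not describe Capital One's actual algorithms.

```python
from statistics import mean, stdev


def zscore_anomalies(baseline, observed, threshold=3.0):
    """Return indices of observed values that lie more than `threshold`
    standard deviations from the mean of the baseline window."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline has no spread, so any different value is flagged.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]


# Hypothetical signal: tokens per LLM response (illustrative values only).
baseline_lengths = [100, 105, 98, 102, 95, 101, 99, 103]
new_lengths = [101, 97, 480, 104]

flagged = zscore_anomalies(baseline_lengths, new_lengths)
print(flagged)  # indices of responses whose length is far outside the baseline
```

The same function works unchanged on a traditional input feature, which is the point of the reuse: only the signal being fed in changes, not the detector.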

       

  • Inference Optimization

    Inference efficiency is a critical focus at Capital One, with cost and latency targeted from the outset. Bose shares that cost per token and latency are key performance indicators, and keeping both low requires continuous optimization of GPU utilization along with other techniques [5]. The team draws on both proprietary and open-source tools to improve inference workflows and deploy fine-tuned models effectively. Bose highlights the collaboration between science and engineering teams to bring techniques such as quantization and speculative decoding into their inference systems [6].
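To make the two KPIs concrete, here is a minimal sketch of how cost per token and tail latency could be computed from basic serving measurements. The function names, GPU price, throughput, and latency samples are hypothetical values chosen for illustration, not figures from the episode.

```python
import math


def cost_per_million_tokens(gpu_hourly_cost_usd, num_gpus, tokens_per_second):
    """Serving cost in USD per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost_usd * num_gpus / tokens_per_hour * 1_000_000


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical deployment: 4 GPUs at $2.00/hour each, 2000 tokens/s in total.
base_cost = cost_per_million_tokens(2.00, 4, 2000)

# Doubling throughput on the same hardware (better GPU utilization) halves the cost.
improved_cost = cost_per_million_tokens(2.00, 4, 4000)

# Hypothetical per-request latencies in milliseconds.
latencies_ms = [120, 135, 110, 480, 125, 130, 140, 115, 128, 132]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

print(f"cost per 1M tokens: ${base_cost:.2f} -> ${improved_cost:.2f}")
print(f"p50 latency: {p50} ms, p95 latency: {p95} ms")
```

Tracking a tail percentile alongside the median matters because a single slow request, like the 480 ms sample here, barely moves the median but dominates the p95.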
