Published Mar 1, 2024

762: Gemini 1.5 Pro, the Million-Token-Context LLM — with Jon Krohn (@JonKrohnLearns)

Jon Krohn examines Google's Gemini 1.5 Pro, a multimodal language model with a million-token context window, covering where it currently falls short and how its results might be improved.

Episode Highlights

  • Contextual Challenges

    Jon Krohn explores the contextual challenges facing Gemini 1.5 Pro, in particular its trouble processing video accurately. He describes a case where the model failed to provide correct timestamps for a video and produced fabricated content instead. The cause is that the model does not process the audio from uploaded videos, so any answer that depends on what was said is hallucinated. Krohn notes, "It turns out everything that Gemini 1.5 pro output was hallucinated, completely made up and done very confidently."

       

    Solution Approaches

    To address these limitations, Krohn considers combining video and audio analysis. He suggests identifying visual cues, such as smiling, and cross-referencing them with the audio to improve accuracy. He also stresses that the model performs well within its limits: "The algorithm does work very well, as long as you, you're not expecting to get audio results from the video alone."
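
    The cross-referencing idea can be sketched in a few lines of Python. This is a minimal illustration, not Krohn's implementation: it assumes visual cues (for example, smiles detected by a vision model) and a separately produced audio transcript are already available as timestamped records. The names `VisualCue`, `TranscriptSegment`, `cross_reference`, and the `window_s` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VisualCue:
    time_s: float   # when the cue appears in the video, in seconds
    label: str      # e.g. "smile" (hypothetical label from a vision model)


@dataclass
class TranscriptSegment:
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    text: str       # words transcribed from the audio track


def cross_reference(cues, segments, window_s=2.0):
    """Pair each visual cue with transcript segments that overlap
    the interval [cue time - window_s, cue time + window_s]."""
    matches = []
    for cue in cues:
        lo, hi = cue.time_s - window_s, cue.time_s + window_s
        nearby = [s for s in segments if s.start_s <= hi and s.end_s >= lo]
        matches.append((cue, nearby))
    return matches


# Hypothetical example data, for illustration only.
cues = [VisualCue(12.0, "smile"), VisualCue(40.0, "smile")]
segments = [
    TranscriptSegment(10.0, 13.5, "first spoken line"),
    TranscriptSegment(20.0, 25.0, "second spoken line"),
]
for cue, nearby in cross_reference(cues, segments):
    print(cue.label, cue.time_s, [s.text for s in nearby])
```

    A cue with no nearby transcript segment comes back with an empty list, which flags moments where the visual signal has no audio evidence to support it.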
