762: Gemini 1.5 Pro, the Million-Token-Context LLM — with Jon Krohn (@JonKrohnLearns)

Episode Highlights
Contextual Challenges
Jon Krohn explores the contextual challenges faced by the Gemini 1.5 Pro language model, highlighting its difficulty with video input. He shares an instance where the model failed to provide correct timestamps for a video and produced fabricated content instead. The cause is that the model does not process audio from uploaded videos, so any answer that depends on the audio is hallucinated. Krohn notes, "It turns out everything that Gemini 1.5 pro output was hallucinated, completely made up and done very confidently."
Solution Approaches
To address this limitation, Jon considers potential solutions, such as combining video and audio analysis. He suggests identifying visual cues, like smiling, and cross-referencing them with the audio to improve accuracy. This could work around the model's current shortcoming; as he states, "The algorithm does work very well, as long as you, you're not expecting to get audio results from the video alone."
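The cross-referencing idea can be illustrated with a minimal sketch. It is not the method used in the episode. It assumes the visual cues (for example, "smiling" with a timestamp) come from a video model and the transcript segments come from a separate speech-to-text step. Every name here (VisualCue, TranscriptSegment, match_cues_to_audio, tolerance_s) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VisualCue:
    time_s: float  # when the cue appears in the video, in seconds (assumed input)
    label: str     # e.g. "smiling"


@dataclass
class TranscriptSegment:
    start_s: float  # segment start, in seconds (assumed to come from a separate transcript)
    end_s: float    # segment end, in seconds
    text: str


def match_cues_to_audio(
    cues: List[VisualCue],
    segments: List[TranscriptSegment],
    tolerance_s: float = 2.0,
) -> List[Tuple[VisualCue, Optional[TranscriptSegment]]]:
    """Pair each visual cue with the closest transcript segment.

    A cue inside a segment has a gap of 0. Otherwise the gap is the distance
    to the nearer segment boundary. Cues with no segment within tolerance_s
    are paired with None rather than guessed.
    """
    matches = []
    for cue in cues:
        best, best_gap = None, None
        for seg in segments:
            if seg.start_s <= cue.time_s <= seg.end_s:
                gap = 0.0
            else:
                gap = min(abs(cue.time_s - seg.start_s), abs(cue.time_s - seg.end_s))
            if gap <= tolerance_s and (best_gap is None or gap < best_gap):
                best, best_gap = seg, gap
        matches.append((cue, best))
    return matches


# Hypothetical example data, not taken from the episode.
cues = [VisualCue(12.0, "smiling"), VisualCue(95.0, "smiling")]
segments = [
    TranscriptSegment(10.0, 15.0, "that was a great joke"),
    TranscriptSegment(40.0, 50.0, "next topic"),
]
result = match_cues_to_audio(cues, segments)
```

Returning None for an unmatched cue keeps the gap visible instead of filling it with a plausible guess, which is the failure mode described above.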
Related Episodes

666: GPT-4 — with Jon Krohn (@JonKrohnLearns)
670: LLaMA: GPT-3 performance, 10x smaller — with Jon Krohn (@JonKrohnLearns)
SDS 438: Artificial General Intelligence — with Jon Krohn
684: Get More Language Context out of your LLM — with Jon Krohn (@JonKrohnLearns)
748: The Five Levels of AGI — with Jon Krohn (@JonKrohnLearns)
772: In Case You Missed It in March 2024 — with Jon Krohn (@JonKrohnLearns)
768: Is Claude 3 Better than GPT-4? — with Jon Krohn (@JonKrohnLearns)
854: The Six Epochs of Intelligence Evolution — with Jon Krohn (@JonKrohnLearns)






