Published Jun 30, 2023
692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU — with Jon Krohn
Jon Krohn covers two techniques for running large language models efficiently on a single GPU: QLoRA, which fine-tunes a quantized model through low-rank parameter adaptation, and SpQR, which compresses model weights almost losslessly. The episode discusses how these methods reach performance close to ChatGPT level while maintaining accuracy.

Related Episodes

772: In Case You Missed It in March 2024 — with Jon Krohn (@JonKrohnLearns)
670: LLaMA: GPT-3 performance, 10x smaller — with Jon Krohn (@JonKrohnLearns)

706: Large Language Model Leaderboards and Benchmarks — with Caterina Constantinescu

784: Aligning Large Language Models — with Sinan Ozdemir
676: The Chinchilla Scaling Laws — with Jon Krohn (@JonKrohnLearns)
