Published Jun 30, 2023

692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU — with Jon Krohn

Jon Krohn explores techniques for running large language models efficiently on a single GPU. He covers two methods: QLoRA, which fine-tunes models through parameter-efficient adaptation, and SpQR, which compresses model weights while maintaining accuracy. The result is performance close to ChatGPT level.
