Lossless LLM Compression
Discover SpQR (Sparse-Quantized Representation), an approach to near-lossless compression of large language models that allows a 33-billion-parameter model to run on a single GPU with little to no loss in accuracy. The method uses quantization to reduce model size substantially and speed up inference, making powerful LLMs more practical to deploy.
From this podcast

Super Data Science: ML & AI Podcast with Jon Krohn
692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU — with Jon Krohn