Published Jun 30, 2023

692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU — with Jon Krohn

Jon Krohn covers two techniques for running large language models efficiently on a single GPU: QLoRA, which fine-tunes a quantized model through low-rank parameter adaptation, and SpQR, which compresses model weights with near-lossless quantization. Together they allow performance close to ChatGPT level while maintaining accuracy.


Related Episodes

678: StableLM: Open-source "ChatGPT"-like LLMs you can fit on one GPU — with @JonKrohnLearns
704: Jon’s “Generative A.I. with LLMs” Hands-on Training — with Jon Krohn (@JonKrohnLearns)
674: Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation) — with Jon Krohn
772: In Case You Missed It in March 2024 — with Jon Krohn (@JonKrohnLearns)
670: LLaMA: GPT-3 performance, 10x smaller — with Jon Krohn (@JonKrohnLearns)
650: SparseGPT: Remove 100 Billion Parameters but Retain 100% Accuracy — with Jon Krohn
694: CatBoost: Powerful, efficient ML for large tabular datasets — with Jon Krohn (@JonKrohnLearns)
758: The Mamba Architecture: Superior to Transformers in LLMs — with Jon Krohn (@JonKrohnLearns)
728: Use Contrastive Search to get Human-Quality LLM Outputs — with Jon Krohn (@JonKrohnLearns)
822: NotebookLM: Jaw-Dropping Podcast Episodes Generated About Your Documents — with Jon Krohn
706: Large Language Model Leaderboards and Benchmarks — with Caterina Constantinescu
824: Llama 3.2: Open-Source Edge and Multimodal LLMs — with Jon Krohn (@JonKrohnLearns)
784: Aligning Large Language Models — with Sinan Ozdemir
676: The Chinchilla Scaling Laws — with Jon Krohn (@JonKrohnLearns)
778: Mixtral 8x22B: SOTA Open-Source LLM Capabilities at a Fraction of the Compute — with Jon Krohn

Dexa/Super Data Science: ML & AI Podcast with Jon Krohn

Episode Highlights

Advanced Model Techniques

Jon Krohn explores the QLoRA approach, which combines low-rank parameter adaptation (LoRA) with quantization of the base model's weights. This allows large language models to be fine-tuned efficiently on a single GPU while achieving near ChatGPT-level performance.
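To make the QLoRA idea above concrete, here is a minimal pure-Python sketch, not the method's actual implementation. It assumes simple absmax rounding to 4-bit integers with one scale per row (the real method uses a more specialized 4-bit data type), and every name in it (`quantize_block`, `qlora_style_forward`, `lora_a`, `lora_b`) is hypothetical. It only illustrates the structure: frozen quantized weights plus a small trainable low-rank update.

```python
import random

def quantize_block(block, bits=4):
    """Absmax-quantize a list of floats to signed integer codes with one shared scale.
    Assumption: plain uniform rounding, used here only for illustration."""
    levels = 2 ** (bits - 1) - 1              # 7 for 4-bit
    absmax = max(abs(v) for v in block) or 1.0
    scale = absmax / levels
    return [round(v / scale) for v in block], scale

def dequantize_block(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]

def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def qlora_style_forward(q_rows, lora_a, lora_b, x):
    """Compute y = dequant(W) x + B (A x); only A and B would be trained."""
    base = matvec([dequantize_block(c, s) for c, s in q_rows], x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + u for b, u in zip(base, update)]

random.seed(0)
d_out, d_in, rank = 4, 8, 2
weights = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
q_rows = [quantize_block(row) for row in weights]       # frozen 4-bit codes + scale per row
lora_a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]           # zero init: adapter starts as a no-op
x = [1.0] * d_in
y = qlora_style_forward(q_rows, lora_a, lora_b, x)
```

In this sketch the adapter adds rank × (d_in + d_out) trainable values on top of d_in × d_out frozen ones, which is where the memory saving during fine-tuning would come from.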

Weight Compression

Jon Krohn explores the SpQR method, an approach to near-lossless LLM weight compression that allows massive models to run efficiently on a single GPU. It uses quantization to significantly reduce model size and improve speed while maintaining accuracy.
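One way to picture how quantization can stay close to lossless is to store a few hard-to-quantize weights separately at full precision and quantize the rest to a few bits. The sketch below is a simplified illustration under loud assumptions: outliers are picked purely by magnitude, a single scale covers the whole group, and the names `compress_with_outliers` and `decompress` are hypothetical. It should not be read as the SpQR algorithm itself.

```python
def compress_with_outliers(weights, bits=3, n_outliers=1):
    """Quantize weights to `bits` bits, keeping the largest-magnitude values exact.
    Assumption: magnitude-based outlier selection, chosen only for simplicity."""
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    outliers = {i: weights[i] for i in by_magnitude[:n_outliers]}   # sparse, full precision
    dense = [0.0 if i in outliers else w for i, w in enumerate(weights)]
    lo, hi = min(dense), max(dense)
    levels = 2 ** bits - 1                     # 7 steps for 3-bit codes
    scale = (hi - lo) / levels or 1.0          # fall back to 1.0 if all values are equal
    codes = [round((w - lo) / scale) for w in dense]
    return codes, scale, lo, outliers

def decompress(codes, scale, zero, outliers):
    """Rebuild approximate weights, then restore the exact outlier values."""
    restored = [c * scale + zero for c in codes]
    for i, w in outliers.items():
        restored[i] = w
    return restored

def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))

weights = [0.1, -0.2, 0.05, 8.0, -0.15, 0.3, -0.25, 0.2, 0.12, -0.07]
with_outliers = decompress(*compress_with_outliers(weights, n_outliers=1))
without_outliers = decompress(*compress_with_outliers(weights, n_outliers=0))
err_sparse = max_error(weights, with_outliers)
err_naive = max_error(weights, without_outliers)
```

With the single large value held out, the remaining weights share a much narrower range, so the same 3 bits resolve them far more finely; `err_sparse` comes out well below `err_naive` for this toy input.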

Quantization Techniques

Jon Krohn explores the SpQR approach to near-lossless LLM weight compression. The technique uses quantization to let large models run efficiently on a single GPU, maintaining accuracy while reducing memory and computational demands.
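The arithmetic behind the single-GPU claim can be sketched as a back-of-the-envelope calculation. The 65-billion-parameter model size below is an illustrative assumption, not a figure from the episode page, and the hypothetical helper `weight_memory_gb` counts weights only, ignoring activations, caches, and any quantization metadata.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate storage for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 65e9  # hypothetical model size, chosen only for illustration

footprints = {bits: weight_memory_gb(N_PARAMS, bits) for bits in (16, 8, 4, 3)}
for bits, gigabytes in footprints.items():
    print(f"{bits:>2}-bit weights: {gigabytes:.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint by a factor of four, which is the kind of reduction that can move a model from several GPUs to one.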
