Published Nov 11, 2022

SDS 626: Subword Tokenization with Byte-Pair Encoding — with @JonKrohnLearns

Jon Krohn surveys tokenization in natural language processing, focusing on subword methods and on how byte-pair encoding supports models like BERT and GPT-3.

Topics covered

Tokenization Methods
Byte-Pair Encoding

Popular Clips

Tokenization Techniques

Related Episodes

SDS 506: Supervised vs Unsupervised Learning — with Jon Krohn
SDS 446: Getting Started in Machine Learning — with Jon Krohn
SDS 554: @JonKrohnLearns's Deep Learning Courses
SDS 556: @JonKrohnLearns's Machine Learning Courses
SDS 558: @JonKrohnLearns's Answers to Questions on Machine Learning
SDS 620: OpenAI Whisper: General-Purpose Speech Recognition — with @JonKrohnLearns
SDS 476: Peer-Driven Learning — with Jon Krohn
SDS 456: The Pomodoro Technique — with Jon Krohn
SDS 484: Algorithm Aversion — with Jon Krohn
SDS 624: Imagen Video: Incredible Text-to-Video Generation — with @JonKrohnLearns
SDS 510: Deep Reinforcement Learning — with Jon Krohn
SDS 474: The Machine Learning House — with Jon Krohn
SDS 568: PaLM: Google's Breakthrough Natural Language Model — with Jon Krohn
SDS 576: Tech Startup Dramas — with Jon Krohn
SDS 468: The History of Data — with Jon Krohn

Episode Highlights

Tokenization Methods

Jon Krohn walks through word, character, and subword tokenization in NLP, covering the strengths and limitations of each and making the case for subword tokenization with byte-pair encoding.
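
To make the word-versus-character trade-off concrete, here is a minimal sketch. It is not taken from the episode: the helper names `word_tokenize` and `char_tokenize` are hypothetical, and splitting on whitespace is only a rough stand-in for real word-level tokenization.

```python
def word_tokenize(text):
    # Hypothetical helper: whitespace splitting as a crude word-level tokenizer.
    return text.lower().split()


def char_tokenize(text):
    # Hypothetical helper: every character, spaces included, becomes a token.
    return list(text.lower())


sentence = "Tokenizers split text"
word_tokens = word_tokenize(sentence)
char_tokens = char_tokenize(sentence)

print(word_tokens)       # ['tokenizers', 'split', 'text']
print(len(char_tokens))  # 21
```

Word tokens keep sequences short but need a large vocabulary and cannot represent unseen words. Character tokens need only a tiny vocabulary but produce much longer sequences. Subword methods aim for a middle ground between the two.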

Byte-Pair Encoding

Jon Krohn explains byte-pair encoding (BPE) and its role in subword tokenization for natural language processing (NLP), including how it adds flexibility and efficiency to models like BERT and GPT-3.
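
As a rough illustration of the idea behind BPE, the sketch below learns merge rules from a toy word-frequency table by repeatedly merging the most frequent adjacent pair of symbols. It is an assumption-laden simplification, not the episode's code or any production tokenizer: the function name `learn_bpe` and the toy corpus are hypothetical, and real implementations add details such as end-of-word markers or byte-level inputs.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    # Start with each word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Toy corpus (hypothetical word counts).
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(word_freqs, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

After three merges, "newest" is represented as `n`, `e`, `w`, `est`. Frequent character sequences such as "est" become single tokens, while rare words can still be spelled out from smaller pieces.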
