Transformer Weight Compression
George argues that transformers work because of weight sharing, not just attention mechanisms. He delves into dynamic weight changes and the concept of FlashAttention for compression. Alessio and George compare their ambitions in physics and information theory, revealing their core differences.
From this podcast

Latent Space - The AI Engineer Podcast (Video Podcast)
Ep 18: Petaflops to the People — with George Hotz of tinycorp