CITED CLIPS
Transformer Weight Compression
George explains that transformers work because of weight sharing, not just their attention mechanisms. He delves into dynamic weight changes and the role of FlashAttention in compression. Alessio and George also compare their ambitions in physics and information theory, revealing their core differences.
From this podcast
Latent Space - The AI Engineer Podcast (Video Podcast)
Ep 18: Petaflops to the People — with George Hotz of tinycorp