Transformer Weight Compression

George explains that transformers work because of weight sharing, not just attention mechanisms. He delves into dynamic weight changes and the concept of FlashAttention for compression. Alessio and George compare their ambitions in physics and information theory, revealing their core differences.