Transformer Symmetry Learning

Tim and Yannic discuss the potential for transformers to learn symmetries without positional encoding, highlighting the model's ability to capture patterns at different scales through overlapping waves. The conversation delves into the importance of relative encodings and the role of layers in understanding context and relationships between tokens.
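
To make the "overlapping waves" idea concrete, here is a minimal sketch. It assumes the waves are the standard sinusoidal positional encodings, which the summary does not confirm. The function names `sinusoidal_encoding` and `dot`, and the parameters `d_model` and `base`, are illustrative choices, not taken from the conversation. Each position is encoded as sine/cosine pairs at geometrically spaced frequencies, so slow waves track coarse position and fast waves track fine position. The dot product of two such encodings depends only on the offset between the positions, which is one way to see why relative encodings matter.

```python
import math

def sinusoidal_encoding(position, d_model=16, base=10000.0):
    """Encode a position as sin/cos pairs at geometrically spaced frequencies.

    Assumption: this is the common sinusoidal scheme; d_model and base are
    illustrative defaults, not values from the discussion.
    """
    enc = []
    for i in range(d_model // 2):
        # Frequency shrinks with i, so later pairs are slower (longer) waves.
        freq = 1.0 / (base ** (2 * i / d_model))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Two pairs of positions with the same offset (3) but different absolute locations.
near = dot(sinusoidal_encoding(2), sinusoidal_encoding(5))
far = dot(sinusoidal_encoding(40), sinusoidal_encoding(43))

# sin(a)sin(b) + cos(a)cos(b) = cos(a - b), so both similarities reduce to
# a sum of cos(freq * 3) terms and agree up to floating-point error.
print(abs(near - far) < 1e-9)  # prints True
```

The similarity between two positions here is invariant to shifting both by the same amount. That is a translation symmetry supplied by the encoding itself; whether a transformer can learn a comparable symmetry with no positional encoding at all is the question the discussion raises.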