Saturating Compute

Aidan and Lukas discuss how model architectures need to saturate the available compute by minimizing unnecessary operations and maximizing parallelization, focusing on the efficiency of large language model architectures such as the transformer. They also explore why it is hard to deviate from established architectures, given how heavily hardware and software have been optimized around them.