Evaluating AI Complexity

Kanjun highlights the significant gap between AI benchmarks and real-world complexity and argues that evaluations need to be more nuanced. The discussion covers innovative approaches, such as challenging models with datasets of code collected in the wild, which reveal that many public models struggle in practical scenarios. The conversation underscores the importance of understanding edge cases and the limitations of current testing methods.