Evaluating Model Performance

Erik and Andreas discuss why evaluating model performance is hard: measurements are noisy, and interrater reliability is often low, so raters judging the same output can disagree. Breaking tasks into smaller, compositional parts makes evaluation more manageable and makes errors easier to spot.
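
One common way to put a number on interrater reliability is Cohen's kappa, which compares how often two raters agree with how often they would agree by chance. The summary does not say which measure Erik and Andreas have in mind, so the sketch below is only an illustration: the function name `cohens_kappa` and the "good"/"bad" labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Illustrative helper (not from the discussion): returns 1.0 for perfect
    agreement and about 0.0 when agreement is no better than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability that both raters pick the same label
    # if each labelled independently according to their own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label everywhere; kappa is
        # undefined here, so treat it as perfect agreement.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four model outputs by two raters.
ratings_a = ["good", "good", "bad", "bad"]
ratings_b = ["good", "bad", "bad", "bad"]
print(cohens_kappa(ratings_a, ratings_b))  # 0.5
```

Here the raters agree on three of four items (0.75), but chance alone would give 0.5, so kappa is 0.5. A low value suggests the disagreement comes from the raters or the rubric rather than the model, which is one reason smaller sub-tasks with clearer criteria can be easier to evaluate.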