Bias in Speech Data
Josh discusses bias in speech datasets and introduces the Artie Bias Corpus, a dataset designed to diagnose bias in speech-to-text models. He explains how the corpus helps identify biases across demographic groups and highlights the importance of addressing bias in language technology.
From this podcast

Practical AI
Speech tech and Common Voice at Mozilla