Published Jan 19, 2022

Democratizing ML for speech

David Kanter, executive director of MLCommons, delves into their mission to democratize machine learning, emphasizing the power of open data and multilingual speech initiatives. The episode explores how diverse datasets can fuel innovation, community collaboration, and representation of underrepresented languages.
Episode Highlights

  • Open Data Benefits

    Open data sets have become a catalyst for innovation in machine learning, as David Kanter explains. He emphasizes that data is the raw ingredient for machine learning, akin to iron and coal during the industrial revolution 1. Open data allows researchers from even the largest tech companies to share techniques and drive the industry forward. Kanter notes, "ML means more. People doing cool things with computers means more" 2. This collaborative approach enables researchers to tackle complex problems together, fostering a culture of innovation and progress 3.

       

  • Data Maintenance

    Maintaining open data sets is a continuous process that involves regular updates and community engagement. Kanter and the hosts discuss the challenges of keeping data sets relevant, especially as language and technology evolve rapidly 4. Kanter highlights the importance of balancing open data with proprietary information, suggesting that organizations can benefit from both approaches 5. He explains, "A large amount of modest quality data... can ultimately prove to be useful," emphasizing the value of machine-labeled data in training models 6.
