Published Jan 19, 2022

Democratizing ML for speech

David Kanter, executive director of MLCommons, delves into the organization's mission to democratize machine learning, emphasizing the power of open data and multilingual speech initiatives. The episode explores how diverse datasets can fuel innovation, community collaboration, and better coverage of underrepresented languages.

Episode Highlights

Open Data Benefits

Open data sets have become a catalyst for innovation in machine learning, as David Kanter explains. He emphasizes that data is the raw ingredient for machine learning, akin to iron and coal during the industrial revolution [1]. Open data allows researchers from even the largest tech companies to share techniques and drive the industry forward. Kanter notes, "ML means more. People doing cool things with computers means more" [2]. This collaborative approach enables researchers to tackle complex problems together, fostering a culture of innovation and progress [3].

Data Maintenance

Maintaining open data sets is a continuous process that involves regular updates and community engagement. The conversation covers the challenges of keeping data sets relevant, especially as language and technology evolve rapidly [4]. Kanter highlights the importance of balancing open data with proprietary information, suggesting that organizations can benefit from both approaches [5]. He explains, "A large amount of modest quality data... can ultimately prove to be useful," emphasizing the value of machine-labeled data in training models [6].

Related Episodes

Accelerating ML innovation at MLCommons
NLP for the world's 7000+ languages
Speech tech and Common Voice at Mozilla
Open source data labeling tools
Operationalizing ML/AI with MemSQL
The ins and outs of open source for AI
Data synthesis for SOTA LLMs
Generative models: exploration to deployment
Machine learning at small organizations
The influence of open source on AI development
Killer developer tools for machine learning
GANs, RL, and transfer learning oh my!
Applied NLP solutions & AI education
scikit-learn & data science you own
Exploring a new AI lexicon

Dexa/Practical AI

Topics covered

Popular Clips

Cutting-Edge Speech Data

Multilingual Keyword Spotting

Evaluating Open Source

Multilingual Speech Data

Advancing Machine Learning

AI Insights

Open Data Impact

Diverse Speech Data

MLCommons Update

Power of Open Data

Data Set Management

MLCommons Roadmap

Machine-Labeled Data

ML Data Sets

Impact of Open Data

The discussion shifts to the transformative power of open data in machine learning, highlighting its role in fostering innovation and community collaboration. David Kanter shares insights on maintaining these data sets to ensure their continued relevance and utility.

Multilingual Speech Initiatives

David Kanter discusses the creation and impact of diverse, multilingual speech datasets. He highlights the challenges and innovations involved in assembling these datasets, which aim to democratize machine learning for speech by covering less-represented languages.
