Open Data Initiatives

MLCommons aims to promote best practices in machine learning by supporting pipelines for creating open datasets. A notable example is the Multilingual Spoken Words Corpus (MSWC), which contains more than 23 million spoken-word examples across 50 languages, including languages such as Ukrainian that previously lacked comparable datasets. Because the corpus is openly licensed, it broadens access to speech data and can be used for both research and commercial applications.