Multimodal Learning Insights
Kate explores the evolution of multimodal learning, highlighting its roots in audio-visual speech recognition and the transformative impact of large-scale data collection from the web. She discusses the emergence of new properties in large models, particularly in vision-language tasks, and emphasizes the advantages of using captioned images for pre-training, especially in zero-shot learning scenarios. The conversation reveals how leveraging freely available data can significantly enhance model performance and generalization.
From this podcast

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
More Language, Less Labeling with Kate Saenko - #580