Multimodal Learning Insights

Kate explores the evolution of multimodal learning, highlighting its roots in audio-visual speech recognition and the transformative impact of large-scale data collection from the web. She discusses the emergence of new properties in large models, particularly in vision-language tasks, and emphasizes the advantages of using captioned images for pre-training, especially in zero-shot learning scenarios. The conversation reveals how leveraging freely available data can significantly enhance model performance and generalization.

From this podcast
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
More Language, Less Labeling with Kate Saenko - #580
