Published Jan 23, 2025

Video generation with realistic motion

Explore how Genmo, co-founded by Paras Jain and Ajay Jain, is transforming video generation with AI models capable of realistic motion. The episode covers the data and computational challenges the team had to overcome, and what democratized content creation could mean for society.
Practical AI

Episode Highlights

  • Data Challenges

    Video generation models face significant data and computational challenges, primarily because video data is far larger in volume than images or text. The guest explains that training these models requires massive datasets, often in the petabytes, which poses a barrier for new entrants without specialized expertise [1]. He highlights the importance of sourcing high-quality motion data: most online videos lack dynamic movement, which is crucial for teaching models about physics and realism [2].

    "The goal with video models is to learn physics and realism and the laws that govern our world."

    ---

    The challenge extends to curating datasets that can effectively teach these basic rules, which is a non-trivial task for developers.

       

  • Realistic Motion

    Achieving realistic motion in video generation is a complex technical challenge that requires significant computational resources. The guest notes that training a video model is akin to managing a million-token context window in language models, demanding extensive GPU power [3]. He emphasizes the need for models to understand the laws of reality, such as ensuring a character drinks water correctly from a glass, which tests the model's grasp of physics [4].

    "It's a Jedi mind trick. Like you just cannot. You should not be able to do that."

    ---

    These challenges highlight the importance of balancing model size and capability so that models stay accessible without compromising on realism.
