Video Model Challenges
Pre-training video generation models is extremely GPU-intensive and calls for high-end hardware such as H100 GPUs. Some capabilities, such as walking, only emerge at larger parameter scales. Models like Mochi 1 strike a balance: they are capable yet accessible enough to run on consumer-grade GPUs. Because video generation involves very long sequence lengths, the computational demands grow with each iteration, much as they do when a language model generates tokens.
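To make the sequence-length point concrete, the sketch below estimates how many transformer tokens a short clip produces compared with a single frame, and how full self-attention cost scales with that count. Every number here is a hypothetical assumption, not Mochi 1's actual design: a 6x temporal and 8x8 spatial compression from a video autoencoder, a 2x2 patch size, an 848x480 resolution, and a 120-frame clip (5 seconds at 24 fps). The temporal downsampling is simplified to ceiling division.

```python
def latent_tokens(frames, height, width, t_down=6, s_down=8, patch=2):
    """Estimate transformer tokens for a clip (all factors are hypothetical).

    t_down: assumed temporal compression of the video autoencoder.
    s_down: assumed spatial compression per side.
    patch:  assumed spatial patch size used to tokenize the latent.
    """
    t = -(-frames // t_down)          # ceiling division over time
    h = height // (s_down * patch)    # latent rows after patching
    w = width // (s_down * patch)     # latent columns after patching
    return t * h * w


def attention_cost(tokens):
    """Full self-attention grows roughly with tokens**2 per layer, per step."""
    return tokens ** 2


image_tokens = latent_tokens(1, 480, 848)    # a single frame
video_tokens = latent_tokens(120, 480, 848)  # hypothetical 5 s clip at 24 fps

print(image_tokens)   # 1590
print(video_tokens)   # 31800
print(attention_cost(video_tokens) // attention_cost(image_tokens))  # 400
```

Under these assumptions the clip has 20 times as many tokens as one frame, but roughly 400 times the attention cost, and that cost is paid again on every iteration the model runs over the full sequence.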
From this podcast: Practical AI, "Video generation with realistic motion"