Latent Action Models
The discussion dives into the intricacies of learning a latent action model that operates on raw pixels, allowing for effective interaction with environments. By utilizing a video tokenizer to discretize frames, the model predicts tokens that facilitate next-frame predictions. The decision to limit the action space to eight enhances usability and maintains consistency across successive frames.In this clip
From this podcast

Machine Learning Street Talk (MLST)
Ashley Edwards - Genie Paper (DeepMind/Runway)
Related Questions