Masked Token Generation

Mask Git utilizes masked tokens to predict next tokens in parallel, offering a significant speed advantage over traditional methods. The current performance is around one frame per second, prompting discussions on optimization strategies like model distillation and efficiency improvements. Ashley shares insights from her background in reinforcement learning, particularly her interest in enabling agents to learn from videos instead of predefined reward functions.