Experimenting with TQN

Promising initial results with the pong game led to unexpected challenges when scaling experiments. Despite efforts from the team, performance dipped below both DDQN and random policies. A deeper analysis revealed that using short tree depths in the TQN model hindered the ability to capture negative experiences effectively.