First we find an algorithm (such as PPO or Soft Actor-Critic) and adapt its code to solve Tic Tac Toe. The first version would use a board-vector representation, and the second version would use a logic proposition embedding.
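To make the board-vector idea concrete, here is a rough sketch of one common encoding. It is only an assumption about what such a vector could look like, not the encoding used in this repo: the names `encode_board`, `winner`, and `WIN_LINES`, the 9-character string input, and the +1 / -1 / 0 values are all made up for illustration.

```python
from typing import List

# Hypothetical encoding: X -> 1, O -> -1, empty -> 0 (an assumption, not the repo's format).
SYMBOL_TO_VALUE = {"X": 1, "O": -1}

# The 8 winning lines of a 3x3 board, as indices into the flat 9-element vector.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def encode_board(board: str) -> List[int]:
    """Turn a 9-character board string (row by row) into a flat vector."""
    if len(board) != 9:
        raise ValueError("board must have exactly 9 cells")
    # Any character other than X or O is treated as an empty cell.
    return [SYMBOL_TO_VALUE.get(cell, 0) for cell in board]


def winner(vec: List[int]) -> int:
    """Return 1 if X has three in a row, -1 if O does, 0 otherwise."""
    for a, b, c in WIN_LINES:
        total = vec[a] + vec[b] + vec[c]
        if total == 3:
            return 1
        if total == -3:
            return -1
    return 0
```

With this layout, `encode_board("XO.......")` gives a vector whose first two entries are 1 and -1, and a policy network would take the 9 numbers as its input.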
For each game the highest score is 20. I just want to demonstrate convergence first.
All the models tested so far -- based on naive policy gradient -- failed to converge to the highest score. Some of them reached close to the highest score but were unstable.
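For readers unfamiliar with the term, a naive policy gradient update (REINFORCE without a baseline) can be sketched as below. This is not the code from the models tested above. It assumes a simple setup where the policy is just a list of 9 logits for one board state, illegal moves are masked out, and `masked_softmax`, `reinforce_step`, and the learning rate are hypothetical names and values chosen for illustration.

```python
import math
from typing import List


def masked_softmax(logits: List[float], legal: List[bool]) -> List[float]:
    """Softmax over legal moves only; illegal moves get probability 0."""
    # Subtract the max legal logit for numerical stability.
    m = max(l for l, ok in zip(logits, legal) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(
    logits: List[float],
    legal: List[bool],
    action: int,
    ret: float,
    lr: float = 0.01,
) -> List[float]:
    """One REINFORCE update: logits += lr * return * grad log pi(action)."""
    probs = masked_softmax(logits, legal)
    new_logits = []
    for i, (l, p, ok) in enumerate(zip(logits, probs, legal)):
        if ok:
            # d/d logit_i of log pi(action) is 1[i == action] - p_i.
            grad = (1.0 if i == action else 0.0) - p
            new_logits.append(l + lr * ret * grad)
        else:
            # Masked-out moves receive no gradient.
            new_logits.append(l)
    return new_logits
```

Because the raw return scales the step directly, a large return (for example the maximum score of 20) produces a large update with no baseline to reduce variance, which is one commonly cited reason this kind of update can get close to the optimum and still be unstable.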
Thanks for sharing! Just wondering what's going on in terms of