what are the best performing models? #3

shouldsee · 2022-09-20T13:20:05Z

Thanks for sharing! Just wondering what's going on in terms of

Cybernetic1 · 2022-09-24T07:04:24Z

Hey, sorry I did not see your message.

First we find an algorithm (such as PPO or Soft Actor-Critic) and adapt its code to solve Tic Tac Toe. The first version uses a board-vector representation and the second version uses a logic-proposition embedding.
For each game the highest score is 20. I just want to demonstrate convergence first.
All the models tested so far -- based on naive policy gradient -- failed to converge to the highest score. Some of them reached close to the highest score but were unstable.
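
For illustration only, here is a minimal sketch of what a board-vector state for Tic Tac Toe could look like. It is not the repository's code: the +1/-1/0 encoding, the names `empty_board`, `place`, and `winner`, and the row-major cell order are all assumptions, and the scoring that gives a maximum of 20 is not shown because it is not described here.

```python
from typing import List, Optional

# Hypothetical encoding (an assumption, not taken from the repository):
# +1 = agent's mark, -1 = opponent's mark, 0 = empty cell.
# Cells are indexed 0..8 in row-major order.
Board = List[int]

# The eight winning lines: three rows, three columns, two diagonals.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def empty_board() -> Board:
    """Return a board vector with all nine cells empty."""
    return [0] * 9


def place(board: Board, cell: int, player: int) -> Board:
    """Return a new board with `player` (+1 or -1) placed on `cell`."""
    if board[cell] != 0:
        raise ValueError(f"cell {cell} is already occupied")
    new_board = list(board)
    new_board[cell] = player
    return new_board


def winner(board: Board) -> Optional[int]:
    """Return +1 or -1 if that player has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return None
```

A vector like this can be fed directly to a policy network as a 9-dimensional observation, with the action being the index of the cell to play.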

shouldsee · 2022-09-25T03:34:15Z

OK, thanks. I guess the core question is: how is the score calculated? Which function did you use to calculate it?
