Winning without optimization / SFT, by modeling (outcome, moves) pairs #1

dpaleka · 2023-09-25T13:41:20Z

Add the incl_winner flag, which prepends the game outcome (player 1 win, player 2 win, or draw) to the list of moves. Training on the outcome makes it possible to condition on it at inference time: conditioning on "player 1 win" raises the win rate from under 50% to around 63%. This is loosely inspired by Decision Transformers.
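
A minimal sketch of the idea. It assumes moves are integer cell indices 0..8 and that the three outcome tokens get IDs just past the move vocabulary. The token IDs and the helpers encode_game, conditioning_prompt and generate are hypothetical and are not the PR's code. Each training sequence starts with the outcome, so at inference the model can be prompted with the desired outcome and left to generate the moves.

```python
# Hypothetical sketch of outcome-conditioned sequence modeling.
# Token IDs and helper names are illustrative assumptions, not the repo's code.
from typing import Callable, List, Optional

NUM_MOVE_TOKENS = 9           # assumption: moves are cell indices 0..8
P1_WIN = NUM_MOVE_TOKENS      # outcome tokens sit after the move IDs
P2_WIN = NUM_MOVE_TOKENS + 1
DRAW = NUM_MOVE_TOKENS + 2


def outcome_token(winner: Optional[int]) -> int:
    """Map a result to its outcome token (assumes 1 = player 1, -1 = player 2, None = draw)."""
    return {1: P1_WIN, -1: P2_WIN, None: DRAW}[winner]


def encode_game(moves: List[int], winner: Optional[int]) -> List[int]:
    """Training sequence: the outcome token first, then the moves."""
    for m in moves:
        if not 0 <= m < NUM_MOVE_TOKENS:
            raise ValueError(f"move {m} out of range")
    return [outcome_token(winner)] + list(moves)


def conditioning_prompt(desired_winner: Optional[int]) -> List[int]:
    """At inference, start from the outcome we want the model to play toward."""
    return [outcome_token(desired_winner)]


def generate(prompt: List[int], next_token: Callable[[List[int]], int], n_moves: int) -> List[int]:
    """Autoregressively extend the prompt; next_token stands in for a trained model."""
    seq = list(prompt)
    for _ in range(n_moves):
        seq.append(next_token(seq))
    return seq[len(prompt):]  # return only the generated moves
```

For example, encode_game([4, 0, 8], 1) gives [9, 4, 0, 8], and generation for "player 1 win" would start from conditioning_prompt(1), i.e. [9]. Putting the outcome first (rather than last) is what lets a plain next-token model use it as a condition.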

pHaeusler · 2023-09-25T22:21:24Z

generate_data.py

-if __name__ == "__main__":
+# Use typer instead
+import typer
+app = typer.Typer()


Is app needed here?
Also, let's import at the top of the file.

pHaeusler · 2023-09-25T22:22:17Z

requirements.txt

+torch
+typer
+wandb
+chardet


Is chardet used?

This was likely just an issue with my conda env; removing it.

pHaeusler · 2023-09-25T22:22:56Z

generate_data.py

@@ -53,29 +53,52 @@ def seq_to_board(seq):
    return board


-def save_data(trajectories):
+w_map = {-1: 0, 1: 1, None: 2}


Should probably use new tokens for these (declare them in tokens.py).
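
One way the suggestion could look, as a hypothetical tokens.py. The token names, the 9-cell move vocabulary and the PAD token are assumptions, not the repo's actual layout. The point is that the outcomes get IDs of their own, placed after every existing ID, instead of the raw 0/1/2 values in w_map, which may overlap with move IDs.

```python
# Hypothetical tokens.py layout; the repo's real token names and IDs may differ.

# Assumed existing vocabulary: one token per board cell plus a padding token.
MOVE_TOKENS = list(range(9))  # assumption: 9 cells -> IDs 0..8
PAD = 9

# New, dedicated outcome tokens placed after every existing ID,
# so they cannot collide with move IDs the way raw 0/1/2 would.
P1_WIN = 10
P2_WIN = 11
DRAW = 12

# The embedding table must be sized to cover the new IDs.
VOCAB_SIZE = DRAW + 1

# Replacement for w_map: winner value -> outcome token.
# Assumes 1 = player 1, -1 = player 2, None = draw; swap if the repo differs.
W_MAP = {1: P1_WIN, -1: P2_WIN, None: DRAW}


def check_vocab() -> None:
    """Sanity-check that all token IDs are unique and fit in the vocabulary."""
    ids = MOVE_TOKENS + [PAD] + list(W_MAP.values())
    assert len(ids) == len(set(ids)), "token IDs must be unique"
    assert max(ids) < VOCAB_SIZE, "VOCAB_SIZE too small"
```

Any model whose embedding size is hard-coded to the old vocabulary would need VOCAB_SIZE updated along with this change.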

pHaeusler

Nice! A couple of comments.

save outcome in data, then condition on it in training, decision transformer lite

cf0ee55

pHaeusler reviewed Sep 25, 2023

dpaleka force-pushed the main branch from d38218b to ec86609 on September 26, 2023 16:25

Refactored, add more tokens for incl_winner, more training steps

9976f7c

dpaleka force-pushed the main branch from ec86609 to 9976f7c on September 26, 2023 16:26

pHaeusler approved these changes Sep 29, 2023

dpaleka marked this pull request as ready for review September 30, 2023 12:33
