Support maze dataset tokenizers update #214

Open · wants to merge 58 commits into main
58 commits
0ff866e
Add check on <UNK> token for `maze-dataset` update
aaron-sandoval Apr 18, 2024
ac2c09c
Update dependencies, including `maze-dataset = "^1.0.0"`
aaron-sandoval Apr 25, 2024
a52baca
maze-dataset PR #37 moved token_utils.py and util.py to a different d…
aaron-sandoval May 17, 2024
404e49f
Update mostly just type hints for `MazeTokenizerModular`. No updates …
aaron-sandoval Jun 29, 2024
8424609
Updated unit tests to incorporate `MazeTokenizerModular`. Not run yet
aaron-sandoval Jun 29, 2024
58b6ef5
made a comment
aaron-sandoval Jul 22, 2024
283bdd0
wip making a make recipe to run tests with a user-provided branch of …
aaron-sandoval Jul 22, 2024
eab11f4
wip, bogged down in Windows vs Linux crap
aaron-sandoval Jul 22, 2024
f83e169
wip, still stuck
aaron-sandoval Jul 22, 2024
c0a48a2
m-d git branch environment specified in maze-dataset_test directory
aaron-sandoval Jul 26, 2024
d34e6f5
Environment was broken in subdirectory. Move it to the main environment
aaron-sandoval Jul 26, 2024
f4f9303
Merge branch 'update-maze-dataset-tokenizers-step2' into add-maze-dat…
aaron-sandoval Jul 26, 2024
667c6ce
Merge pull request #216 from understanding-search/add-maze-dataset-br…
aaron-sandoval Jul 26, 2024
dc30b32
Merge branch 'update-maze-dataset-tokenizers-step2' of https://github…
aaron-sandoval Jul 26, 2024
6711a7e
Small edits to get unit tests to collect
aaron-sandoval Jul 26, 2024
fb0b1da
Merge branch 'main' into update-maze-dataset-tokenizers-step2
mivanit Jul 26, 2024
02537b8
bump maze-dataset
mivanit Jul 26, 2024
1b39086
run format
mivanit Jul 26, 2024
ec748ef
fix imports, unit tests collect
mivanit Jul 26, 2024
74df787
upstream mmtokenizer summary() fix
mivanit Jul 26, 2024
34e9e74
?????????
mivanit Jul 26, 2024
24ca2c4
run format
mivanit Jul 26, 2024
7138677
legacy mt was loaded as mmt by mistake
mivanit Jul 26, 2024
6c3cce2
re-run nb
mivanit Jul 26, 2024
3c244b0
fix loading maze tokenizers
mivanit Jul 26, 2024
ced372d
update dep
mivanit Jul 26, 2024
af3f953
`test_tokenization_encoding` passing
aaron-sandoval Aug 1, 2024
e2f94ac
`test_tokenizer_inside_hooked_transformer` passing
aaron-sandoval Aug 1, 2024
0ddfa23
`test_cfg_post_init` passing
aaron-sandoval Aug 1, 2024
fa825c0
Everything in `test_config_holder.py` passing
aaron-sandoval Aug 1, 2024
2c1d19e
`test_random_baseline` passing. 2 zanj tests are the only ones still …
aaron-sandoval Aug 1, 2024
6e585d3
format
aaron-sandoval Aug 1, 2024
be99a06
zanj save load tests with multiple tokenizers
mivanit Aug 13, 2024
4157519
Merge branch 'main' into update-maze-dataset-tokenizers-step2
mivanit Aug 20, 2024
9e7b888
poetry update
mivanit Aug 20, 2024
3a589b1
fix failing model loading tests
mivanit Aug 20, 2024
3956867
integration test where too many vocab elements caused argsort fail
mivanit Aug 20, 2024
0650190
trained new demo model
mivanit Aug 20, 2024
abc3dcc
replaced demo model path in tests, changed notebook cfg to test
mivanit Aug 20, 2024
8311eb8
move training tests to test_train_model.py
mivanit Aug 20, 2024
0064898
trying to fix pytest hang issue by closing wandb run
mivanit Aug 20, 2024
7cf67a7
return logger in TrainingResult object from train_model
mivanit Aug 21, 2024
fc42faa
update maze-dataset, new version should maybe fix wandb issues?
mivanit Aug 21, 2024
104e004
re-run notebook to get new model with fixed keys
mivanit Aug 21, 2024
3e254b2
changed cfg back to test in train nb, re-run
mivanit Aug 21, 2024
9f15e7e
format
mivanit Aug 21, 2024
4df4a24
update maze-dataset dep
mivanit Aug 22, 2024
4b3e8ca
ok this bug is incomprehensible
mivanit Aug 22, 2024
98a2083
fixed bug - passing configs passed by ref and modified
mivanit Aug 22, 2024
85ee878
format
mivanit Aug 22, 2024
1f46532
fix paths in notebooks
mivanit Aug 22, 2024
3f795f7
remove old stuff from makefile
mivanit Aug 22, 2024
096f6df
update dep
mivanit Aug 23, 2024
97fba7b
update dep
mivanit Aug 27, 2024
0d05191
update dep
mivanit Aug 27, 2024
04039a4
update deps??
mivanit Aug 27, 2024
31442ec
update dep
mivanit Aug 27, 2024
16e60e4
update dep to maze-dataset 1.0.0
mivanit Aug 28, 2024
.gitignore: 1 addition, 0 deletions
@@ -1,3 +1,4 @@
junk/
.vscode/
data/**
notebooks/data/**
4 binary files changed (not shown).