Commit
mega-PR, adding a bunch of experiment notebooks and the required code for them.

broad overview:

- added some example models to `examples/`
- reworked eval code, needs big changes -- see #200
- many modifications to mechinterp code
- had to enforce transformerlens 1.6.1 due to tokenizer changes (it tried to get our custom tokenizer from huggingface?)
- exported some code to muutils
- notebooks added:
  - `eval_tasks_table.ipynb`: evaluate on a bunch of single token tasks. should be merged with other evals notebook
  - `appendix_figures.ipynb`: junk and duplicates of code in other notebooks :/
  - `generate_rollouts.ipynb`: what the name says, simple notebooks

commit history:

* trying to see if wandb model loading is working right
* moved dict shapes to muutils (it's on unmerged branch tho)
* better loading of models from wandb
* wip????????????????
* way more testing for loading wandb models
* aaaa
* ???
* hallway run
* update muutils dep to 0.5.3
* updated TL and maze-dataset dep
* type hint
* notebook runs?
* wip runs
* cleared notebooks?
* exported eval plots
* format
* many fixes and changes sorry
* wip
* poetry lock
* minor adjustment to make model names cleaner
* exported single token tasks
* refactored baseline model, allowed return of multiple options
  going to be useful for plot_logits
* more baseline model refactor
* format
* dep?
* train_model test was trying to train on 3M samples lol
* separate appendix figures notebooks, better logits plotting
  logits plotting now allows for adding other categories to the histogram besides correct / incorrect, which we can use the baseline model for
* misc
* rename original hallway model
  need to fix refs to it later lol
* WE'RE SO BACK, ADJACENCY HEADS ARE HERE
  check the dla notebook!!!
* correlation of attention and distance
* misc
* ok no more figures for now
* temp notebooks, for experiments. move these to paper repo later
* eval tasks table
* final before unireps submit
* misc fixes??
* added padding functionality and batched predictions
* wip
* wip
* wip
* added attention animation plotter
* format
* update deps
* transformerlens 1.6.1 due to issues :/
* cleaning up notebooks
  latest versions of some were in experiments repo
* fix up some notebooks, eval_model is still broken
* providing hallway model
* fixing eval_model issues with baseline solver
  batching was not working at all, had to add a hack to recursively call .generate() on RandomBaseline
  return type was list[str] instead of tensor or list[list[str]] so had to fix that as well
* update dep to muutils 0.5.5 (poetry not recognizing it yet)
* format
* poetry lock
* changed model used to hallway
* changed model paths, no jirpy
* update embedding structure nb
* updated plot attention for better cbar
* fix up eval tasks table notebook
* fix when cbar is none
* ran notebook
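
note on the baseline batching fix: a rough sketch of what "recursively call .generate()" could look like, for anyone reading this later. This is not the repo's code. `ToyRandomBaseline`, its constructor arguments, and the `max_new_tokens` parameter are all made up for illustration, and it works on plain token lists rather than tensors. It only shows the idea: a batched input (list of prompts) is handled by calling `generate()` once per prompt, so the return type is `list[list[str]]` for a batch and `list[str]` for a single prompt.

```python
import random
from typing import Sequence


class ToyRandomBaseline:
    """Hypothetical stand-in for a random baseline model (not the repo's class)."""

    def __init__(self, vocab: Sequence[str], seed: int = 0) -> None:
        self.vocab = list(vocab)
        self.rng = random.Random(seed)

    def generate(self, prompt, max_new_tokens: int = 3):
        # Batched input: a list of prompts, each itself a list of tokens.
        # Recurse on every prompt so the result is a list of token lists.
        # (An empty list is ambiguous and is treated as a single empty prompt.)
        if prompt and isinstance(prompt[0], (list, tuple)):
            return [self.generate(list(p), max_new_tokens) for p in prompt]

        # Single prompt: append randomly chosen tokens to a copy of it.
        new_tokens = [self.rng.choice(self.vocab) for _ in range(max_new_tokens)]
        return list(prompt) + new_tokens


model = ToyRandomBaseline(vocab=["UP", "DOWN", "LEFT", "RIGHT"])
single = model.generate(["<PATH_START>"], max_new_tokens=2)
batch = model.generate([["<PATH_START>"], ["<PATH_START>", "UP"]], max_new_tokens=2)
```

the recursion keeps the single-prompt code path as the only place where tokens are actually sampled, at the cost of no real vectorization across the batch.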