Evals groundwork #134

valedan · 2023-03-27T14:13:05Z

This is some supporting work and refactoring that I've broken out of my big upcoming evals PR to make things a little more manageable.

Included here:

Unifying the interface for eval functions (pathdist.py -> path_evals.py). Minimal functional changes here, just renaming some stuff and letting node_overlap take numpy arrays. These are not unit tested yet, but #112 (Test eval metrics) is tracking that.
Move constants to separate file
Add token utils and move decode_maze_tokens_to_coords there

review-notebook-app · 2023-03-27T14:13:28Z

Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks.

rusheb

Nice changes. A few comments, mostly nits.

rusheb · 2023-03-27T14:44:28Z

maze_transformer/evaluation/path_evals.py

+    @classmethod
+    def all_functions(cls) -> dict[str, PathEvalFunction]:
+        excluded = ["all_functions"]
+
+        return {
+            **{
+                name: func
+                for name, func in cls.__dict__.items()
+                if not name.startswith("_") and name not in excluded
+            }
+        }


I think this is quite risky without proper type checking. Could lead to adding functions with the incorrect return type.

Not sure what to do about this. Maybe we just need to prioritise #18, or maybe there is a more elegant solution involving interfaces.

Yep this is a fair concern. Enforcing types more strongly would help for sure.

If there's a more elegant solution I don't know what it is - I did think this code was a bit concerning when I moved it in here. But there is a valid need here for us to be able to easily get a list of all path eval functions. I didn't want to just add some constant listing them somewhere because then there would be 2 places to update any time we change something.

If there's a more Pythonic way of doing this I'm definitely open to it!

I wonder if I could add some checks to this function to make sure everything being returned adheres to the correct interface.

Adding the following checks might help:

isinstance(func, typing.Callable)

isinstance(func, types.FunctionType) to check it's a static function

typing.get_type_hints(func)["return"] is float

typing.get_type_hints(func) can also help us check if the argument names & types are correct
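A minimal sketch of what these checks could look like, using a hypothetical stand-in `PathEvals` class with toy metrics (the real eval signatures in `path_evals.py` are not reproduced here). One subtlety: entries read from `cls.__dict__` for `@staticmethod` methods are `staticmethod` wrapper objects rather than plain functions, so the sketch unwraps them via `__func__` before the `types.FunctionType` and `get_type_hints` checks.

```python
import types
import typing
from typing import Callable


class PathEvals:
    """Hypothetical stand-in for the real class, with toy metrics."""

    @staticmethod
    def path_length(prediction: list, **_) -> float:
        return float(len(prediction))

    @staticmethod
    def node_overlap(solution: list, prediction: list, **_) -> float:
        # toy definition: fraction of solution nodes that also appear in the prediction
        return len(set(solution) & set(prediction)) / len(solution)


def unwrap(attr: object) -> object:
    # cls.__dict__ stores @staticmethod entries as staticmethod wrappers, not functions
    return attr.__func__ if isinstance(attr, staticmethod) else attr


def check_path_eval(name: str, func: object) -> None:
    """Raise TypeError if func does not look like a path eval function."""
    if not callable(func):
        raise TypeError(f"{name} is not callable")
    if not isinstance(func, types.FunctionType):
        raise TypeError(f"{name} is not a plain function")
    if typing.get_type_hints(func).get("return") is not float:
        raise TypeError(f"{name} must be annotated to return float")


def all_path_evals(cls: type) -> dict[str, Callable[..., float]]:
    """Collect and validate every public entry in cls.__dict__."""
    evals: dict[str, Callable[..., float]] = {}
    for name, attr in vars(cls).items():
        if name.startswith("_"):
            continue  # skip dunders such as __module__ and __doc__, and private helpers
        func = unwrap(attr)
        check_path_eval(name, func)
        evals[name] = func
    return evals
```

Running the checks at collection time means a mis-annotated function fails immediately instead of producing a wrongly typed metric later.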

a few other notes:

we probably want to wrap the calls to these functions in a try ... except, since we don't want the whole training script to crash if an eval fails. definitely print warnings though, and have a way to make it crash during integration tests. maybe this is already being done?

is there a reason we are unwrapping the dict only to wrap it again? (maybe this is my fault haha)

do we want to perform some type checks when loading the module, rather than waiting for all_functions to be called elsewhere? this should definitely throw an exception if it finds any functions which don't fit the criteria.

should we rename all_functions to _all_functions, thereby excluding it automatically (or maybe move out of the class)? hopefully this discourages adding non-evaluation functions to this class
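A minimal sketch of the try/except idea, assuming evals are held in a plain `dict[str, Callable]` and that a `strict` flag (hypothetical, not an existing option in the codebase) is turned on during integration tests so failures still raise. Recording a failed metric as NaN is likewise just one possible choice.

```python
import math
import warnings
from typing import Callable


def run_evals(
    evals: dict[str, Callable[..., float]],
    strict: bool = False,
    **inputs,
) -> dict[str, float]:
    """Run every eval on the same inputs; failures warn, or raise if strict."""
    results: dict[str, float] = {}
    for name, func in evals.items():
        try:
            results[name] = func(**inputs)
        except Exception as err:
            if strict:
                raise  # e.g. in integration tests, fail loudly
            warnings.warn(f"eval {name!r} failed: {err!r}")
            # keep training going, but mark the metric as missing
            results[name] = math.nan
    return results
```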

I agree with Michael that the dictionary is being unnecessarily unpacked and repacked. That this mistake slipped through is a sign that no one fully understands this implementation, which means we shouldn't use this implementation.

We should aim for code that is readable and maintainable rather than code which is clever. In this case I think you need to update the eval functions to remove unused parameters. Then you need to update all_functions to call the eval functions one by one (by name) and return their results in a dictionary. The method callsite will need to be updated.
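Roughly what the explicit version could look like: each eval takes only the arguments it uses, and one function calls them by name. The eval names, their toy definitions, and `evaluate_path` are placeholders for illustration, and coordinates and paths are modelled as plain tuples and lists rather than the repo's `LatticeMaze`/`MazePath` types.

```python
Coord = tuple[int, int]
Path = list[Coord]


def node_overlap(solution: Path, prediction: Path) -> float:
    # toy definition: fraction of solution nodes present in the prediction
    return len(set(solution) & set(prediction)) / len(solution)


def num_steps(prediction: Path) -> float:
    # toy metric: number of moves in the predicted path
    return float(max(len(prediction) - 1, 0))


def evaluate_path(solution: Path, prediction: Path) -> dict[str, float]:
    """Call each eval explicitly, passing only the arguments it needs."""
    return {
        "node_overlap": node_overlap(solution, prediction),
        "num_steps": num_steps(prediction),
    }
```

The trade-off is the one raised earlier in the thread: adding an eval means touching two places (the function and `evaluate_path`), in exchange for code that is easy to follow.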

A leading underscore in a Python name indicates a private member, so we shouldn't use one to name a public method.

Agree that we shouldn't use underscores since it's a public method, and agree that we should aim for readable and maintainable over cleverness.

The best solution here is probably to have EXCLUDED_FUNCTIONS as a class variable of the path evals class, rather than a private thing defined anew every time we run all_functions. We should also move out the "is this a valid path_eval function" check into a separate function. I'll take a crack at this.
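A sketch of that refactor under stated assumptions: `EXCLUDED_FUNCTIONS` becomes a class variable, and the validity check moves into its own function. `is_path_eval` and the toy eval are hypothetical, and the check here only verifies the return annotation.

```python
import typing
from typing import Callable, ClassVar


def is_path_eval(func: object) -> bool:
    """Hypothetical validity check: a callable annotated to return float."""
    return callable(func) and typing.get_type_hints(func).get("return") is float


class PathEvals:
    # names that live on the class but are not evals
    EXCLUDED_FUNCTIONS: ClassVar[tuple[str, ...]] = ("all_functions",)

    @staticmethod
    def path_length(prediction: list, **_) -> float:
        # toy metric
        return float(len(prediction))

    @classmethod
    def all_functions(cls) -> dict[str, Callable[..., float]]:
        evals: dict[str, Callable[..., float]] = {}
        for name in vars(cls):
            if name.startswith("_") or name in cls.EXCLUDED_FUNCTIONS:
                continue
            func = getattr(cls, name)  # getattr unwraps staticmethod objects
            if not callable(func):
                continue  # plain data such as EXCLUDED_FUNCTIONS itself
            if not is_path_eval(func):
                raise TypeError(f"{name} is not a valid path eval function")
            evals[name] = func
        return evals
```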


rusheb · 2023-03-27T14:54:26Z

maze_transformer/utils/token_utils.py

I think it might make sense to have a TokenizedMaze class and add these as methods on that class. Not sure. What do you think?

Hmm, I like that idea. But it would be a huge change, definitely out of scope here. Everything that currently deals with tokens would need to be updated.

We've discussed before how we'll need to rethink what abstractions we're using for tokenization once we add additional tokenization schemes - maybe we can think about this then?


rusheb · 2023-03-27T14:58:48Z

maze_transformer/utils/token_utils.py

+def get_path_tokens(tokens: list[str]) -> list[str]:
+    """The path is considered everything from the first path coord to the end of the list, including the path_end token (ie everything we are asking the model to predict)"""
+    start_idx = tokens.index(SPECIAL_TOKENS["path_start"]) + 1
+    return tokens[start_idx:]


If you made the end_value param of tokens_between optional, it would generalise to this.

Hmm I'm torn here. I do see your point, and it would be elegant for everything in here to use tokens_between. And if start_value was also optional, it would generalize to get_tokens_up_to_path_start too.

But I'm hesitant, for a couple of reasons:

I don't want to add too much complexity to tokens_between

It's called tokens_between - that doesn't really fit with optional start or end values

get_path_tokens and get_tokens_up_to_path_start are fairly simple right now - the benefit of switching them to use tokens_between doesn't seem that big.

Right now I'm leaning towards leaving this as is. What do you think?
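For comparison, a sketch of the alternative being weighed: a `tokens_between` whose `end_value` is optional. The real `tokens_between` in `token_utils.py` isn't shown in this thread, so this signature is an assumption, as are the special-token strings.

```python
from typing import Optional

# assumed token strings, for illustration only
SPECIAL_TOKENS = {"path_start": "<PATH_START>", "path_end": "<PATH_END>"}


def tokens_between(
    tokens: list[str], start_value: str, end_value: Optional[str] = None
) -> list[str]:
    """Tokens strictly after start_value, up to (not including) end_value.

    With end_value=None, everything after start_value is returned.
    """
    start_idx = tokens.index(start_value) + 1
    if end_value is None:
        return tokens[start_idx:]
    end_idx = tokens.index(end_value, start_idx)
    return tokens[start_idx:end_idx]


def get_path_tokens(tokens: list[str]) -> list[str]:
    # same behaviour as the get_path_tokens in the diff above
    return tokens_between(tokens, SPECIAL_TOKENS["path_start"])
```

This also illustrates the naming concern: with `end_value` omitted, "between" no longer describes what the function returns.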


luciaquirke · 2023-03-28T10:40:53Z

maze_transformer/evaluation/path_evals.py

+    Iterate over the segments of a path.
+    """
+    i: int
+    n_s: Coord | CoordTup


could we rename n_s and n_e to be more meaningful? is this node south and node east 🤔

Haha good call - I have no idea what these mean. I'll figure it out and rename them.
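For reference, a sketch of segment iteration with descriptive names, assuming `n_s` and `n_e` are the start and end nodes of each segment (the thread doesn't confirm this). `iterate_path_segments` and the tuple-based `CoordTup` alias are stand-ins, not the repo's code.

```python
from typing import Iterator

CoordTup = tuple[int, int]


def iterate_path_segments(
    path: list[CoordTup],
) -> Iterator[tuple[int, CoordTup, CoordTup]]:
    """Yield (index, segment_start, segment_end) for each consecutive pair of nodes."""
    for i, (segment_start, segment_end) in enumerate(zip(path, path[1:])):
        yield i, segment_start, segment_end
```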


luciaquirke · 2023-03-28T10:46:12Z

maze_transformer/evaluation/path_evals.py

+
+    @staticmethod
+    def node_overlap(
+        maze: LatticeMaze, solution: MazePath, prediction: MazePath, /


These depend on the outcome of the all_functions discussion.

Suggested change

-        maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
+        solution: MazePath, prediction: MazePath, /

Yeah all these suggestions to remove unused args won't work unless we significantly change the approach here. The path evals all need to have the same interface because they're called in a for loop with the same inputs.

If there's a cool elegant Pythonic solution here I'm open to it 🙂
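To make the constraint concrete, a minimal sketch (toy evals, plain tuples for coordinates, hypothetical `run_all`) of a loop that calls every eval with the same positional inputs, and what happens once one eval's unused parameters are trimmed.

```python
def node_overlap(maze, solution, prediction, /) -> float:
    # toy definition; `maze` is unused but kept for the shared interface
    return len(set(solution) & set(prediction)) / len(solution)


def num_steps_trimmed(prediction, /) -> float:
    # unused args removed, as in the suggested changes
    return float(len(prediction) - 1)


def run_all(evals, maze, solution, prediction) -> dict[str, float]:
    # every eval is called with the same three positional arguments
    return {name: f(maze, solution, prediction) for name, f in evals.items()}
```

Calling `run_all` with `num_steps_trimmed` raises `TypeError`, since it is given three positional arguments but accepts one.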

luciaquirke · 2023-03-28T10:46:38Z

maze_transformer/evaluation/path_evals.py

+
+    @staticmethod
+    def num_connections_adjacent_lattice(
+        maze: LatticeMaze, solution: MazePath, prediction: MazePath, /


Suggested change

-        maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
+        prediction: MazePath, /

luciaquirke · 2023-03-28T10:47:01Z

maze_transformer/evaluation/path_evals.py

+
+    @staticmethod
+    def num_connections_adjacent(
+        maze: LatticeMaze, solution: MazePath, prediction: MazePath, /


Suggested change

-        maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
+        maze: LatticeMaze, prediction: MazePath, /

luciaquirke · 2023-03-28T10:59:15Z

maze_transformer/evaluation/path_evals.py

+        )
+
+    @classmethod
+    def all_functions(cls) -> dict[str, PathEvalFunction]:


Is there somewhere I could read about using method dictionaries in Python? I'm torn because I really like how this method is used in the notebooks, but sometimes complex code can end up being more trouble than it's worth when people struggle to modify it down the road. If there were good tests I would be less concerned. Will investigate further because I don't fully understand StatsCounter

I'm fairly new to Python, so if you find anything on this I'd like to know too. 🙂

the Python docs are relatively unhelpful on this afaik; I found these two SO threads helpful:

https://stackoverflow.com/questions/48029249/python-dict

https://stackoverflow.com/questions/19907442/explain-dict-attribute

the tl;dr is that it contains all attributes defined directly on an object (for a class, everything defined in the class body), but the built-in ones like __module__, __doc__, and __dict__ itself start with __ to signify that they are "magic" (dunder) attributes. Since we only inherit from object, there shouldn't be anything in there other than our own definitions and those built-ins.

Starting with _ is the Python convention for "this is a private method" or otherwise "don't use this unless you know what you're doing".

Tangent: the PEP 8 style guide says never to invent __double_underscore__ (dunder) names and to use only the documented ones, but I may be guilty of doing this sometimes 😢
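A quick way to see what a class's `__dict__` actually holds, using a toy class (the exact set of dunder keys varies between Python versions). It also shows that `@staticmethod` entries are stored as wrapper objects, and that inherited names live on the base class rather than the subclass.

```python
class Example:
    """Docstring."""

    @staticmethod
    def public_eval() -> float:
        return 1.0

    def _private_helper(self) -> None:
        pass


class Child(Example):
    pass


names = list(Example.__dict__)
dunders = [n for n in names if n.startswith("__")]
public = [n for n in names if not n.startswith("_")]

print(public)  # ['public_eval']
print(type(Example.__dict__["public_eval"]).__name__)  # staticmethod
print(type(Example.public_eval).__name__)  # function
print("public_eval" in Child.__dict__)  # False: inherited names live on the base class
```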

sorry if I'm stepping on your toes, Dan; I took care of this in bf407c5

rusheb

Removing change request by commenting

luciaquirke

Ready to merge once the all_functions method has been updated. Update is looking awesome, keen to get this in 🙏

see discussion: #134 (comment)

valedan · 2023-04-03T17:39:02Z

Okay I've spent probably an unreasonable amount of time looking into this all_functions thing over the past day. I think I have something that I'm moderately happy with. I've removed all_functions completely and gone with a decorator that can be used to make arbitrary method dicts.

A few benefits here:

Evals are just a class attribute now, no need to call a method to get them.
This decorator is easily reusable for other cases where we need a method dict, for example logit-based evals.
I've removed unused args from all the eval functions, and added a **_ to allow unused kwargs so that the evaluate_model approach still works.

There is a little bit of complexity with some of the type-system wrangling I had to do here, but I think it may be unavoidable.
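The decorator from 1943cca isn't reproduced in this thread, so the following is only a sketch of the general pattern, with a hypothetical `register_method` decorator and toy evals. A decorator adds each function to a dict held as a class attribute, and `**_` lets every eval accept the full set of keyword inputs while ignoring the ones it doesn't use.

```python
from typing import Callable

EvalFunction = Callable[..., float]


def register_method(
    method_dict: dict[str, EvalFunction],
) -> Callable[[EvalFunction], EvalFunction]:
    """Decorator factory: add the decorated function to method_dict under its name."""

    def decorator(func: EvalFunction) -> EvalFunction:
        method_dict[func.__name__] = func
        return func

    return decorator


class PathEvals:
    evals: dict[str, EvalFunction] = {}

    # @staticmethod is applied last, so the dict stores the plain function
    @staticmethod
    @register_method(evals)
    def node_overlap(solution, prediction, **_) -> float:
        # toy definition: fraction of solution nodes present in the prediction
        return len(set(solution) & set(prediction)) / len(solution)

    @staticmethod
    @register_method(evals)
    def num_steps(prediction, **_) -> float:
        # toy metric: number of moves in the predicted path
        return float(len(prediction) - 1)


# every eval can be called with the same keyword inputs
inputs = dict(maze=None, solution=[(0, 0), (0, 1)], prediction=[(0, 0), (0, 1)])
results = {name: f(**inputs) for name, f in PathEvals.evals.items()}
```

The decorator expression can refer to `evals` directly because decorators in a class body are evaluated in the class namespace, after `evals` has been defined.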

@mivanit Thanks for taking a stab at this! I didn't end up using your approach because it was getting a bit too complex, and I don't think we need the asserts because that should be taken care of by the function signatures (required params) and type system.

Let me know if this is okay! Really hope I can merge today 😄

mivanit

This is a really clean solution! I like this a lot better than what we had before. The type checking I added was mostly just a stopgap until we actually have proper type checking working.

I'd probably have named evals PATH_EVALS or something, but that's extremely minor.

Really good work on this!

yolo

rusheb · 2023-04-09T13:43:59Z

gone with a decorator that can be used to make arbitrary method dicts.

Very cool!!

valedan requested review from rusheb and luciaquirke March 27, 2023 14:13

rusheb previously requested changes Mar 27, 2023


luciaquirke reviewed Mar 28, 2023



luciaquirke reviewed Mar 28, 2023


valedan force-pushed the solved-maze-2 branch from d16440d to 1864239 March 28, 2023 12:52

Base automatically changed from solved-maze-2 to main March 28, 2023 12:55

valedan added 10 commits March 28, 2023 09:00

1decbe6 Break SolvedMaze out of MazeTokenizer
badd6ab fix issues after merging main
e90c36b resolve PR issues
41396bf unify interface for pathdist evals
b2fd019 move constants into seperate file
d10320f add token utils
5b5c61b move eval_model tests to unit tests dir
746de7a move decode_coords to utils
bf88149 fix imports and notebooks
211cef0 address pr feedback

valedan force-pushed the evals-groundwork branch from 39a5f7b to 211cef0 March 28, 2023 15:07

c93b53a Merge branch 'main' into evals-groundwork

rusheb reviewed Mar 28, 2023


valedan requested a review from rusheb March 28, 2023 21:01

luciaquirke approved these changes Apr 1, 2023


mivanit and others added 2 commits April 1, 2023 09:41

bf407c5 added type checking and general cleanup of PathEvals.all_functions()
        see dicussion: #134 (comment)
1943cca Use decorator-based method dict for path evals

mivanit approved these changes Apr 3, 2023


valedan added 2 commits April 3, 2023 15:23

e52b281 Merge branch 'main' into evals-groundwork
2092563 resolve merge conflicts

valedan merged commit 0f4c5e9 into main Apr 3, 2023

valedan deleted the evals-groundwork branch April 3, 2023 19:31
