Evals groundwork #134

Merged: 15 commits merged into main from evals-groundwork on Apr 3, 2023

Contributor

@valedan valedan commented Mar 27, 2023

This is some supporting work and refactoring that I've broken out of my big upcoming evals PR to make things a little more manageable.

Included here:

  • Unifying the interface for eval functions (pathdist.py -> path_evals.py). Minimal functional changes here, just renaming some stuff and letting node_overlap take numpy arrays. These are not unit tested but Test eval metrics #112 is tracking that.
  • Move constants to a separate file
  • Add token utils and move decode_maze_tokens_to_coords there


@valedan valedan requested review from rusheb and luciaquirke March 27, 2023 14:13
rusheb previously requested changes Mar 27, 2023
Collaborator

@rusheb rusheb left a comment

Nice changes. Few comments, mostly nits

Comment on lines 99 to 107
@classmethod
def all_functions(cls) -> dict[str, PathEvalFunction]:
    excluded = ["all_functions"]

    return {
        **{
            name: func
            for name, func in cls.__dict__.items()
            if not name.startswith("_") and name not in excluded
        }
    }
Collaborator

I think this is quite risky without proper type checking. Could lead to adding functions with the incorrect return type.

Not sure what to do about this, maybe we just need to prioritise #18 or maybe there is a more elegant solution involving interfaces

Contributor Author

Yep this is a fair concern. Enforcing types more strongly would help for sure.

If there's a more elegant solution I don't know what it is - I did think this code was a bit concerning when I moved it in here. But there is a valid need here for us to be able to easily get a list of all path eval functions. I didn't want to just add some constant listing them somewhere because then there would be 2 places to update any time we change something.

If there's a more Pythonic way of doing this I'm definitely open to it!

Contributor Author

I wonder if I could add some checks to this function to make sure everything being returned adheres to the correct interface.

Member

@mivanit mivanit Mar 29, 2023

Adding the following checks might help:

  • isinstance(func, typing.Callable)
  • isinstance(func, types.FunctionType) to check it's a static function
  • typing.get_type_hints(func)["return"] is float

typing.get_type_hints(func) can also help us check if the argument names & types are correct

a few other notes:

  • we probably want to wrap the calls to these functions in a try ... except since we don't want the whole training script to crash if an eval fails. definitely print warnings though, and have a way to make it crash during integration tests. maybe this is already being done?
  • is there a reason we are unwrapping the dict only to wrap it again? (maybe this is my fault haha)
  • do we want to perform some type checks when loading the module, rather than waiting for all_functions to be called elsewhere? this should definitely throw an exception if it finds any functions which don't fit the criteria.
  • should we rename all_functions to _all_functions, thereby excluding it automatically (or maybe move out of the class)? hopefully this discourages adding non-evaluation functions to this class
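
A minimal sketch of the checks suggested above, assuming eval functions are static methods annotated to return float. The names `is_valid_eval`, `collect_evals`, `GoodEvals`, and `BadEvals` are hypothetical and not taken from this PR. One detail worth noting: `cls.__dict__` stores the `staticmethod` wrapper objects rather than plain functions, so the sketch looks attributes up with `getattr` before testing them.

```python
import inspect
import typing


def is_valid_eval(func: object) -> bool:
    """True if func is a plain function whose return annotation is float."""
    if not inspect.isfunction(func):
        return False
    return typing.get_type_hints(func).get("return") is float


def collect_evals(cls: type) -> dict:
    """Gather the public functions of cls, raising if any fails the check."""
    found = {}
    for name in vars(cls):
        if name.startswith("_"):
            continue
        # getattr unwraps the staticmethod object stored in cls.__dict__
        func = getattr(cls, name)
        if not is_valid_eval(func):
            raise TypeError(f"{cls.__name__}.{name} is not a valid eval function")
        found[name] = func
    return found


class GoodEvals:
    """Hypothetical stand-in for the real evals class."""

    @staticmethod
    def fraction_correct(solution: list, prediction: list) -> float:
        matches = sum(s == p for s, p in zip(solution, prediction))
        return matches / max(len(solution), 1)


class BadEvals:
    """Hypothetical class with a wrongly annotated eval."""

    @staticmethod
    def wrong_return(solution: list, prediction: list) -> str:
        return "oops"
```

Calling `collect_evals` at import time would surface a badly annotated function immediately, which is one way to get the "fail when loading the module" behaviour mentioned in the notes.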

Contributor

@luciaquirke luciaquirke Apr 1, 2023

I agree with Michael that the dictionary seems to be being unnecessarily unpacked and repacked. That this mistake slipped through is a sign that no one fully understands this implementation, which means we shouldn't use this implementation.

We should aim for code that is readable and maintainable rather than code which is clever. In this case I think you need to update the eval functions to remove unused parameters. Then you need to update all_functions to call the eval functions one by one (by name) and return their results in a dictionary. The method callsite will need to be updated.
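
A rough sketch of the explicit approach described in this comment, where each eval takes only the arguments it uses and is called by name. The metrics here are simplified stand-ins and `run_path_evals` and `exact_match` are hypothetical names, not code from this PR.

```python
def node_overlap(solution: list, prediction: list) -> float:
    """Fraction of solution nodes that also appear in the prediction."""
    if not solution:
        return 0.0
    predicted = set(prediction)
    return sum(node in predicted for node in solution) / len(solution)


def exact_match(solution: list, prediction: list) -> float:
    """1.0 if the prediction equals the solution, else 0.0 (hypothetical metric)."""
    return float(solution == prediction)


def run_path_evals(solution: list, prediction: list) -> dict[str, float]:
    """Call each eval by name and collect the results in a dictionary."""
    return {
        "node_overlap": node_overlap(solution, prediction),
        "exact_match": exact_match(solution, prediction),
    }
```

The trade-off is that every new eval has to be added in two places (its definition and the dictionary), in exchange for a call site that is easy to read and to type check.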

A leading underscore in a Python name indicates a private method, so we shouldn't use one to name a public method.

Member

Agree that we shouldn't use underscores since it's a public method, and agree that we should aim for readable and maintainable over cleverness.

The best solution here is probably to have EXCLUDED_FUNCTIONS as a class variable of the path evals class, rather than a private thing defined anew every time we run all_functions. We should also move out the "is this a valid path_eval function" check into a separate function. I'll take a crack at this.

tests/unit/maze_transformer/evaluation/test_eval_model.py (outdated, resolved)
maze_transformer/generation/constants.py (resolved)
tests/unit/maze_transformer/utils/test_token_utils.py (outdated, resolved)
maze_transformer/utils/token_utils.py (resolved)
Collaborator

I think it might make sense to have a TokenizedMaze class and add these as methods on that class. Not sure. What do you think?

Contributor Author

Hmm, I like that idea. But it would be a huge change, definitely out of scope here. Everything that currently deals with tokens would need to be updated.

We've discussed before how we'll need to rethink what abstractions we're using for tokenization once we add additional tokenization schemes - maybe we can think about this then?

maze_transformer/utils/token_utils.py (resolved)
Comment on lines +19 to +24
def get_path_tokens(tokens: list[str]) -> list[str]:
    """The path is considered everything from the first path coord to the end of the list, including the path_end token (ie everything we are asking the model to predict)"""
    start_idx = tokens.index(SPECIAL_TOKENS["path_start"]) + 1
    return tokens[start_idx:]
Collaborator

If you made end_value param of tokens_between optional, then it would generalise to this

Contributor Author

Hmm I'm torn here. I do see your point, and it would be elegant for everything in here to use tokens_between. And if start_value was also optional, it would generalize to get_tokens_up_to_path_start too.

But I'm hesitant for a couple of reasons

  • I don't want to add too much complexity to tokens_between
  • It's called tokens_between - that doesn't really fit with optional start or end values
  • get_path_tokens and get_tokens_up_to_path_start are fairly simple right now - the benefit of switching them to use tokens_between doesn't seem that big.

Right now I'm leaning towards leaving this as is. What do you think?
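
For reference, the generalisation under discussion could look roughly like the sketch below. The real signature of tokens_between is not shown in this thread, so `token_span` is a hypothetical helper, and the token strings in `SPECIAL_TOKENS` are placeholders rather than the project's actual values.

```python
from typing import Optional

# Placeholder token strings; the project's real SPECIAL_TOKENS may differ.
SPECIAL_TOKENS = {"path_start": "<PATH_START>", "path_end": "<PATH_END>"}


def token_span(
    tokens: list[str],
    start_value: Optional[str] = None,
    end_value: Optional[str] = None,
) -> list[str]:
    """Tokens strictly after start_value and strictly before end_value.

    A missing bound extends the span to that end of the list.
    """
    start_idx = 0 if start_value is None else tokens.index(start_value) + 1
    end_idx = len(tokens) if end_value is None else tokens.index(end_value)
    return tokens[start_idx:end_idx]


def get_path_tokens(tokens: list[str]) -> list[str]:
    """Everything after path_start, including the path_end token."""
    return token_span(tokens, start_value=SPECIAL_TOKENS["path_start"])
```

This shows the naming concern raised above: once both bounds are optional, "between" no longer describes what the helper does, which is an argument for keeping the small dedicated functions.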

maze_transformer/utils/token_utils.py (outdated, resolved)
tests/unit/maze_transformer/utils/test_token_utils.py (outdated, resolved)
Iterate over the segments of a path.
"""
i: int
n_s: Coord | CoordTup
Contributor

could we rename n_s and n_e to be more meaningful? is this node south and node east 🤔

Contributor Author

Haha good call - I have no idea what these mean. I'll figure it out and rename them.


@staticmethod
def node_overlap(
maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
Contributor

@luciaquirke luciaquirke Mar 28, 2023

These depend on the outcome of the all_functions discussion

Suggested change
- maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
+ solution: MazePath, prediction: MazePath, /

Contributor Author

Yeah all these suggestions to remove unused args won't work unless we significantly change the approach here. The path evals all need to have the same interface because they're called in a for loop with the same inputs.
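
To illustrate the constraint, here is a small sketch assuming the evals are called in a loop with the same three positional inputs. `run_all` and `trimmed_overlap` are hypothetical names, and the metric is a simplified stand-in for the real one.

```python
def node_overlap(maze, solution, prediction, /) -> float:
    """Fraction of solution nodes in the prediction; maze is accepted but unused."""
    if not solution:
        return 0.0
    predicted = set(prediction)
    return sum(node in predicted for node in solution) / len(solution)


def trimmed_overlap(solution, prediction, /) -> float:
    """The same metric with the unused maze argument removed."""
    return node_overlap(None, solution, prediction)


def run_all(evals: dict, maze, solution, prediction) -> dict:
    """Call every eval with the same three positional inputs."""
    return {name: fn(maze, solution, prediction) for name, fn in evals.items()}
```

With this loop, `trimmed_overlap` raises a TypeError because it receives three positional arguments but accepts two, so dropping unused parameters requires changing how the functions are called.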

Contributor Author

If there's a cool elegant Pythonic solution here I'm open to it 🙂


@staticmethod
def num_connections_adjacent_lattice(
maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
Contributor

Suggested change
- maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
+ prediction: MazePath, /


@staticmethod
def num_connections_adjacent(
maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
Contributor

Suggested change
- maze: LatticeMaze, solution: MazePath, prediction: MazePath, /
+ maze: LatticeMaze, prediction: MazePath, /

)

@classmethod
def all_functions(cls) -> dict[str, PathEvalFunction]:
Contributor

@luciaquirke luciaquirke Mar 28, 2023

Is there somewhere I could read about using method dictionaries in python? I'm torn because I really like how this method is used in the notebooks, but sometimes complex code can end up being more trouble than it's worth when people struggle to modify it down the road. If there were good tests I would be less concerned. Will investigate further because I don't fully understand StatsCounter

Contributor Author

I'm fairly new to Python, so if you find anything on this I'd like to know too. 🙂

Member

the python docs are relatively unhelpful with this afaik, I found these two helpful SO threads:

the tl;dr is that it contains all attributes of an object, but all the built-in ones like __module__, __doc__, and __dict__ itself start with __ to signify that they are "magic" methods. Since we only inherit from object, there shouldn't be anything other than a built-in in the class.

Starting with _ is convention in python for "this is a private method" or otherwise "don't use this unless you know what you're doing".
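
A small illustration of the behaviour described above, using a hypothetical `Example` class. It shows which names end up in a class's `__dict__` and how filtering on a leading underscore leaves only the public entries.

```python
class Example:
    """A class that only inherits from object."""

    @staticmethod
    def visible(x: float) -> float:
        return x * 2

    @staticmethod
    def _hidden(x: float) -> float:
        return x


# vars(cls) is the same mapping as cls.__dict__
all_names = list(vars(Example))

# Keep only names without a leading underscore, as all_functions does
public = {
    name: attr for name, attr in vars(Example).items() if not name.startswith("_")
}
```

Here `all_names` contains built-in entries such as `__module__` and `__doc__` alongside `visible` and `_hidden`, while `public` holds only `visible`. The value stored under that key is the `staticmethod` wrapper object; attribute access such as `Example.visible` returns the underlying function.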

Tangent: the PEP style guide says never to invent double-underscore attributes and only use built-ins, but I may be guilty of doing this sometimes 😢

Member

@mivanit mivanit Apr 1, 2023

sorry if I'm stepping on your toes Dan, I took care of this in bf407c5

Base automatically changed from solved-maze-2 to main March 28, 2023 12:55
Collaborator

@rusheb rusheb left a comment

Removing change request by commenting

@valedan valedan requested a review from rusheb March 28, 2023 21:01
Contributor

@luciaquirke luciaquirke left a comment

Ready to merge once the all_functions method has been updated. Update is looking awesome, keen to get this in 🙏

Contributor Author

valedan commented Apr 3, 2023

Okay I've spent probably an unreasonable amount of time looking into this all_functions thing over the past day. I think I have something that I'm moderately happy with. I've removed all_functions completely and gone with a decorator that can be used to make arbitrary method dicts.

A few benefits here:

  • Evals are just a class property now, no need to call a method to get them.
  • This decorator is easily reusable for other cases where we need a method dict, for example logit-based evals.
  • I've removed unused args from all the eval functions, and added a **_ to allow unused kwargs so that the evaluate_model approach still works.
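
The merged implementation is not shown in this thread, so the following is only a sketch of how a decorator-built method dict with `**_` could work. The names `register_method`, `PathEvals`, `evals`, and `path_length` are assumptions for illustration, and the metrics are simplified stand-ins.

```python
from typing import Callable


def register_method(method_dict: dict):
    """Return a decorator that records a function in method_dict by name."""

    def decorator(func: Callable[..., float]) -> staticmethod:
        method_dict[func.__name__] = func
        # Wrap so the function can also be called as Class.name(...)
        return staticmethod(func)

    return decorator


class PathEvals:
    # Filled in as each decorated function below is defined
    evals: dict = {}

    @register_method(evals)
    def node_overlap(solution, prediction, **_) -> float:
        """Fraction of solution nodes present in the prediction."""
        if not solution:
            return 0.0
        predicted = set(prediction)
        return sum(node in predicted for node in solution) / len(solution)

    @register_method(evals)
    def path_length(prediction, **_) -> float:
        """Number of nodes in the predicted path (hypothetical metric)."""
        return float(len(prediction))


def evaluate(maze, solution, prediction) -> dict:
    """Pass every input by keyword; each eval ignores what it does not need."""
    return {
        name: fn(maze=maze, solution=solution, prediction=prediction)
        for name, fn in PathEvals.evals.items()
    }
```

Because the caller passes everything by keyword, each function declares only the parameters it uses and the `**_` catch-all absorbs the rest, so one loop can still drive all of them.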

There is a little bit of complexity with some of the type-system wrangling I had to do here, but I think it may be unavoidable.

@mivanit Thanks for taking a stab at this! I didn't end up using your approach because it was getting a bit too complex, and I don't think we need the asserts because that should be taken care of by the function signatures (required params) and type system.

Let me know if this is okay! Really hope I can merge today 😄

Member

@mivanit mivanit left a comment

This is a really clean solution! I like this a lot better than what we had before. The type checking I added was mostly just a stopgap until we actually have proper type checking working.

I'd have probably named evals as PATH_EVALS or something, but that's extremely minor.

Really good work on this!

@valedan valedan merged commit 0f4c5e9 into main Apr 3, 2023
@valedan valedan deleted the evals-groundwork branch April 3, 2023 19:31
Collaborator

rusheb commented Apr 9, 2023

gone with a decorator that can be used to make arbitrary method dicts.

Very cool!!

4 participants