-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evals groundwork #134
Evals groundwork #134
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice changes. Few comments, mostly nits
@classmethod | ||
def all_functions(cls) -> dict[str, PathEvalFunction]: | ||
excluded = ["all_functions"] | ||
|
||
return { | ||
**{ | ||
name: func | ||
for name, func in cls.__dict__.items() | ||
if not name.startswith("_") and name not in excluded | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is quite risky without proper type checking. Could lead to adding functions with the incorrect return type.
Not sure what to do about this, maybe we just need to prioritise #18 or maybe there is a more elegant solution involving interfaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is a fair concern. Enforcing types more strongly would help for sure.
If there's a more elegant solution I don't know what it is - I did think this code was a bit concerning when I moved it in here. But there is a valid need here for us to be able to easily get a list of all path eval functions. I didn't want to just add some constant listing them somewhere because then there would be 2 places to update any time we change something.
If there's a more Pythonic way of doing this I'm definitely open to it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if I could add some checks to this function to make sure everything being returned adheres to the correct interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the following checks might help:
isinstance(func, typing.Callable)
isinstance(func, types.FunctionType)
to check it's a static functiontyping.get_type_hints(func)["return"] is float
typing.get_type_hints(func)
can also help us check if the argument names & types are correct
a few other notes:
- we probably want to wrap the calls to these functions in a
try ... catch
since we dont want the whole training script to crash if an eval fails. definitely print warnings though, and have a way to make it crash during integration tests. maybe this is already being done? - is there a reason we are unwrapping the dict only to wrap it again? (maybe this is my fault haha)
- do we want to perform some type checks when loading the module, rather than waiting for
all_functions
to be called elsewhere? this should definitely throw an exception if it finds any functions which dont fit the criteria. - should we rename
all_functions
to_all_functions
, thereby excluding it automatically (or maybe move out of the class)? hopefully this discourages adding non-evaluation functions to this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Michael that the dictionary seems to be being unnecessarily unpacked and repacked. That this mistake slipped through is a sign that no one fully understands this implementation, which means we shouldn't use this implementation.
We should aim for code that is readable and maintainable rather than code which is clever. In this case I think you need to update the eval functions to remove unused parameters. Then you need to update all_functions to call the eval functions one by one (by name) and return their results in a dictionary. The method callsite will need to be updated.
Underscores in Python variables are used to indicate private methods so we shouldn't use it to name a public method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we shouldn't use underscores since it's a public method, and agree that we should aim for readable and maintainable over cleverness.
The best solution here is probably to have EXCLUDED_FUNCTIONS
as a class variable of the path evals class, rather than a private thing defined anew every time we run all_functions
. We should also move out the "is this a valid path_eval function" check into a separate function. I'll take a crack at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might make sense to have a TokenizedMaze class and add these as methods on that class. Not sure. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I like that idea. But it would be a huge change, definitely out of scope here. Everything that currently deals with tokens would need to be updated.
We've discussed before how we'll need to rethink what abstractions we're using for tokenization once we add additional tokenization schemes - maybe we can think about this then?
def get_path_tokens(tokens: list[str]) -> list[str]: | ||
"""The path is considered everything from the first path coord to the end of the list, including the path_end token (ie everything we are asking the model to predict)""" | ||
start_idx = tokens.index(SPECIAL_TOKENS["path_start"]) + 1 | ||
return tokens[start_idx:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you made end_value
param of tokens_between
optional, then it would generalise to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm torn here. I do see your point, and it would be elegant for everything in here to use tokens_between
. And if start_value
was also optional, it would generalize to get_tokens_up_to_path_start
too.
But I'm hesitant for a couple of reasons
- I don't want to add too much complexity to
tokens_between
- It's called tokens_between - that doesn't really fit with optional start or end values
get_path_tokens
andget_tokens_up_to_path_start
are fairly simple right now - the benefit of switching them to usetokens_between
doesn't seem that big.
Right now I'm leaning towards leaving this as is. What do you think?
Iterate over the segments of a path. | ||
""" | ||
i: int | ||
n_s: Coord | CoordTup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we rename n_s and n_e to be more meaningful? is this node south and node east 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha good call - I have no idea what these mean. I'll figure it out and rename them.
|
||
@staticmethod | ||
def node_overlap( | ||
maze: LatticeMaze, solution: MazePath, prediction: MazePath, / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are dependent on outcome of all_methods discussion
maze: LatticeMaze, solution: MazePath, prediction: MazePath, / | |
solution: MazePath, prediction: MazePath, / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah all these suggestions to remove unused args won't work unless we significantly change the approach here. The path evals all need to have the same interface because they're called in a for loop with the same inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's a cool elegant Pythonic solution here I'm open to it 🙂
|
||
@staticmethod | ||
def num_connections_adjacent_lattice( | ||
maze: LatticeMaze, solution: MazePath, prediction: MazePath, / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maze: LatticeMaze, solution: MazePath, prediction: MazePath, / | |
prediction: MazePath, / |
|
||
@staticmethod | ||
def num_connections_adjacent( | ||
maze: LatticeMaze, solution: MazePath, prediction: MazePath, / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maze: LatticeMaze, solution: MazePath, prediction: MazePath, / | |
maze: LatticeMaze, prediction: MazePath, / |
) | ||
|
||
@classmethod | ||
def all_functions(cls) -> dict[str, PathEvalFunction]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there somewhere I could read about using method dictionaries in python? I'm torn because I really like how this method is used in the notebooks, but sometimes complex code can end up being more trouble than it's worth when people struggle to modify it down the road. If there were good tests I would be less concerned. Will investigate further because I don't fully understand StatsCounter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fairly new to Python, so if you find anything on this I'd like to know too. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the python docs are relatively unhelpful with this afaik, I found these two helpful SO threads:
- https://stackoverflow.com/questions/48029249/python-dict
- https://stackoverflow.com/questions/19907442/explain-dict-attribute
the tl;dr is that it contains all attributes of an object, but all the built-in ones like __module__
, __doc__
, and __dict__
itself start with __
to signify that they are "magic" methods. Since we only inherit from object
, there shouldn't be anything other than a built-in in the class.
Starting with _
is convention in python for "this is a private method" or otherwise "don't use this unless you know what you're doing".
Tangent: the PEP style guide says never to invent double-underscore attributes and only use built-ins, but I may be guilty of doing this sometimes 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry if I'm stepping on your toes Dan, I took care of this in bf407c5
39a5f7b
to
211cef0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing change request by commenting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready to merge once the all_functions method has been updated. Update is looking awesome, keen to get this in 🙏
Okay I've spent probably an unreasonable amount of time looking into this A few benefits here:
There is a little bit of complexity with some of the type-system wrangling I had to do here, but I think it may be unavoidable. @mivanit Thanks for taking a stab at this! I didn't end up using your approach because it was getting a bit too complex, and I don't think we need the Let me know if this is okay! Really hope I can merge today 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really clean solution! I like this a lot better than what we had before. The type checking I added was mostly just a stopgap until we actually have proper type checking working.
I'd have probably named evals
as PATH_EVALS
or something, but that's extremely minor.
Really good work on this!
Very cool!! |
This is some supporting work and refactoring that I've broken out of my big upcoming evals PR to make things a little more manageable.
Included here:
node_overlap
take numpy arrays. These are not unit tested but Test eval metrics #112 is tracking that.decode_maze_tokens_to_coords
there