
Issue while loading the trained checkpoint #55

Open
NikitaGautam opened this issue Jan 31, 2023 · 3 comments


@NikitaGautam

Hi,

I am trying to test the trained model by loading the checkpoint, but it shows the following error:
Traceback (most recent call last):
File "test.py", line 119, in main
train(conf)
File "test.py", line 101, in train
pl_module = pl_module.load_from_checkpoint(checkpoint_path=conf.checkpoint_path,config=config, tokenizer = tokenizer, model = model)
File "/home//virtualenv/luke/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
File "/usr/lib/python3.8/_collections_abc.py", line 832, in update
self[key] = other[key]
omegaconf.errors.ConfigKeyError: Key 'config' is not in struct
full_key: config
reference_type=Optional[Dict[Union[str, Enum], Any]]
object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I came across issue #47, but the answers did not help.
I also tried converting the config to a struct using OmegaConf, but it still does not work.
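For reference, relaxing OmegaConf's struct flag (which is what raises the error above) looks roughly like this; cfg is an illustrative name for the hyper-parameter config restored from the checkpoint:

from omegaconf import OmegaConf

# cfg is the hyper-parameter DictConfig restored from the checkpoint (illustrative name).
# Struct mode rejects keys that were not present when the config was created,
# which is what raises ConfigKeyError for 'config' in the traceback above.
OmegaConf.set_struct(cfg, False)

# Note: even with struct disabled, OmegaConf only accepts config-like values,
# so injecting live objects such as a tokenizer or model can still fail,
# which may be why relaxing struct alone does not fix the load.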

@LittlePea13
Collaborator

I am sorry about that; I think at some point there was a version incompatibility between hydra/omegaconf and the checkpointing. Since you are already constructing the module with its parameters, as long as you do not need to override them with the ones stored in the checkpoint, you can comment out this line:

File "/home//virtualenv/luke/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)

and it should load without issues. I know it's an ugly hack, but it's what I suggest until I find a proper fix for reloading checkpoints.
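If you prefer not to touch the installed package, another option is to restore the weights manually after building the module yourself; a rough sketch, reusing the names from your traceback:

import torch

# Load the raw Lightning checkpoint; map_location keeps it on CPU at load time.
checkpoint = torch.load(conf.checkpoint_path, map_location="cpu")

# pl_module was already constructed with config, tokenizer and model,
# so only the weights need to come from the checkpoint.
pl_module.load_state_dict(checkpoint["state_dict"])
pl_module.eval()

This skips load_from_checkpoint entirely, so the hyper-parameter update that trips over the struct config never runs.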

@NikitaGautam
Author

Thanks for the quick fix. I tried several different things to reload the checkpoints but was not successful. If I find a solution, I will also post it here. For now, this quick fix works.

@Andreas-Moller-Belsager

I have a similar issue.
I have trained my model, but when I try to run the script shown on the front page (test.py model=rebel_model data=conll04_data train=conll04_train do_predict=True checkpoint_path="path_to_checkpoint"), it instead loads the path to the latest saved item, even if that run ended in an error message. This means I cannot fetch the checkpoint from the path I define in the command.

Specifically, it does the following:

  1. Fetches the path to the latest saved item in 'output'
  2. Concatenates the path to the checkpoint specified in the command.

This means it tries to open a path that does not exist.

Do you know what is wrong?

Example of this:

[screenshot omitted]

(I want to use the model saved at timestamp '2024-05-16/18-27-53', yet it wants to send me to something created at timestamp '2024-05-23/18-38-08')
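A likely explanation is Hydra's run-directory behaviour: by default Hydra changes the working directory to a fresh outputs/<date>/<time> folder, so a relative checkpoint_path ends up being resolved against that new run directory instead of the one passed on the command line. A minimal sketch of one way to guard against this in test.py (to_absolute_path is Hydra's own helper; the surrounding names are assumptions):

import os
from hydra.utils import to_absolute_path

# Resolve checkpoint_path against the original launch directory rather than
# the per-run output directory Hydra switches into.
checkpoint_path = to_absolute_path(conf.checkpoint_path)
assert os.path.isfile(checkpoint_path), f"Checkpoint not found: {checkpoint_path}"

Passing an absolute checkpoint_path on the command line should sidestep the concatenation as well.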
