
Issue while loading the trained checkpoint #55

Open
NikitaGautam opened this issue Jan 31, 2023 · 3 comments


@NikitaGautam

Hi,

I am trying to test the trained model by loading the checkpoint, but it shows the following error:
Traceback (most recent call last):
File "test.py", line 119, in main
train(conf)
File "test.py", line 101, in train
pl_module = pl_module.load_from_checkpoint(checkpoint_path=conf.checkpoint_path,config=config, tokenizer = tokenizer, model = model)
File "/home//virtualenv/luke/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
File "/usr/lib/python3.8/_collections_abc.py", line 832, in update
self[key] = other[key]
omegaconf.errors.ConfigKeyError: Key 'config' is not in struct
full_key: config
reference_type=Optional[Dict[Union[str, Enum], Any]]
object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I came across issue #47, but the answers did not help.
I also tried converting the config to a struct using OmegaConf, but it still does not work.
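For reference, relaxing OmegaConf's struct flag (which is what raises the error above) looks roughly like this; cfg is an illustrative name for the hyper-parameter config restored from the checkpoint:

from omegaconf import OmegaConf

# cfg is the hyper-parameter DictConfig restored from the checkpoint (illustrative name).
# Struct mode rejects keys that were not present when the config was created,
# which is what raises ConfigKeyError for 'config' in the traceback above.
OmegaConf.set_struct(cfg, False)

# Note: even with struct disabled, OmegaConf only accepts config-like values,
# so injecting live objects such as a tokenizer or model can still fail,
# which may be why relaxing struct alone does not fix the load.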

@LittlePea13
Collaborator

I am sorry about that; I think at some point there was a version incompatibility between hydra/omegaconf and the checkpointing. Since you are already constructing the module with its parameters, as long as you do not need to override them with the ones stored in the checkpoint, you can comment out this line:

File "/home//virtualenv/luke/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)

and it should load without issues. I know it's an ugly hack, but it's what I suggest until I find a proper fix for reloading checkpoints.
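If you prefer not to touch the installed package, another option is to restore the weights manually after building the module yourself; a rough sketch, reusing the names from your traceback:

import torch

# Load the raw Lightning checkpoint; map_location keeps it on CPU at load time.
checkpoint = torch.load(conf.checkpoint_path, map_location="cpu")

# pl_module was already constructed with config, tokenizer and model,
# so only the weights need to come from the checkpoint.
pl_module.load_state_dict(checkpoint["state_dict"])
pl_module.eval()

This skips load_from_checkpoint entirely, so the hyper-parameter update that trips over the struct config never runs.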

@NikitaGautam
Author

Thanks for the quick fix. I tried several different things to reload the checkpoints but was not successful. If I find a solution, I will also post it here. For now, this quick fix works.

@Andreas-Moller-Belsager

I have a similar issue.
I have trained my model, but when I try to run the script shown on the front page (test.py model=rebel_model data=conll04_data train=conll04_train do_predict=True checkpoint_path="path_to_checkpoint"), it instead loads the path to the latest saved item, even if that run ended in an error message. This means I cannot fetch the checkpoint from the path I define in the command.

Specifically, it does the following:

  1. Fetches the path to the latest saved item in 'output'
  2. Concatenates the path to the checkpoint specified in the command.

This means it tries to open a path that does not exist.

Do you know what is wrong?

Example of this:

[screenshot omitted]

(I want to use the model saved at timestamp '2024-05-16/18-27-53', yet it wants to send me to something created at timestamp '2024-05-23/18-38-08')
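A likely explanation is Hydra's run-directory behaviour: by default Hydra changes the working directory to a fresh outputs/<date>/<time> folder, so a relative checkpoint_path ends up being resolved against that new run directory instead of the one passed on the command line. A minimal sketch of one way to guard against this in test.py (to_absolute_path is Hydra's own helper; the surrounding names are assumptions):

import os
from hydra.utils import to_absolute_path

# Resolve checkpoint_path against the original launch directory rather than
# the per-run output directory Hydra switches into.
checkpoint_path = to_absolute_path(conf.checkpoint_path)
assert os.path.isfile(checkpoint_path), f"Checkpoint not found: {checkpoint_path}"

Passing an absolute checkpoint_path on the command line should sidestep the concatenation as well.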
