After training, I load my best checkpoint and run trainer.test.
This fails with the following error in v0.7.6.
Have people encountered this before? My unit tests, which don't call finetune.py through the command line, do not hit this issue.
Thanks in advance! Happy to make a reproducible example if this is a new/unknown bug.
model = model.load_from_checkpoint(checkpoints[-1])
  File "/home/shleifer/miniconda3/envs/nb/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1563, in load_from_checkpoint
    checkpoint[CHECKPOINT_KEY_MODULE_ARGS].update(kwargs)
KeyError: 'module_arguments'
model is a pl.LightningModule; checkpoints[-1] was saved by it, with the save_weights_only=True kwarg specified.
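For reference, a minimal sketch of the shape of the setup (not the actual finetune.py code; MyModule, model, and the checkpoints list are placeholders):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Checkpoints are written with weights only, so no hyperparameters are stored in them.
checkpoint_callback = ModelCheckpoint(save_weights_only=True)
trainer = pl.Trainer(checkpoint_callback=checkpoint_callback, max_epochs=1)
trainer.fit(model)  # `model` is an instance of a LightningModule subclass, here called MyModule

# Reloading the weights-only checkpoint is what raises the KeyError:
model = MyModule.load_from_checkpoint(checkpoints[-1])
trainer.test(model)
```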
Try using the latest dev version, 0.8:
pip3 install --upgrade git+https://github.com/PyTorchLightning/pytorch-lightning.git
Tried that; I get a better traceback, but no solution:
KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`.'
If I just want to run eval on a pl.LightningModule, should I avoid making a Trainer?
My ckpt has 3 keys: ['state_dict', 'epoch', 'global_step']
When you only save the weights, you need to instantiate the model with its parameters first and then load the state_dict with the weights into it.
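A minimal sketch of that route (MyModule, its constructor arguments, and the checkpoint path are placeholders):

```python
import torch

# Rebuild the module with the same hyperparameters used at training time...
model = MyModule(learning_rate=3e-5, num_labels=2)

# ...then load only the weights; a weights-only ckpt still nests them under 'state_dict'.
checkpoint = torch.load("path/to/best.ckpt", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
model.eval()
```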
@sshleifer mind trying 0.8.0rc1?
This could be a backward-compatibility issue, as we moved the params there and back...
@sshleifer Mind drafting a PR about the past key names for saved params?
This should be fixed in #2160. @sshleifer, mind having a look?
Still having issues when loading a checkpoint. When I manually examine the checkpoint saved by Lightning, it only contains the following keys:
['epoch', 'global_step', 'pytorch-lightning_version', 'checkpoint_callback_best_model_score',
'checkpoint_callback_best_model_path', 'optimizer_states', 'lr_schedulers', 'state_dict']
so when I try using Module.load_from_checkpoint, it fails because the parameters are not present.
OmegaConf is used to instantiate the module like this: lm = Module(**config.lightning_module_conf)
pytorch_lightning version 0.9.0
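One possible workaround (a hedged sketch, not verified against 0.9.0; the checkpoint path is a placeholder): load_from_checkpoint forwards extra keyword arguments to the module's __init__, so the same OmegaConf values can be supplied explicitly when the checkpoint itself carries no hyperparameters.

```python
# Pass the constructor kwargs alongside the checkpoint path:
lm = Module.load_from_checkpoint(
    "path/to/checkpoint.ckpt",
    **config.lightning_module_conf,
)
```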
Still an issue at version 1.0.4: module_arguments are not present in the checkpoint.
@FluidSense did you call self.save_hyperparameters() in LightningModule's __init__?
@wolterlw No, I hadn't; you're right, that was the issue! Last night I found the pull requests updating the docs on checkpoint saving, and there I realized I was missing that line.
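For anyone landing here later, a minimal sketch of the fix (the module name and hyperparameter names are placeholders):

```python
import pytorch_lightning as pl

class Module(pl.LightningModule):
    def __init__(self, hidden_dim=128, learning_rate=1e-3):
        super().__init__()
        # Records the __init__ arguments in every checkpoint, so
        # Module.load_from_checkpoint() can re-instantiate the model later.
        self.save_hyperparameters()
```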