After training, I load my best checkpoint and run trainer.test.
This fails with the following error in v0.7.6.
Have people encountered this before? My unit tests, which don't call finetune.py through the command line, do not hit this issue.
Thanks in advance! Happy to make a reproducible example if this is a new/unknown bug.
model = model.load_from_checkpoint(checkpoints[-1])
  File "/home/shleifer/miniconda3/envs/nb/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1563, in load_from_checkpoint
    checkpoint[CHECKPOINT_KEY_MODULE_ARGS].update(kwargs)
KeyError: 'module_arguments'
model is a pl.LightningModule; checkpoints[-1] was saved by it, with the save_weights_only=True kwarg specified.
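For reference, a minimal sketch of the shape of the setup (not the actual finetune.py code; MyModule, model, and the checkpoints list are placeholders):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Checkpoints are written with weights only, so no hyperparameters are stored in them.
checkpoint_callback = ModelCheckpoint(save_weights_only=True)
trainer = pl.Trainer(checkpoint_callback=checkpoint_callback, max_epochs=1)
trainer.fit(model)  # `model` is an instance of a LightningModule subclass, here called MyModule

# Reloading the weights-only checkpoint is what raises the KeyError:
model = MyModule.load_from_checkpoint(checkpoints[-1])
trainer.test(model)
```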
Try using the latest dev version, 0.8:
pip3 install --upgrade git+https://github.com/PyTorchLightning/pytorch-lightning.git
Tried that; I get a better traceback, but no solution:
KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`.'
If I just want to run eval on a pl.LightningModule, should I avoid making a Trainer?
My ckpt has 3 keys: ['state_dict', 'epoch', 'global_step']
When you only save the weights, you need to instantiate the model with its parameters first and then load the state_dict with the weights into it.
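A minimal sketch of that route (MyModule, its constructor arguments, and the checkpoint path are placeholders):

```python
import torch

# Rebuild the module with the same hyperparameters used at training time...
model = MyModule(learning_rate=3e-5, num_labels=2)

# ...then load only the weights; a weights-only ckpt still nests them under 'state_dict'.
checkpoint = torch.load("path/to/best.ckpt", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
model.eval()
```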
@sshleifer mind trying 0.8.0rc1?
This could be a backward-compatibility issue, as we moved the params there and back...
@sshleifer Mind drafting a PR about the past key names for saved params?
This should be fixed in #2160. @sshleifer, mind having a look?
Still having issues when loading a checkpoint. When I manually examine the checkpoint saved by Lightning, it only contains the following keys:
['epoch', 'global_step', 'pytorch-lightning_version', 'checkpoint_callback_best_model_score',
'checkpoint_callback_best_model_path', 'optimizer_states', 'lr_schedulers', 'state_dict']
so when I try using Module.load_from_checkpoint, it fails because the parameters are not present.
OmegaConf is used to instantiate the module like this: lm = Module(**config.lightning_module_conf)
pytorch_lightning version 0.9.0
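One possible workaround (a hedged sketch, not verified against 0.9.0; the checkpoint path is a placeholder): load_from_checkpoint forwards extra keyword arguments to the module's __init__, so the same OmegaConf values can be supplied explicitly when the checkpoint itself carries no hyperparameters.

```python
# Pass the constructor kwargs alongside the checkpoint path:
lm = Module.load_from_checkpoint(
    "path/to/checkpoint.ckpt",
    **config.lightning_module_conf,
)
```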
Still an issue at version 1.0.4: module_arguments are not present in the checkpoint.
@FluidSense did you call self.save_hyperparameters() in LightningModule's __init__?
@wolterlw No, I hadn't; you're right, that was the issue! Last night I found the pull requests updating the docs on checkpoint saving, and there I realized I was missing that line.
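For anyone landing here later, a minimal sketch of the fix (the module name and hyperparameter names are placeholders):

```python
import pytorch_lightning as pl

class Module(pl.LightningModule):
    def __init__(self, hidden_dim=128, learning_rate=1e-3):
        super().__init__()
        # Records the __init__ arguments in every checkpoint, so
        # Module.load_from_checkpoint() can re-instantiate the model later.
        self.save_hyperparameters()
```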