Currently PyTorch Lightning uses the latest version of the model for testing. In research, we usually want to load the best checkpoint first and run testing from there. It would also be good to have the option of restarting from the best checkpoint after a learning-rate plateau.
We want the best model for training/testing. For NLP in particular, it is more natural to go back to the best checkpoint and restart with a decayed learning rate.
The current workaround is manual loading and checking of the validation value, which goes against Lightning principles.
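For context, the manual workaround looks roughly like the sketch below; the directory layout, the filename pattern with the metric embedded in it, and the helper name are assumptions for illustration, not the issue author's actual code.

import glob
import re

# Hypothetical manual workaround: scan the checkpoint directory and pick the
# file with the lowest validation loss encoded in its name, e.g.
# "epoch=3-val_loss=0.1234.ckpt" (filename pattern is an assumption here).
def find_best_checkpoint(ckpt_dir="checkpoints"):
    best_path, best_val = None, float("inf")
    for path in glob.glob(f"{ckpt_dir}/*.ckpt"):
        match = re.search(r"val_loss=(\d+(?:\.\d+)?)", path)
        if match and float(match.group(1)) < best_val:
            best_val, best_path = float(match.group(1)), path
    return best_path

# The returned path would then be passed to Model.load_from_checkpoint(...)
# before calling trainer.test(), as in the snippet discussed below.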
Hi! Thanks for your contribution! Great first issue!
This makes sense. How do you suggest doing it? Ideally you'd do this in Lightning:
model = Model.load_from_checkpoint(PATH)
trainer.test(model)
Why doesn't this fit your use case?
There are 2 issues here:
At first I thought it would be as easy as loading the checkpoint internally (during the training loop). But I realized that in a multi-GPU setting, the parameters need to be copied to the PyTorch module on every GPU.
In summary: loading the best checkpoint during training is simple on a single GPU, but with multiple GPUs the weights have to be restored on every replica (a rough sketch follows).
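A rough sketch of the "reload internally during the training loop" idea: the callback name, its parameters, and the plateau logic are assumptions rather than an existing Lightning feature. Each DDP process would run the hook and load the weights onto its own replica, which is the multi-GPU concern raised above.

import torch
import pytorch_lightning as pl

class ReloadBestOnPlateau(pl.Callback):
    """Hypothetical callback: if the monitored metric has not improved for
    `patience` validation runs, reload the weights of the best checkpoint
    saved so far by the ModelCheckpoint callback."""

    def __init__(self, checkpoint_callback, monitor="val_loss", patience=3):
        self.checkpoint_callback = checkpoint_callback
        self.monitor = monitor
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def on_validation_end(self, trainer, pl_module):
        current = trainer.callback_metrics.get(self.monitor)
        if current is None:
            return
        if current < self.best:
            self.best = float(current)
            self.wait = 0
            return
        self.wait += 1
        if self.wait >= self.patience and self.checkpoint_callback.best_model_path:
            # Reload the best weights in place; each process loads onto its
            # own copy of the module, so this also applies under DDP.
            ckpt = torch.load(self.checkpoint_callback.best_model_path,
                              map_location=pl_module.device)
            pl_module.load_state_dict(ckpt["state_dict"])
            self.wait = 0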
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This will need the specific checkpoint path, right?
Is there a way to give it the dir path and have it load the best ckpt based on the ModelCheckpoint object's logic?
Yes, the path is stored in the checkpoint callback.
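A short sketch of that flow, assuming a ModelCheckpoint callback is attached to the Trainer; the monitor key, max_epochs, and the model names are placeholders, and exact argument names may differ between Lightning versions.

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# The ModelCheckpoint callback remembers the best checkpoint it has written,
# so the test step can reuse it without a hard-coded path.
checkpoint_callback = ModelCheckpoint(monitor="val_loss", mode="min")
trainer = pl.Trainer(callbacks=[checkpoint_callback], max_epochs=10)
trainer.fit(model)  # `model` / `Model` as in the snippet above

best_model = Model.load_from_checkpoint(checkpoint_callback.best_model_path)
trainer.test(best_model)

If I remember correctly, newer Lightning releases also accept trainer.test(ckpt_path="best"), which resolves the path through the same callback.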
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
already solved...