I understand that pl handles ctrl+c on the CLI with graceful degradation. I also see the ability to save checkpoints, which is triggered by metrics queries. Is there an idiomatic way to have pl save a checkpoint when I press ctrl+c?
Hi! Thanks for your contribution! Great first issue!
we should add a callback, on_ctrl_c or something
no need, it is already there: on_interupr
@nkyriazis mind sending a PR that saves a checkpoint in on_interupr?
Hi, I am a new contributor. I would like to work on this issue if no one else is working on it. I could not locate the on_interupr utility on the master branch. Am I missing something?
go for it!
why not just support ctrl+s? isn’t this a more user friendly approach haha?
or do you also mean save a checkpoint on interrupt?
Saving a checkpoint on ctrl+s is also not a bad idea.
Terminate program without saving checkpoint : ctrl+c
Terminate program and save latest checkpoint : ctrl+s
Does this sound good?
i was thinking more like ctrl+s saves a checkpoint at that time.
then separately if you want to finish training ctrl+c.
so, the only new change is to add a checkpoint save call on ctrl+s
Wouldn't this be an issue if the program is in the middle of a training epoch? Generally we would like to save checkpoints at the end of an epoch, right?
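One way to sidestep the mid-epoch problem, as a rough sketch in plain Python rather than anything Lightning provides: install a SIGINT handler that only sets a flag, and let the loop save once the current epoch has finished. `DeferredInterrupt`, `fit_with_deferred_save`, `train_one_epoch`, and `save_checkpoint` are hypothetical names used for illustration only.

```python
import signal


class DeferredInterrupt:
    """Context manager that records ctrl+c (SIGINT) instead of raising mid-epoch."""

    def __init__(self) -> None:
        self.requested = False
        self._previous = None

    def _handler(self, signum, frame) -> None:
        # Only set a flag; the training loop decides when to act on it.
        self.requested = True

    def __enter__(self) -> "DeferredInterrupt":
        # Note: signal.signal() may only be called from the main thread.
        self._previous = signal.signal(signal.SIGINT, self._handler)
        return self

    def __exit__(self, *exc_info) -> None:
        # Restore the earlier handler (default if it was not set from Python).
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGINT, previous)


def fit_with_deferred_save(train_one_epoch, save_checkpoint, max_epochs: int) -> int:
    """Run up to max_epochs; after ctrl+c, finish the epoch, save once, and stop."""
    completed = 0
    with DeferredInterrupt() as interrupt:
        for epoch in range(max_epochs):
            train_one_epoch(epoch)
            completed = epoch + 1
            if interrupt.requested:
                save_checkpoint(completed)  # called at an epoch boundary
                break
    return completed
```

The trade-off is that a long epoch delays the exit, so a second ctrl+c would need separate handling if an immediate abort should stay possible.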
@williamFalcon I'm not sure this is possible
Actually, I don't think so, because Python does not raise an exception on ctrl+s. So there is nothing we can hook into: afaik any input that is not read through Python's I/O and does not raise an exception is held back until the program has finished.
So we probably need to (and imo also should) go with ctrl+c.
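For reference, a minimal sketch of the ctrl+c route, assuming a plain Python training loop and not Lightning's internals: ctrl+c surfaces in the main thread as `KeyboardInterrupt`, which gives us something to catch. `fit_with_interrupt_save`, `save_checkpoint`, `train_one_epoch`, `DEFAULT_CKPT`, and the `state` dict are hypothetical stand-ins for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical default location for the emergency checkpoint.
DEFAULT_CKPT = Path(tempfile.gettempdir()) / "interrupted.ckpt"


def save_checkpoint(state: dict, path: Path) -> None:
    """Hypothetical helper: persist the training state with pickle."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def fit_with_interrupt_save(train_one_epoch, max_epochs: int,
                            ckpt_path: Path = DEFAULT_CKPT) -> dict:
    """Run epochs; on ctrl+c (KeyboardInterrupt) save a checkpoint and stop."""
    state = {"epoch": 0, "interrupted": False}
    try:
        for epoch in range(max_epochs):
            train_one_epoch(epoch)
            state["epoch"] = epoch + 1  # number of fully completed epochs
    except KeyboardInterrupt:
        # Raised by Python's default SIGINT handler when ctrl+c is pressed.
        state["interrupted"] = True
        save_checkpoint(state, ckpt_path)
    return state
```

Here the interrupt can land in the middle of an epoch, so the saved state only records the epochs that completed before it.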
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!