Hi @vfdev-5 ,
I found that the Checkpoint handler can't save the latest model under a fixed filename:
https://github.com/pytorch/ignite/blob/v0.4.2/ignite/handlers/checkpoint.py#L149
But this is a very important requirement on our side. How can I achieve that?
For example, I want to always save the best model as model.pt after validations during training.
Thanks.
Hi @Nic-Ma
Thanks for asking! Yes, there is a limitation on the filename like that. Some time ago I was thinking about supporting this feature. Maybe it's time to add support for that inside Checkpoint.
Let me see what we can do.
Hi @vfdev-5 ,
We need this feature urgently in an NVIDIA project, so I developed a PR in MONAI that hacks your Checkpoint handler to manually delete the previously saved model:
https://github.com/Project-MONAI/MONAI/pull/1163
Could you please help review it?
If you can provide a simpler way than mine, that would be awesome.
BTW, I already hacked several places to support some features, maybe we can think about releasing a special branch of ignite or some better idea?
Or can we add some of my "reasonable" features to ignite 0.4.3 directly? When do you plan to release it?
Thanks.
@Nic-Ma actually, I forgot that there is a simple workaround to do that using a custom disk saver:
import torch
from ignite.engine import Engine, State
from ignite.handlers import Checkpoint
from ignite.handlers.checkpoint import BaseSaveHandler


class MyDiskSaver(BaseSaveHandler):
    def __call__(self, checkpoint, filename, metadata):
        print("-> save checkpoint:", filename, metadata)

    def remove(self, filename):
        print("remove:", filename)


model = torch.nn.Linear(1, 1)
save_handler = MyDiskSaver()
to_save = {"model": model}
checkpointer = Checkpoint(to_save, save_handler=save_handler, n_saved=1, score_function=lambda e: e.state.score)

trainer = Engine(lambda e, b: None)
trainer.state = State(epoch=1, iteration=1, score=0.77)
checkpointer(trainer)

trainer.state.epoch = 12
trainer.state.iteration = 1234
trainer.state.score = 0.78
checkpointer(trainer)
> -> save checkpoint: model_0.7700.pt {'basename': 'model', 'score_name': None, 'priority': 0.77}
> remove: model_0.7700.pt
> -> save checkpoint: model_0.7800.pt {'basename': 'model', 'score_name': None, 'priority': 0.78}
In MyDiskSaver.__call__ you can save the file under any name you wish (ignoring the filename argument) and keep the remove implementation empty.
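To make that concrete, here is a minimal sketch of such a saver. The class name FixedNameSaver, the default name model.pt, and the temp-file-then-rename step are assumptions for illustration, not part of ignite. It uses pickle only to stay dependency-free; for real use with Checkpoint it would presumably subclass BaseSaveHandler and call torch.save instead.

```python
import os
import pickle
import tempfile


class FixedNameSaver:
    """Hypothetical saver: always writes to one fixed file, whatever name it is given.

    Follows the __call__(checkpoint, filename, metadata) / remove(filename)
    interface of MyDiskSaver above. For use with ignite's Checkpoint it would
    subclass BaseSaveHandler and likely call torch.save instead of pickle.dump.
    """

    def __init__(self, dirname, fixed_name="model.pt"):
        os.makedirs(dirname, exist_ok=True)
        self.path = os.path.join(dirname, fixed_name)

    def __call__(self, checkpoint, filename, metadata=None):
        # Ignore the generated `filename` (e.g. "model_0.7800.pt").
        # Write to a temp file first, then rename it over the target, so a
        # crash mid-write does not leave a truncated model.pt behind.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(self.path))
        with os.fdopen(fd, "wb") as f:
            pickle.dump(checkpoint, f)
        os.replace(tmp_path, self.path)

    def remove(self, filename):
        # Intentionally empty: the fixed file is overwritten on each save,
        # so there is never an older checkpoint to delete.
        pass


# Usage: two successive "best model" saves end up in the same file.
saver = FixedNameSaver(tempfile.mkdtemp())
saver({"model": {"weight": 0.1}}, "model_0.7700.pt", {"priority": 0.77})
saver.remove("model_0.7700.pt")
saver({"model": {"weight": 0.2}}, "model_0.7800.pt", {"priority": 0.78})
```

After both calls the directory holds a single model.pt containing the second checkpoint.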
The only thing that may seem inconsistent is
checkpointer.last_checkpoint
> 'model_0.7800.pt'
instead of the real filename...
What do you think ?
Hi @vfdev-5 ,
Thanks for your suggestion, it's simpler than my previous idea. I updated my PR: https://github.com/Project-MONAI/MONAI/pull/1163
Could you please help review it again?
Thanks.
Hi @vfdev-5 ,
BTW, there is another problem with the Checkpoint handler: if we call engine.run() several times, each call starts from the beginning because _is_done() resets all the state, but the Checkpoint handler doesn't reset its self._saved, right?
Thanks.
@Nic-Ma yes, you are right about that. An instance of Checkpoint keeps its _saved list, which can lead to unsaved checkpoints on further runs... Could you please open a feature request for that? Thanks!
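To illustrate why stale history leads to unsaved checkpoints, here is a toy model of a keep-the-best-n checkpointer. ToyCheckpoint and its reset method are hypothetical and much simpler than ignite's Checkpoint. They only mimic the idea that a new checkpoint is written when there is room or when its score beats the worst one remembered in _saved.

```python
class ToyCheckpoint:
    """Simplified stand-in for a keep-the-best-n checkpointer (not ignite's code)."""

    def __init__(self, n_saved=1):
        self.n_saved = n_saved
        self._saved = []  # scores of kept checkpoints, ascending
        self.log = []     # (run, score) pairs that were actually written

    def __call__(self, run, score):
        # Save only if there is room or the new score beats the worst kept one.
        if len(self._saved) < self.n_saved or self._saved[0] < score:
            if len(self._saved) >= self.n_saved:
                self._saved.pop(0)  # drop the worst kept checkpoint
            self._saved.append(score)
            self._saved.sort()
            self.log.append((run, score))

    def reset(self):
        # Hypothetical reset hook, e.g. called when a new run starts.
        self._saved.clear()


ckpt = ToyCheckpoint(n_saved=1)
ckpt("run1", 0.90)  # saved
ckpt("run2", 0.80)  # skipped: 0.80 does not beat 0.90 remembered from run1
stale_log = list(ckpt.log)

ckpt.reset()
ckpt("run3", 0.80)  # saved: the history was cleared first
```

In this toy, the second run writes nothing because the first run's score is still in _saved. Clearing it before a new run restores the expected behavior.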
Hi @vfdev-5 , sure, submitted ticket: https://github.com/pytorch/ignite/issues/1422
Can we keep this issue open for tracking until fixed filenames are officially supported?
Thanks.
Hi @Nic-Ma , #1423 should enable support for fixed filenames. It will be available in the next stable release and is already available in the nightly build.
Cool, sounds like a good plan!
Thanks.