Ignite: How to save model with same filename?

Created on 30 Oct 2020  ·  9 Comments  ·  Source: pytorch/ignite

🚀 Feature

Hi @vfdev-5 ,

I found that CheckpointHandler can't save the latest model under a fixed filename:
https://github.com/pytorch/ignite/blob/v0.4.2/ignite/handlers/checkpoint.py#L149
But this is a very important requirement on our side. How can I achieve that?
For example, I want to always save the best model as model.pt after each validation during training.

Thanks.

enhancement help wanted question

All 9 comments

Hi @Nic-Ma

Thanks for asking! Yes, there is a limitation on the filename like that. Some time ago I was thinking about supporting this feature. Maybe it's time to add support for that inside Checkpoint.
Let me see what we can do.

Hi @vfdev-5 ,

We need this feature urgently in an NVIDIA project, so I opened a PR in MONAI that hacks your Checkpoint handler to manually delete the previously saved model:
https://github.com/Project-MONAI/MONAI/pull/1163
Could you please help review it?
If you can provide a simpler way than mine, that would be awesome.
BTW, I have already hacked several places to support some features. Maybe we could think about releasing a special branch of ignite, or is there a better idea?
Or could we add some of my "reasonable" features to ignite 0.4.3 directly? When do you plan to release it?

Thanks.

@Nic-Ma actually, I forgot that there is a simple workaround for this using a custom disk saver:

import torch
from ignite.engine import Engine, State
from ignite.handlers import Checkpoint
from ignite.handlers.checkpoint import BaseSaveHandler


class MyDiskSaver(BaseSaveHandler):

    def __call__(self, checkpoint, filename, metadata):
        print("-> save checkpoint:", filename, metadata)

    def remove(self, filename):
        print("remove:", filename)


model = torch.nn.Linear(1, 1)

save_handler = MyDiskSaver()

to_save = {"model": model}
checkpointer = Checkpoint(to_save, save_handler=save_handler, n_saved=1, score_function=lambda e: e.state.score)

trainer = Engine(lambda e, b: None)
trainer.state = State(epoch=1, iteration=1, score=0.77)

checkpointer(trainer)

trainer.state.epoch = 12
trainer.state.iteration = 1234
trainer.state.score = 0.78

checkpointer(trainer)
> -> save checkpoint: model_0.7700.pt {'basename': 'model', 'score_name': None, 'priority': 0.77}
> remove: model_0.7700.pt
> -> save checkpoint: model_0.7800.pt {'basename': 'model', 'score_name': None, 'priority': 0.78}

In MyDiskSaver you can save the file however you wish, under any name you wish, and keep the remove implementation empty.
The only thing that may seem inconsistent is

checkpointer.last_checkpoint
> 'model_0.7800.pt'

instead of the real filename...
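
To make the MyDiskSaver idea above concrete, here is a minimal sketch of a saver that ignores the score-based filename and always writes to one fixed file. The class name FixedNameSaver, its arguments, and the pickle-based default writer are hypothetical choices for illustration, not part of ignite; in a real project you would probably pass torch.save as the serialize argument. The import fallback is only there so the sketch also runs without ignite installed.

```python
import os
import pickle
import tempfile

try:  # use the real base class when ignite is installed
    from ignite.handlers.checkpoint import BaseSaveHandler
except ImportError:  # fallback so this sketch also runs standalone
    BaseSaveHandler = object


def _pickle_to_path(obj, path):
    """Default writer for the sketch; torch.save has the same (obj, path) shape."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


class FixedNameSaver(BaseSaveHandler):
    """Hypothetical saver: every checkpoint overwrites the same file."""

    def __init__(self, dirname, fixed_name="model.pt", serialize=_pickle_to_path):
        self.dirname = dirname
        self.fixed_name = fixed_name
        self.serialize = serialize  # assumption: callable taking (obj, path)
        os.makedirs(dirname, exist_ok=True)

    @property
    def path(self):
        return os.path.join(self.dirname, self.fixed_name)

    def __call__(self, checkpoint, filename, metadata=None):
        # `filename` (e.g. model_0.7800.pt) is deliberately ignored.
        # Write to a temporary file first, then rename it over the target,
        # so an interrupted save does not leave a truncated model.pt behind.
        fd, tmp_path = tempfile.mkstemp(dir=self.dirname, suffix=".tmp")
        os.close(fd)
        try:
            self.serialize(checkpoint, tmp_path)
            os.replace(tmp_path, self.path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise

    def remove(self, filename):
        # Intentionally empty: the fixed file is overwritten, never deleted.
        pass
```

It would be passed as save_handler=FixedNameSaver("some_dir") in place of MyDiskSaver() in the snippet above. Note that checkpointer.last_checkpoint would still report the score-based name rather than model.pt.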

What do you think ?

Hi @vfdev-5 ,

Thanks for your suggestion, it's simpler than my previous idea. I updated my PR: https://github.com/Project-MONAI/MONAI/pull/1163
Could you please help review it again?

Thanks.

Hi @vfdev-5 ,

BTW, there is another problem with the Checkpoint handler: if we call engine.run() several times, each run starts from the beginning because _is_done() resets all the state, but the Checkpoint handler doesn't reset its self._saved, right?

Thanks.

@Nic-Ma yes, you are right about that. An instance of Checkpoint keeps its _saved list, which can lead to checkpoints not being saved on further runs... Could you please open a feature request for that? Thanks!
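
To illustrate why a stale _saved list can cause skipped saves, here is a toy stand-in. It is not ignite's code: ToyCheckpoint, its writes list, and its reset method are hypothetical and only mimic the "keep the n_saved best scores" behaviour discussed here.

```python
class ToyCheckpoint:
    """Toy stand-in (not ignite code): keeps only the n_saved best scores."""

    def __init__(self, n_saved=1):
        self.n_saved = n_saved
        self._saved = []  # scores of the checkpoints currently kept
        self.writes = []  # every score that actually triggered a save

    def __call__(self, score):
        has_room = len(self._saved) < self.n_saved
        if has_room or score > min(self._saved):
            if not has_room:
                # Drop the worst kept checkpoint to make room.
                self._saved.remove(min(self._saved))
            self._saved.append(score)
            self.writes.append(score)

    def reset(self):
        # Hypothetical helper: forget what earlier runs saved.
        self._saved = []


ckpt = ToyCheckpoint(n_saved=1)

for score in (0.70, 0.80):  # first run
    ckpt(score)

for score in (0.60, 0.75):  # second run, trained again from scratch
    ckpt(score)
print(ckpt.writes)  # [0.7, 0.8]: nothing from the second run was saved

ckpt.reset()
for score in (0.60, 0.75):  # same second run after clearing _saved
    ckpt(score)
print(ckpt.writes)  # [0.7, 0.8, 0.6, 0.75]
```

Without the reset, the second run is compared against the best score of the first run, so its checkpoints never qualify.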

Hi @vfdev-5 , sure, I submitted a ticket: https://github.com/pytorch/ignite/issues/1422
Can we keep this issue open for tracking until fixed filenames are officially supported?

Thanks.

Hi @Nic-Ma , #1423 should enable support for fixed filenames. It will be included in the next stable release and is already available in the nightly build.

Cool, sounds like a good plan!
Thanks.
