Pytorch-lightning: Incorrect "Saving latest checkpoint" warning

Created on 21 Sep 2020 · 5 comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

"Saving latest checkpoint..." warning appears regardless of whether a ModelCheckpoint exists or save_last is set to True

https://github.com/PyTorchLightning/pytorch-lightning/blob/a71d62d8409f4960a4b438b8d19c924d3636c73f/pytorch_lightning/trainer/training_loop.py#L167-L169

https://github.com/PyTorchLightning/pytorch-lightning/blob/a71d62d8409f4960a4b438b8d19c924d3636c73f/pytorch_lightning/trainer/training_loop.py#L196-L204

This might confuse a user into thinking the last checkpoint was saved when it was not.
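To illustrate, here is a minimal repro sketch, assuming the 1.0-era Trainer API (the MinimalModel class, the random data, and the Trainer flags below are hypothetical stand-ins, not from the issue): even with checkpointing fully disabled, the message is still logged when training finishes.

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MinimalModel(pl.LightningModule):
    """Smallest possible module: one linear layer and no validation_step."""
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

data = DataLoader(TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))), batch_size=8)

# checkpointing is disabled, so no ModelCheckpoint callback exists,
# yet "Saving latest checkpoint.." is still printed at the end of fit()
trainer = pl.Trainer(checkpoint_callback=False, logger=False, max_epochs=1)
trainer.fit(MinimalModel(), data)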

Proposed change:

def check_checkpoint_callback(self, should_check_val, force_save=False):
    model = self.trainer.get_model()

    # when no val loop is present or fast-dev-run still need to call checkpoints
    # TODO bake this logic into the checkpoint callback
    should_activate = not is_overridden('validation_step', model) and not should_check_val
    if should_activate or force_save:
        checkpoint_callbacks = [c for c in self.trainer.callbacks if isinstance(c, ModelCheckpoint)]
        # only announce the save when some callback will actually write a "last" checkpoint
        if any(c.save_last for c in checkpoint_callbacks):
            rank_zero_warn('Saving latest checkpoint..')
        for callback in checkpoint_callbacks:
            callback.on_validation_end(self.trainer, model)
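With this change, the message only appears when some ModelCheckpoint will actually write a last checkpoint. A sketch of the three cases, assuming the same era's API (the configurations below are illustrative, not from the issue):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# no ModelCheckpoint registered -> no "Saving latest checkpoint.." message
trainer = pl.Trainer(checkpoint_callback=False)

# ModelCheckpoint present but save_last left at its default -> still no message
trainer = pl.Trainer(callbacks=[ModelCheckpoint()])

# save_last=True -> the message is shown and a last checkpoint is actually written
trainer = pl.Trainer(callbacks=[ModelCheckpoint(save_last=True)])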
Labels: checkpoint · bug / fix · help wanted

All 5 comments

why not just remove the log line from training_loop and defer the logging to the checkpoint callback itself? that seems simpler to me

Because the logic to save the last checkpoint is inside on_validation_end, the message would appear after the first validation run.

> save_last is set to True

it was meant to save the checkpoint if someone interrupts the training.

> regardless of whether a ModelCheckpoint exists

yea, should not log if no ModelCheckpoint is used.

Thanks for the issue @carmocca. Mind sending a PR?

Done!
