Pytorch-lightning: On the relationship between Result and Callback monitor

Created on 31 Aug 2020 · 22 comments · Source: PyTorchLightning/pytorch-lightning

💬 Discussion

Since I started using Lightning, I have noticed many users, as well as myself, having trouble with the way metrics are saved for callbacks by the Result class.

The current design of having only {early_stop_on,checkpoint_on} forces different callbacks to use the same monitor. I believe that wanting different monitors for EarlyStopping, ModelCheckpoint, and LearningRateLogger is a very common use case. This is something we will also need to address if we ever want to support multiple ModelCheckpoint callbacks.

This is also unintuitive for new users since most callbacks include a monitor parameter which becomes completely useless when the Result object is used. On the other hand, using a Result object is the new recommended way to structure {training,validation,test}_step().

Additionally, the approach is not future-proof, since we would need to update Lightning to include, say, batchnorm_on or whatever new technique becomes widespread.

I believe this is an important issue that needs a solution.

Requirements:

We could use this issue to have a productive discussion about how the final design should look and what requirements are necessary to fit most use cases.

To start, I would say:

  1. Have it be easy to understand: avoid the conflict between monitor in Callbacks and {early_stop_on,checkpoint_on} in Result
  2. Allow any number of different monitors to be used simultaneously
  3. Allow a callback monitor to be changed on the fly

Own proposal

  1. Remove {early_stop_on,checkpoint_on}
  2. Add a callback argument to the result.log() function (or add result.callback(); both would have the same functionality) for users to save metrics on the fly
train_result.log(
    "train_loss", train_loss,
    on_step=False, on_epoch=True,
    prog_bar=True, callback=True
)
...
...
# Current behaviour would override `train_result`'s checkpoint_on
# if it is also set in `eval_result`, which causes some limitations.
# `callback=True` here should not override `train_loss`
# in `train_result` but keep both.
eval_result.log(
    "valid_loss", valid_loss,
    on_step=False, on_epoch=True,
    prog_bar=True, callback=True
)
# note: prog_bar should also be removed from `.log()`
# and the ProgressBar class should instead have a `--monitor` parameter.
  3. Have each callback set its monitor via --monitor
  4. Set the --monitor type to Optional[Union[str, Callable[..., str]]] so the following is possible:
# basic behaviour
pl.callbacks.ModelCheckpoint(
    monitor="valid_accuracy",
    mode="max"
)

# Can change monitor on-the-run
# Note: maybe trainer should be injected into the callback monitor function
pl.callbacks.EarlyStopping(
    monitor=lambda: "train_loss" if trainer.current_epoch <= 20 else "val_loss",
    mode="min"
)

# Could even have
complex_monitor = VeryComplexClassWhichDoesSomethingVeryFancyWithState()
assert callable(complex_monitor)
pl.callbacks.LearningRateLogger(monitor=complex_monitor)
# this would cover a big chunk of usecases

I am positive you guys have other great ideas.


cc: @williamFalcon @rohitgr7

#2976 (previous discussion)

#3243 (duplicate)

#2908 (related, would require these improvements)

#3254 (related, could use any callback_metric)

#3291

https://forums.pytorchlightning.ai/t/does-evalresults-also-work-with-early-stopping

I am probably missing other related issues. Feel free to tag them.

Labels: API / design · Priority P0 · bug / fix · discussion · help wanted


All 22 comments

I wonder how the monitor can be changed in ModelCheckpoint and EarlyStopping callback since both of them have states like best_k_scores or best_score or more.
Yeah, checkpoint_on and early_stop_on are a bit tricky I guess; I am still trying to understand how they work with ModelCheckpoint and EarlyStopping. Not sure if they are completely integrated with these callbacks yet.

I wonder how the monitor can be changed in ModelCheckpoint and EarlyStopping callback since both of them have states like best_k_scores or best_score or more.

The way it currently works, these callbacks cannot know when the metric behind checkpoint_on or early_stop_on has changed.

After my proposed changes, we could choose between:

  • keep overwriting the callback state, regardless of monitor (current behaviour).
  • keep a dict of the state for each --monitor used.
  • reset the state whenever a new --monitor key is detected.

keep a dict of the state for each --monitor used.
reset the state whenever a new --monitor key is detected.

Sounds like we'd need a new ModelCheckpoint or EarlyStopping every time the monitor is changed, since we have to set mode accordingly. Maybe a dict of ModelCheckpoints where the keys are the monitors, and the checkpoint is triggered according to the key marked as checkpoint or callback (as @carmocca suggested). Although callback seems to be the better choice here IMO, since it avoids different params for different callbacks in the Result obj.
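To make the "dict of state per monitor" idea concrete, here is a minimal sketch (a hypothetical helper, not existing Lightning code) of keeping separate best-score state keyed by monitor name, so switching monitors does not clobber previously tracked state:

class MultiMonitorState:
    """Hypothetical helper: tracks best-score state per monitor key."""

    def __init__(self):
        self.best = {}   # monitor name -> best value seen so far
        self.modes = {}  # monitor name -> "min" or "max"

    def update(self, monitor: str, value: float, mode: str = "min") -> bool:
        """Record `value` for `monitor`; return True if it is a new best."""
        if monitor not in self.best:
            self.best[monitor] = float("inf") if mode == "min" else float("-inf")
            self.modes[monitor] = mode
        improved = value < self.best[monitor] if mode == "min" else value > self.best[monitor]
        if improved:
            self.best[monitor] = value
        return improved

# e.g. inside a checkpoint callback:
# state = MultiMonitorState()
# state.update("val_loss", 0.42, mode="min")  # True, new best for val_loss
# state.update("val_acc", 0.91, mode="max")   # tracked independently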

May I also suggest keeping the checkpoint strategies separate from the model? I think what @carmocca is suggesting goes in that direction. I think we should compute metrics within the model, but we should let the Trainer() take care of the checkpoint callbacks. By doing so, I could share my model and let _you_ decide how to tweak the training procedure.

@williamFalcon thoughts?

In my opinion, wouldn't it be better to use Result for everything else (logging, reduce_fx, ...) except checkpoint_on and early_stop_on?

We would then use the callbacks as before with the Trainer, monitoring the keys logged from Result.
To use different monitors, could we allow passing a list of the respective callbacks (multiple checkpoint callbacks, early-stopping callbacks, ...) to the Trainer?
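A sketch of what that might look like from the user's side (hypothetical at the time of writing, since multiple ModelCheckpoint callbacks are not yet supported):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Hypothetical: each callback watches its own logged key, and several
# checkpointing callbacks coexist in the same Trainer.
trainer = pl.Trainer(
    callbacks=[
        EarlyStopping(monitor="val_loss", mode="min"),
        ModelCheckpoint(monitor="val_loss", mode="min"),
        ModelCheckpoint(monitor="val_acc", mode="max"),
    ]
)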

ok guys, tracking this. sorry for the delay. focused on refactors today but will propose something tomorrow am (NY time). A lot of good ideas here that i need to think about overnight :)

Adding a use case here which hopefully fits into the discussion. It was clear how to achieve it in v0.8.4, now it's not anymore:

I want to use the validation accuracy on the whole validation set for checkpointing. Since I have a different number of samples in each batch, I can't just compute the accuracy in each step and average on epoch end. So I compute the number of correct and total samples in each step, log both to eval_result under the key _val_acc_, and use a custom reduce function.

As mentioned in one of the comments above, the monitor key of the checkpoint callback is now ignored. At the same time, I don't understand how to use checkpoint_on as an alternative to achieve checkpointing on the aggregated validation accuracy.
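For reference, a rough sketch of the pattern described above, assuming the 0.9-era EvalResult.log(..., reduce_fx=...) signature and that the reduce function receives the per-step values stacked along the first dimension; the _weighted_acc helper is made up for illustration:

import torch
import pytorch_lightning as pl

def _weighted_acc(stacked: torch.Tensor) -> torch.Tensor:
    # stacked has shape (num_steps, 2): column 0 = correct, column 1 = total.
    # Summing the counts before dividing gives the accuracy over the whole
    # validation set, unaffected by uneven batch sizes.
    return stacked[:, 0].sum() / stacked[:, 1].sum()

def validation_step(self, batch, batch_idx):
    x, y = batch
    preds = self(x).argmax(dim=-1)
    correct = (preds == y).float().sum()
    total = torch.tensor(float(y.numel()), device=correct.device)
    result = pl.EvalResult()
    result.log(
        "val_acc",
        torch.stack([correct, total]),
        on_step=False,
        on_epoch=True,
        reduce_fx=_weighted_acc,
    )
    return result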

Hi all
I've just started trying Lightning. I've been a long-time Chainer user, where btw these extensions are pretty modular, in the sense that everything is in one log and you choose which value to monitor / what trigger to use / which extension to call on trigger.
I guess my issue #3338 is a bit of a testimony to things being confusing, in that "monitor" does not work by itself.
I would guess the motivation behind checkpoint_on was to make it automated without an explicit callback :-\
Anyways, the bottom line is that I'd like to support the idea that having callbacks monitor the Result's logged metrics is way more intuitive :)

thanks for the patience! almost done with the first pass at refactors, so i will likely tackle this over the weekend.

Here's what i'm thinking:

  1. remove early_stop_on, checkpoint_on
  2. enable anything you log to be available as a metric for callbacks.
  3. this means you will need to use monitor in a callback to watch what you want...

result = EvalResult()
result.log('giraffe', acc)

# then you need to monitor giraffe
es = EarlyStopping(monitor='giraffe')

When you log at both step and epoch, your metric names get a 'step_' or 'epoch_' prefix (this is true today).


result.log('giraffe', x, on_step=True, on_epoch=True)

# results in 'step_giraffe', 'epoch_giraffe'

Since without that, the value is ambiguous. Now, if you want to monitor this metric, you can pick which one you want:

es = EarlyStopping(monitor='step_giraffe')
es = EarlyStopping(monitor='epoch_giraffe')

How does this sound to everyone?
Does this not cover a use case that's been mentioned?

this won't allow changing the value you want to monitor, right? I thought early_stop_on and checkpoint_on were added to add this functionality.

? yeah, you can still change it... you just have to specify it now in the callback.
The problem is that apparently this is causing a ton of confusion.

I personally think it's cleaner to set early_stop_on and checkpoint_on... this proposed new approach seems a bit more annoying to me but it's something i think everyone is asking for?

We could still have early_stop_on and checkpoint_on, but I think adding another flag callback=True/False to result.log() is a good suggestion here, to allow other metrics to be added to callback_metrics.

Also, I think EarlyStopping and ModelCheckpoint are not completely compatible with the new Result object, right? For example, even if I change early_stop_on or checkpoint_on at some point during training, say from loss to accuracy, these callbacks defined in the Trainer have state variables like mode and patience which I don't think will work in this case. Also, even if the user wants to use checkpoint_on and have the checkpoint filename updated accordingly, one has to log the metric too with logger=True.

If you go with this new flag callback then I suggest changing result.log to something else, maybe result.add?

A healthy discussion is required here to cover all the possible use cases.

I personally think it's cleaner to set early_stop_on and checkpoint_on... this proposed new approach seems a bit more annoying to me but it's something i think everyone is asking for?

Mind explaining why you find the proposed approach more annoying? It uses the already existing parameter monitor and would avoid having multiple *_on parameters in the Result (instead adding the callback parameter to it).
Right now there are only two (checkpoint_on and early_stop_on), but with time there might be more, which could clutter Lightning's code.

maybe result.add?

what about result.register?

because not everyone has a "val_loss" keyword...
and then it somehow makes val_loss "magical"... but in reality, i want to early stop or checkpoint on some other thing like accuracy and it's annoying to have to init the callback and set the monitor there.

because if i change my mind tomorrow, now i have to remember to change it in multiple places... and it also means that the trainer is no longer model agnostic... instead i have to add an if statement for every model i want to train to pick a different monitor value.

it is very inconvenient lol.

Example:

Without keyword:

# model A
result.log('X', x)

# model B
result.log('Y', y)

if model A:
  es = EarlyStopping(monitor='X')
else:
  es = EarlyStopping(monitor='Y')

With keyword:

# model A
EvalResult(early_stop_on=X)

# model B
EvalResult(early_stop_on=Y)

Example 2...

What if i want to change the key for early stopping during training?

if epoch % 2 == 0:
  EvalResult(checkpoint_on=X)
else:
  EvalResult(checkpoint_on=Y)

But without the keyword?? there's no way to do this...

What if you don't know the checkpoint/early_stop metric in the LightningModule? You are forced to do this:

# user_script.py
module = MyLightningModule(monitor=args.monitor)

# my_lightning_module.py
def training_step(...):
    ...
    if self.monitor == "train_loss"
        result = pl.TrainResult(minimize=train_loss, early_stop_on=train_loss)
    elif self.monitor == "train_acc":
        result = pl.TrainResult(minimize=train_loss, early_stop_on=train_acc)
    elif ...:
        ...
    else:
        result = pl.TrainResult(minimize=train_loss)
    result.log("train_loss", train_loss)
    result.log("train_acc", train_acc)
    ...

def validation_step(...):
    ....
    if self.monitor == "val_loss":
        result = pl.EvalResult(early_stop_on=val_loss)
    elif self.monitor == "val_acc":
        result = pl.EvalResult(early_stop_on=val_acc)
    elif ...:
        ...
    else:
        result = pl.EvalResult()
    result.log("val_loss", val_loss)
    result.log("val_acc", val_acc)
    ...

The number of if statements would get much worse if I wanted to separate self.monitor into self.early_stop_monitor and self.checkpoint_monitor.

Alternatively:

# user_script.py
module = LightningModule()
es = EarlyStopping(monitor=args.monitor)

# my_lightning_module.py
def training_step(...):
    ...
    result = pl.TrainResult(minimize=train_loss)
    result.log("train_loss", train_loss, callback=True)
    result.log("train_acc", train_acc, callback=True)
    ...

def validation_step(...):
    ...
    result = pl.EvalResult()
    result.log("val_loss", val_loss, callback=True)
    result.log("val_acc", val_acc, callback=True)
    ...

it's annoying to have to init the callback and set the monitor there

But I believe this is the most common case. You often want to change the monitor without having to modify the module code. It is also how other libraries (e.g. Keras) do it, afaik.
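For comparison, the familiar Keras pattern, where the monitored key is chosen entirely at callback construction time and the model code never names it (illustrative only):

import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", mode="max", patience=5),
    tf.keras.callbacks.ModelCheckpoint("best.h5", monitor="val_loss", save_best_only=True),
]
# model.fit(x, y, validation_data=(x_val, y_val), callbacks=callbacks)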

What if i want to change the key for early stopping during training?

This is why I proposed letting monitor be a Callable, see original issue comment

but this is exactly why the Result approach makes more sense... because you CAN change it while training. This is an exact example of that.

Otherwise you'd have to set the monitor at the callback level and that's now disconnected from the model...

ie: your model can't be moved around and is not modular now because it has this other external dependency where you must also remember to set X value in the callback.

Currently with results you can say:
"import my model and use your own trainer"

with your approach you'd say
"import my model and use your trainer. And oh by the way, don't forget to init this callback and set this magic word so it'll save weights"

it's not very modular...

This is redundant...

    result.log("train_loss", train_loss, callback=True)

we can just make anything that is logged available to callbacks (ie: always true), so you don't need to add another keyword

Currently with results you can say:
"import my model and use your own trainer"

with your approach you'd say
"import my model and use your trainer. And oh by the way, don't forget to init this callback and set this magic word so it'll save weights"

If the issue is about providing callbacks with the module, why not add configure_callbacks function to LightningModule?

we can just make anything that is logged available as a callback (ie: always true)

Sure, I don't mind that :smiley:

Example

class MyLightningModule(pl.core.LightningModule):
    def __init__(
        self,
        early_stop_monitor,
        checkpoint_monitor,
    ):
        super().__init__()
        self.early_stop_monitor = early_stop_monitor
        self.checkpoint_monitor = checkpoint_monitor

    def configure_callbacks(self):
        callbacks = []
        if self.early_stop_monitor:
            callbacks.append(EarlyStopping(monitor=self.early_stop_monitor))
        if self.checkpoint_monitor:
            # We have a reference to the trainer here so the following is possible
            # if we choose to allow Callables as callback monitors:
            monitor = lambda: self.checkpoint_monitor if self.trainer.current_epoch % 2 == 0 else "val_loss"
            callbacks.append(ModelCheckpoint(monitor=monitor))
        return callbacks

    def training_step(self, ...):
        ...
        result = pl.TrainResult(minimize=train_loss)
        result.log("train_loss", train_loss)
        result.log("train_acc", train_acc)

    def validation_step(self, ...):
        ...
        result = pl.EvalResult()
        result.log("val_loss", val_loss)
        result.log("val_acc", val_acc)

right, i think that's what i was originally hoping for:

  1. use early stopping and checkpoint as keywords
  2. allow anything that is logged to be available to be monitored.

but that means we have 2 ways of doing something. so that's why i then thought to just drop the keywords (ie: not do 1 and only do 2)

I love this callbacks hook idea... let's bring it up for a vote from the community? (nice suggestion @carmocca!)

What do we think?

My take is that some models do philosophically NEED certain callbacks. And to keep models self-contained we need to make sure that the required callbacks are packaged as well.

i’ve seen this myself in self supervised as well… for instance:

  • BYOL weight update was done as a callback
  • simclr and co require finetune MLPs that i don’t want in the main code

but it doesn’t make sense for people to run these models without those callbacks. so now we have this disconnect between stand-alone modules and training

I would have both model.callbacks and trainer.callbacks. As you mentioned, some callbacks are inherently related to the model's function and others to the training procedure (e.g. checkpointing). The former come packaged with the LightningModule and the latter can be set by whoever is training the model.
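A hypothetical sketch of that split (none of this is settled API; the configure_callbacks hook and the merge behaviour are assumptions following the earlier proposal):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

class MyModule(pl.LightningModule):
    def configure_callbacks(self):
        # Callbacks the model inherently needs travel with the LightningModule,
        # e.g. a weight-update hook or an early-stopping criterion tied to it.
        return [EarlyStopping(monitor="val_loss")]

# Callbacks tied to the training procedure (e.g. checkpointing) stay with
# whoever runs the Trainer; the Trainer would merge both lists before fit().
trainer = pl.Trainer(callbacks=[ModelCheckpoint(monitor="val_loss")])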
