Pytorch-lightning: TrainerEvaluationLoopMixin activates model.train() at the end

Created on 8 Jul 2020 · 10 comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

According to the example on fine-tuning, it is important to set the frozen sub-modules to eval mode. This matters because layers like BatchNorm and Dropout behave differently in training mode (BatchNorm, for example, keeps updating its running statistics).

However, at the end of TrainerEvaluationLoopMixin._evaluate there is the following code:

# enable train mode again
model.train()

So after the first validation run, the model is again completely in training mode and the freezing is partially undone (for layers like BatchNorm and Dropout).
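For illustration, here is a minimal plain-PyTorch sketch of the effect (the module names and shapes are made up; the model is never run, the sketch only shows the mode flag being flipped):

from torch import nn

# a "backbone" to be frozen (contains BatchNorm) plus a trainable head
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
head = nn.Linear(8, 10)
model = nn.Sequential(backbone, head)

for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()                      # as the fine-tuning example recommends

print(backbone[1].training)          # False: BatchNorm uses its running stats

model.train()                        # what _evaluate does after validation

print(backbone[1].training)          # True: BatchNorm updates running stats again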

Labels: API / design, bug / fix, help wanted

All 10 comments

I agree, this should not happen in the validation loop. Only the training loop should switch the model to training mode.

So we would have to track the state of all the frozen modules beforehand?

@awaelchli If we accept that the training mode may have been customized, then not even the training loop should change it carelessly.

@williamFalcon That would be one solution.

Another would be to simply advise people to set the mode in on_epoch_start. (Would that be called late enough that the Trainer does not reset it afterwards?)
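For reference, that workaround would look roughly like the sketch below (the attribute name backbone is just an example); whether on_epoch_start runs late enough is exactly the open question:

import pytorch_lightning as pl
from torch import nn


class FineTuneModel(pl.LightningModule):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone   # frozen sub-module containing BatchNorm/Dropout
        self.backbone.eval()

    def on_epoch_start(self):
        # re-apply eval mode after the Trainer has put the whole model
        # back into training mode
        self.backbone.eval()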

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@williamFalcon @moi90 what if we just made this a hook? The default hook would implement what we have today and set the model to train or eval mode. But if a user wants the fine-tuning use case, they can override the hook and set their layers to train/eval mode manually (see the sketch after the list below).

  • old behaviour is preserved
  • no complicated tracking needed
  • in the fine-tuning case, it is fully transparent to the reader of the code which layers are in eval mode and which are in training mode
  • easy to implement
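Roughly, a fine-tuning user could then do something like the following (the hook name set_training_mode is purely illustrative; the actual name and signature would be decided when implementing it):

import pytorch_lightning as pl
from torch import nn


class FineTuneModel(pl.LightningModule):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone            # frozen, contains BatchNorm/Dropout
        self.head = nn.Linear(512, 10)      # trainable head, size illustrative
        for p in self.backbone.parameters():
            p.requires_grad = False

    def set_training_mode(self, training: bool):
        # only the head follows the mode requested by the loop;
        # the frozen backbone stays in eval mode permanently
        self.head.train(training)
        self.backbone.eval()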

Where and when would this hook be called? How would this be different from using on_epoch_start?

Where and when would this hook be called?

Wherever we call model.eval / model.train today in the training loop, we would call the hook instead, which by default just does the same thing as before, unless the user overrides it.

How would this be different from using on_epoch_start?

I think this would be different from on_epoch_start because it would let you prevent exactly what the title describes: putting every layer back into training mode via model.train() at the end of evaluation.
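Loosely sketched, the default hook would preserve today's behaviour, and the loop would call it instead of toggling the mode directly (names here are illustrative, not the actual Lightning internals):

from torch import nn


class HookedModule(nn.Module):
    def set_training_mode(self, training: bool = True):
        # default: identical to what the evaluation loop does today
        self.train(training)


def evaluate(model):
    model.set_training_mode(False)   # instead of model.eval()
    # ... run the validation loop ...
    model.set_training_mode(True)    # instead of model.train()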

@awaelchli can this be added post v1?

Yes

Thank you, this is much appreciated!

