Pytorch-lightning: TrainerEvaluationLoopMixin activates model.train() at the end

Created on 8 Jul 2020 · 10 comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

According to the example on fine-tuning, it is important to set the frozen sub-modules to eval mode. This matters because layers like BatchNorm and Dropout behave differently in training mode (BatchNorm, for example, keeps updating its running statistics).

However, at the end of TrainerEvaluationLoopMixin._evaluate there is the following code:

# enable train mode again
model.train()

So after the first validation run, the model is again completely in training mode and the freezing is partially undone (for layers like BatchNorm and Dropout).
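For illustration, here is a minimal plain-PyTorch sketch of the effect (the module names and shapes are made up; the model is never run, the sketch only shows the mode flag being flipped):

from torch import nn

# a "backbone" to be frozen (contains BatchNorm) plus a trainable head
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
head = nn.Linear(8, 10)
model = nn.Sequential(backbone, head)

for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()                      # as the fine-tuning example recommends

print(backbone[1].training)          # False: BatchNorm uses its running stats

model.train()                        # what _evaluate does after validation

print(backbone[1].training)          # True: BatchNorm updates running stats again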

Labels: API / design, bug / fix, help wanted

All 10 comments

I agree, this should not happen in the validation loop. Only the training loop should switch the model to training mode.

So we would have to track the state of all the frozen modules beforehand?

@awaelchli If we accept that the training mode may have been customized, then not even the training loop should change it carelessly.

@williamFalcon That would be one solution.

Another would be to simply advise people to set the mode in on_epoch_start. (Would that be called late enough that the Trainer does not reset it afterwards?)
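For reference, that workaround would look roughly like the sketch below (the attribute name backbone is just an example); whether on_epoch_start runs late enough is exactly the open question:

import pytorch_lightning as pl
from torch import nn


class FineTuneModel(pl.LightningModule):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone   # frozen sub-module containing BatchNorm/Dropout
        self.backbone.eval()

    def on_epoch_start(self):
        # re-apply eval mode after the Trainer has put the whole model
        # back into training mode
        self.backbone.eval()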

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@williamFalcon @moi90 what if we just made this a hook? The default hook would implement what we have today and set the model to train or eval mode. But if a user wants the fine-tuning use case, they can override the hook and set their layers to train/eval mode manually (see the sketch after the list below).

  • old behaviour is preserved
  • no complicated tracking needed
  • in the fine-tuning case, it is fully transparent to the reader of the code which layers are in eval mode and which are in training mode
  • easy to implement
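Roughly, a fine-tuning user could then do something like the following (the hook name set_training_mode is purely illustrative; the actual name and signature would be decided when implementing it):

import pytorch_lightning as pl
from torch import nn


class FineTuneModel(pl.LightningModule):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone            # frozen, contains BatchNorm/Dropout
        self.head = nn.Linear(512, 10)      # trainable head, size illustrative
        for p in self.backbone.parameters():
            p.requires_grad = False

    def set_training_mode(self, training: bool):
        # only the head follows the mode requested by the loop;
        # the frozen backbone stays in eval mode permanently
        self.head.train(training)
        self.backbone.eval()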

Where and when would this hook be called? How would this be different from using on_epoch_start?

Where and when would this hook be called?

Wherever we call model.eval / model.train today in the training loop, we would call the hook instead, which by default just does the same thing as before, unless the user overrides it.

How would this be different from using on_epoch_start?

I think this would be different from on_epoch_start because it would let you prevent exactly what the title describes: putting every layer back into training mode via model.train() at the end of evaluation.
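Loosely sketched, the default hook would preserve today's behaviour, and the loop would call it instead of toggling the mode directly (names here are illustrative, not the actual Lightning internals):

from torch import nn


class HookedModule(nn.Module):
    def set_training_mode(self, training: bool = True):
        # default: identical to what the evaluation loop does today
        self.train(training)


def evaluate(model):
    model.set_training_mode(False)   # instead of model.eval()
    # ... run the validation loop ...
    model.set_training_mode(True)    # instead of model.train()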

@awaelchli can this be added post v1?

Yes

Thank you, this is much appreciated!

