The issue seems to be that the updates created by applying the BatchNorm layer are added to the train function even when they act on non-trainable weights.
Gist with code to reproduce:
https://gist.github.com/afourast/0d7545174c1b8fb7b0f82d7efbf31743
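A minimal sketch of the symptom (illustrative only, not the exact code from the gist): even with trainable set to False on a BatchNormalization layer, its moving-average update ops still show up in model.updates and therefore get run by the training function.
import keras

inp = keras.layers.Input(shape=(10,))
bn = keras.layers.BatchNormalization()
bn.trainable = False          # freeze the layer
out = bn(inp)
model = keras.models.Model(inp, out)
model.compile(optimizer='sgd', loss='mse')

print(model.updates)          # the moving mean/variance assign ops are still listed here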
The BN layer has alpha, beta, variance, and mean parameters. The alpha and beta parameters respond to the trainable attribute, but the variance and mean parameters cannot be made non-trainable. Even if you set the BN layer's trainable = False, it will not work.
Locking BN is a technique used in segmentation, especially with fine-tuning, so I think this is an option we need to add if it does not already exist.
Is there any workaround available for this issue? It turns into hell when you're trying to use a single shared model body (a pre-trained model with trainable=False) inside multiple models that are trained on different datasets.
If you are willing to give up on the Keras training routines (fit, fit_generator, train_on_batch, etc.), you can define a custom training function the way Keras does for them (https://github.com/fchollet/keras/blob/d3db58c8bf1ef9d078b3cf5d828d22346df6469b/keras/engine/training.py#L948), where you simply don't include the BN statistics updates. For example:
https://gist.github.com/afourast/018722309ac2a272ce4985190ec52742
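Roughly, the idea looks like this (a sketch only, not the gist's exact code; the private attributes and the optimizer.get_updates signature vary between Keras 2.x versions):
from keras import backend as K

def make_train_function_without_bn_updates(model):
    # Mirrors Keras' _make_train_function, but leaves out model.updates,
    # which is where the BN moving mean/variance update ops live.
    inputs = (model._feed_inputs +
              model._feed_targets +
              model._feed_sample_weights +
              [K.learning_phase()])
    training_updates = model.optimizer.get_updates(
        loss=model.total_loss,
        params=model._collected_trainable_weights)
    return K.function(inputs,
                      [model.total_loss] + model.metrics_tensors,
                      updates=training_updates)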
In my models there are no other updates added to the updates list besides the BN statistics, but you should check that with your own models. Also, I have only tested this with the TF backend. You will have to define a similar function for testing.
I've personally found using a custom training function convenient, since you can also do things like use TF queues and add TensorBoard summaries.
Hope that helps
Thanks, but it seems too tricky for my case.
I've found another way to work around that: you can set layer._per_input_updates = {} on the batch norm layers which shouldn't be updated during training. It actually works, these layer weights stay the same, but it still looks like a dirty hack.
I'm running into the same issue, is there a technical reason it can't be fixed?
Also interested in the solution for this.
You can see that the BatchNormalization layer is made up of four sets of weights: the alpha (gamma) weights, the beta weights, and two other weight sets (the moving mean and variance); your trainable=False setting just keeps those two other weight sets unchanged.
Right, and can the alpha and beta layers be frozen, and if so how?
Thank you @nsmetanin for the suggestion. This worked to freeze batch norm layers.
for layer in model.layers:
    layer.trainable = False
    if isinstance(layer, keras.layers.normalization.BatchNormalization):
        layer._per_input_updates = {}
Freezing BN layers is now available in the most recent release (https://github.com/keras-team/keras/releases/tag/2.1.3): simply set trainable=False for the batchnorm layers.
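For example (illustrative; assumes Keras 2.1.3 or later, and that you recompile after changing the flag):
from keras.layers import BatchNormalization

for layer in model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = False   # with 2.1.3+ this also stops the moving mean/variance updates
model.compile(optimizer='sgd', loss='mse')  # recompile so the change takes effect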
Is there any way to get back the old behaviour?
I'd like to get the trainable weights frozen and the non-trainable weights unfrozen, as before.
:+1:
@ViaFerrata Sorry I don't have a real answer, perhaps look back in the commit history to see how it was done before?
You can set layer.stateful = True on your BN layer to get this behavior.
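For instance (a sketch; assumes Keras 2.1.3+, where a layer that is neither trainable nor stateful drops its update ops):
from keras.layers import BatchNormalization

for layer in model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = False   # gamma/beta stay frozen
        layer.stateful = True     # but the moving mean/variance keep updating (old behavior)
model.compile(optimizer='sgd', loss='mse')  # recompile for the flags to take effect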
Thanks a lot for the quick answer, I will try that :)
@ViaFerrata, just checking, what do you want the old behavior for?
@ozabluda Sorry for the late answer, didn't notice the reply.
Well, in my case I'm training two single CNNs on the same dataset, but with different projections of the input (my dataset is 6D).
Then, I take the conv layers of both single CNNs, freeze them and put them together in a new CNN by adding a new fully connected layer.
After that I just retrain the randomly initialized fully connected layers. And in this case I noticed that stateful=False yields much worse results for the test loss than stateful=True.
So if I understand it correctly, the moving mean and variance are computed from batch-level statistics during training, and for testing they are computed over the whole sample before testing starts.
However, if training is resumed in a new epoch with stateful=False, are the moving mean and variance then carried over from the last batch of the previous epoch?
Training never uses anything other than the current batch, but it updates the running averages for use at inference. If training is resumed, I think the running averages are reset to zero.
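Schematically, each training batch updates the running statistics roughly like this (not Keras' exact code; the default momentum is 0.99):
momentum = 0.99
moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
moving_var  = momentum * moving_var  + (1 - momentum) * batch_var
# At inference time only moving_mean / moving_var are used, never the batch statistics.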
Thank you for the explanation, I've misread that in the paper. Then it makes sense that I get worse results with stateful=False during retraining.
From the code of Keras 2.1.3, what I see is training=False rather than trainable=False in the call() method of BN:
https://github.com/keras-team/keras/blob/2.1.3/keras/layers/normalization.py#L175