I have a script that previously would freeze pre-trained weights from the ResNet50 model and train the new layers I placed on top of the base model. Now, the model summary is reporting that all weights are trainable (counted in the total params).
Keras at 4c1353c188b3412b22d9f65042973e56a05433fe
Theano at ae36be011c98b1a2f30753162db01f6588ff8be3
# Example code:
base_model = ResNet50(weights='imagenet', include_top=False, input_tensor=input_tensor)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
# random projection idea
x = Dense(256, trainable=False)(x)
x = BatchNormalization(axis=bn_axis)(x)
x = LeakyReLU(leaky_relu_slope)(x)
x = Dropout(0.5)(x)
# regular dense layer
x = Dense(128, W_constraint=maxnorm(4))(x)
x = LeakyReLU(leaky_relu_slope)(x)
x = Dropout(0.5)(x)
x = Dense(nb_classes, activation='softmax')(x)
model = Model(input=input_tensor, output=x)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional layers
for layer in base_model.layers:
    layer.trainable = False
nadam_custom = Nadam(lr=0.0001, clipnorm=1., clipvalue=0.25)
model.compile(loss='categorical_crossentropy',
              optimizer=nadam_custom,
              metrics=['accuracy', 'top_k_categorical_accuracy'])
model.summary()
# reports all layers having parameters whereas previously the ResNet layers were zeros:
# Total params: 24170041
Total params is a count of all params, trainable or not. That's not how you would count trainable weights.
Ok, but this is a change in behavior from before.
@fchollet I would recommend reporting both the trainable and non-trainable weight counts in model.summary(). It seems many users relied on .summary() to check the trainable weight count (even though that behavior was a bug, it was widely assumed to be the expected one).
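In the meantime, a small helper along these lines gives the split directly. This is just a sketch that counts from a model's trainable_weights / non_trainable_weights lists; `model` stands for any already-built Keras model:
```python
from keras import backend as K

def count_model_params(model):
    # Count parameters from the weight variables the model exposes,
    # independently of what summary() prints.
    trainable = sum(K.get_value(w).size for w in model.trainable_weights)
    non_trainable = sum(K.get_value(w).size for w in model.non_trainable_weights)
    return trainable, non_trainable

print('trainable: %d, non-trainable: %d' % count_model_params(model))
```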
@farizrahman4u @fchollet I upgraded to 1.2.1 and I'm still having problems with how trainable weights are reported after setting model.trainable = False. I looked at previous issues #4510 #4513 #4514, but the problem persists.
Briefly, stacked_model is a stack of generators followed by a discriminator: m_model --> g_model --> d_model. I call stacked_model.summary(), set d_model.trainable = False, then call stacked_model.summary() again. stacked_model.summary() returns the same thing both before and after setting d_model.trainable = False:
Total params: 283,603
Trainable params: 283,603
Non-trainable params: 0
I have tested this with both model.summary() and model.get_config(); get_config also reports that all layers are trainable both before and after.
However, based on tracking how the model behaves during training, I believe model.trainable = False does work and this is merely a reporting issue. Can someone confirm?
I want to be absolutely sure that the models I freeze indeed stay frozen before I proceed to design more exotic training procedures.
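A direct way to check, independent of summary(), would be to snapshot the weights and compare them after a training step. A rough sketch, with x_batch / y_batch standing in for real data:
```python
import numpy as np

# Snapshot the discriminator's weights, run one training step on the stacked
# model, then verify that nothing in the discriminator moved.
before = [w.copy() for w in d_model.get_weights()]
stacked_model.train_on_batch(x_batch, y_batch)  # x_batch / y_batch: your own data
after = d_model.get_weights()
print(all(np.array_equal(b, a) for b, a in zip(before, after)))  # True if frozen
```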
Any help is appreciated! Thanks!
@pavanramkumar
Maybe this will clarify things:
Let D, G, and GAN be the discriminator, the generator, and the stacked model in your example.
Also, let us assume the following pseudo-code for constructing and compiling the models:
1) Construct D
1a) Compile D
2) Construct G
3) Set D.trainable = False
4) Stack G and D, to construct GAN
4a) Compile GAN
If you set D.trainable = False before compiling D and then try to fit D, you will observe that D is indeed "frozen". If you set D.trainable = False after compiling D and then try to fit D, it will actually keep learning; however, it will remain frozen during the training of the GAN, and this is the behaviour you are probably after.
In both cases the summary() function will always tell you that you have no non-trainable parameters, at least in my case under Keras version 1.2.1.
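In code, that ordering looks roughly like this. Just a sketch; build_discriminator / build_generator are placeholders for your own model-building code:
```python
from keras.models import Sequential

# 1) Construct and compile D -- D will learn when fit directly.
D = build_discriminator()   # placeholder for your own model-building code
D.compile(optimizer='adam', loss='binary_crossentropy')

# 2) Construct G.
G = build_generator()       # placeholder for your own model-building code

# 3) Freeze D *after* compiling D but *before* compiling the stacked model.
D.trainable = False

# 4) Stack G and D into the GAN and compile it -- inside GAN, D stays frozen.
GAN = Sequential()
GAN.add(G)
GAN.add(D)
GAN.compile(optimizer='adam', loss='binary_crossentropy')
```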
If the model is built with the functional API, even this does not work:
Construct D
D.trainable = False
Compile D
D.summary() will say that there are 0 trainable params, but if you call D.fit() the loss does indeed go down.
Solutions?
That's a nasty, NASTY bug that I encounter too.
Like you, I've implemented a GAN in Keras, and I found the problem easily reproducible with @gibipara92's example: the trainable flag does not work at all with functional API models!
To the best of my knowledge, EVERY implementation of a GAN in Keras is bugged:
https://github.com/tdeboissiere/DeepLearningImplementations/
https://github.com/phreeza/keras-GAN
https://github.com/osh/KerasGAN
and many more; all of them rely on freezing one piece of the network with trainable=False.
This is, of course, easy to disprove:
import numpy as np
import keras
x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)
model = keras.models.Model(x, y)
model.trainable = False
model.compile(optimizer='rmsprop', loss='mse')
x = np.random.random((10, 3))
y = np.random.random((10, 5))
model.fit(x, y, epochs=10)
-> loss does not change, because the model is not trainable.
There must be some more complex case, then.
I'm actually loading a ResNet50 into a model (let's call it model "A"), I set model "A" trainable to False, then I build a new model "B" made of model "A" plus other layers. Model "B" trainable is True. I train model "B" on some data, then I run an evaluate function on "A" and it has clearly changed.
Demonstrating the existence of a bug should be easy: simply post a code snippet that should clearly have behavior A, but that has behavior B when run. If other people can reproduce, your job is done.
All evidence we currently have --including unit tests, because obviously trainability behavior has extensive unit tests-- indicates that there is no bug. Hard to make a case if you provide 0 evidence.
Are you sure that some nasty connections or dropout(s) are not messing up the output of the loss function?
@fchollet right.
Here is the easiest example to reproduce the bug (running on keras 1.2):
import numpy as np
import keras
x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)
model1 = keras.models.Model(x, y)
model1.trainable = True
model1.compile(optimizer='rmsprop', loss='mse')
data_x = np.random.random((10, 3))
data_y = np.random.random((10, 5))
model1.fit(data_x, data_y, nb_epoch=2)
out=model1.predict(data_x)
print out
model1.trainable = False
z = keras.layers.Dense(5)(model1.output)
model2 = keras.models.Model(x, z)
model2.compile(optimizer='rmsprop', loss='mse')
data_z = np.ones((10, 3))
data_w = np.ones((10, 5))
model2.fit(data_z, data_w, nb_epoch=2)
out=model1.predict(data_x)
print out
The output from model1 should be equal before and after training model2, because model1.trainable=False was set before building model2. You can clearly see the output has changed: model1 is learning.
In this example your model1.trainable = False has no effect on the rest of the code, because you never use model1 past that point. Instead, you use its underlying layers, which are still trainable.
Essentially: when calling model1.output, you are retrieving the y tensor from y = keras.layers.Dense(5)(x), and adding stuff on top. Your code is equivalent to:
import numpy as np
import keras
x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)
z = keras.layers.Dense(5)(y)
model2 = keras.models.Model(x, z)
model2.compile(optimizer='rmsprop', loss='mse')
Hope that clears things up.
I see. The obvious solution would be to set trainable = False on each model1 layer. And in fact it works on our example: setting every model1 layer to non-trainable does freeze model1's layers when training model2.
So, every GAN implementation doing:
1. Construct D
1a) Compile D
2. Construct G
3. Set D.trainable = False
4. Stack G and D, to construct GAN
4a) Compile GAN
is actually wrong. All the ones I listed use this method.
Anyway, the problem is still not fixed: setting trainable = False on every layer of the relevant network in my GAN did not solve it. The point is that our GAN, with its GEN, DISC and DCGAN models, is more complicated than this example, and I can't find a configuration where I can:
1. train DISC by feeding forward GEN --> DISC, so in this step GEN's layers need to be frozen while DISC is training;
2. train GEN by feeding forward GEN --> DISC with DISC's layers frozen and GEN trainable.
I'll build a simple snippet of code reproducing the GAN behaviour, because @fchollet, as you can imagine, we cannot run three model.compile calls at every batch iteration.
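For reference, the pattern the earlier comments converge on does not require recompiling per batch: each model is compiled once with the trainable flags it needs, and the training loop just alternates train_on_batch calls. A rough sketch, with sample_noise / sample_real_images as placeholder helpers and GEN, DISC, DCGAN assumed already built and compiled, DISC's layers frozen before DCGAN.compile():
```python
import numpy as np

batch_size = 32  # example value

for step in range(1000):  # your own training schedule
    # Train DISC: real samples labelled 1, generated samples labelled 0.
    fake = GEN.predict(sample_noise(batch_size))   # sample_noise: placeholder helper
    real = sample_real_images(batch_size)          # placeholder helper
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])
    DISC.train_on_batch(x, y)

    # Train GEN through the stacked model; DISC stays fixed in this step
    # because it was frozen before DCGAN was compiled.
    DCGAN.train_on_batch(sample_noise(batch_size), np.ones(batch_size))
```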
Was having trouble because weights were not freezing. This does not freeze weights:
```
inputs = Input((2,))
x = Dense(units=8, activation='tanh')(inputs)
x.trainable = False
x = Dense(units=1, activation='sigmoid')(x)
x.trainable = False
model = Model(input=inputs, output=x)
```
But this does:
```
inputs = Input((2,))
x = Dense(units=8, activation='tanh')(inputs)
x = Dense(units=1, activation='sigmoid')(x)
model = Model(input=inputs, output=x)
for l in model.layers:
    l.trainable = False
```
Is this expected behavior?
"trainable" is a property of layers and models (it applies to the weights
of layers/models). Here you are setting it on activation tensors, which is
unrelated.
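A minimal sketch of the layer-level equivalent, setting the flag on the layer objects before calling them (Keras 2-style positional Model(inputs, outputs) assumed):
```python
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input((2,))

# Set `trainable` on the layer objects, not on the tensors they return.
hidden = Dense(8, activation='tanh')
hidden.trainable = False
out = Dense(1, activation='sigmoid')
out.trainable = False

x = hidden(inputs)
x = out(x)
model = Model(inputs, x)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')  # flags take effect at compile time
```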
So, every GAN implementation doing [...] is actually wrong. All the ones I reported use this method.
That's definitely not true, the process you describe is in fact correct.
Although I cannot speak for the correctness of every single GAN implementation out there.
As far as I understand, what you can actually do is make two models that use the same layers (so they share the same weights), and set all those layers to trainable = False in one model and to True in the other. Then, after compiling, one of them will train those layers and the other one won't. I'm actually looping over the model's layers and setting trainable = False on each layer.
@alexkruegger had asked: "OK, I understand how I can switch the trainable parameter on a model/layer and how it is reflected in the trainable_weights attribute of the layer/model. But to make my changes take effect I have to recompile the model, so I can't switch layers between trainable and frozen 'on the fly'; I have to call compile after each change. Is this normal behavior, or am I missing something?"
@fchollet we tested this again and made sure that the error is in reporting alone, i.e. in model.summary(). As @farizrahman4u said earlier, it would be useful to modify this function to say which parameters are trainable and which are not.
expanding on what @Js-Mim said, as a general principle:
model1 = Model(inputs, outputs)
model1.compile()
model2 = deepcopy(model1)
layers_to_freeze = [0, 2, 4]
for i, layer in enumerate(model2.layers):
    if i in layers_to_freeze:
        layer.trainable = False
model2.compile()
then alternate between model1 and model2 in your training loop:
while(training):
    ...
    model1.fit(x, y)
    ...
    model2.fit(x, y)
hope this is useful
@pavanramkumar to the best of my knowledge, if I understand the deepcopy function correctly when applied to Keras models, this solution is not viable: when you create model2 as a deepcopy of model1, they no longer share their weights.
After the deepcopy, if you train model1 (whose layers are free to learn), model2's weights will not be updated. They are in fact two different models, not sharing anything; they merely start with the same weights.
@fchollet I finally managed to easily reproduce the bug that has been driving me crazy over the last few days.
The trainable property works like a charm on simple GAN models, but it is buggy in one case.
Task: load a pretrained network to use as a feature extractor (freezing its layers), build a model that adds an FC layer on top of the pretrained network, train the model, then test whether the pretrained network changed:
from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50
from keras.applications.vgg19 import VGG19
from keras.layers import Input
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
input = Input(shape=(3,224,224), name="image_input")
net = VGG19(include_top=False, weights='imagenet')
net.trainable = False
for l in net.layers:
    l.trainable = False
out = net(input)
x = Flatten()(out)
x = Dense(1000, activation='softmax', name='fc1000')(x)
model = Model(input, x)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
img_path = 'mug.jpg'
img = image.load_img(img_path, target_size=(224, 224))
image = image.img_to_array(img)
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)
out1=net.predict(image)
model.train_on_batch(image,np.ones([1,1000]))
out2=net.predict(image)
#testing if the resnet outputs before and after model train are equals:
print( np.array_equal(out1,out2) )
The two outputs, before and after training the model, are equal, as they should be.
Let's repeat the same with ResNet50 or InceptionV3:
from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50
from keras.applications.vgg19 import VGG19
from keras.layers import Input
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
input = Input(shape=(3,224,224), name="image_input")
net = ResNet50(include_top=False, weights='imagenet')
net.trainable = False
for l in net.layers:
    l.trainable = False
out = net(input)
x = Flatten()(out)
x = Dense(1000, activation='softmax', name='fc1000')(x)
model = Model(input, x)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
img_path = 'mug.jpg'
img = image.load_img(img_path, target_size=(224, 224))
image = image.img_to_array(img)
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)
out1=net.predict(image)
model.train_on_batch(image,np.ones([1,1000]))
out2=net.predict(image)
#testing if the resnet outputs before and after model train are equals:
print( np.array_equal(out1,out2) )
The outputs are not equal anymore: here is the bug.
The bug happens with ResNet50 and InceptionV3 and does not happen with VGG16 and VGG19. I imagine it is something related to the batchnorm layers; maybe they are not counted as layers and so are not caught by for l in net.layers:?
Edit: I do see the BatchNormalization layers in net.layers, so it really seems like a bug.
@fchollet @engharat It seems that the BatchNormalization layer is always learning.
Please try the example below:
Edit: the statistics BatchNormalization keeps don't have a "trainable" flag, i.e. they don't depend on gradients to be updated. To "freeze" them, just set momentum to 1.
import numpy as np
import keras
inp = keras.layers.Input(shape=(4,))
sample = np.random.normal(size=(10,4))
labels = np.random.normal(size=(10,1))
test = np.random.normal(size=(1,4))
x = keras.layers.Dense(1)(inp)
model = keras.models.Model(inp, x)
model.layers[1].trainable = False
model.compile(loss='mse', optimizer=keras.optimizers.SGD(lr=1.))
out1 = model.predict(test)
model.fit(sample, labels)
out2 = model.predict(test)
print (out1, out2) # Until here, everything OK
# Now just add a BN layer
x = model(inp)
# Edit: add momentum
x = keras.layers.BatchNormalization(momentum=1.)(x)
model = keras.models.Model(inp, x)
model.layers[1].trainable = False
# Edit: removing that line
# model.layers[2].trainable = False
model.compile(loss='mse', optimizer=keras.optimizers.SGD(lr=1.))
model.summary() # Trainable params: 0
out3 = model.predict(test)
w1 = model.layers[2].get_weights()
model.fit(sample, labels)
out4 = model.predict(test)
w2 = model.layers[2].get_weights()
print (out3, out4)
print (w1)
print (w2)
It is definitely not a bug. It is a conceptual issue. The batchnorm layer updates its mean and variance statistics at training time. It is not learning, as it contains no trainable parameters: no gradients get backpropagated to its weights (gamma and beta).
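To see that split directly, one can inspect a BatchNormalization layer's weight lists. A quick sketch (exact weight names vary across Keras versions):
```python
from keras.layers import Input, BatchNormalization

inp = Input(shape=(4,))
bn = BatchNormalization()
bn(inp)  # building the layer creates its weights

# gamma and beta are the weights that gradients can reach ...
print([w.name for w in bn.trainable_weights])
# ... while the moving mean/variance are updated by the layer itself during
# training, independently of the trainable flag.
print([w.name for w in bn.non_trainable_weights])
```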
@engharat Have you found a solution for this problem? Whenever I freeze a model using the method described above (create D, compile D, create G, freeze D, create GAN, compile GAN), the model D will still train if I only put GAN in a loop. This is bad because now D will learn in the opposite "direction" of what it should do.
Hi all,
This topic might be out of date, but I recently experienced chaos with GANs, especially when freezing layers.
Thanks to @Js-Mim, you made my life easier; however, I found another problem, which I am posting about here.
The problem is when combining two models that are not trainable:
g.trainable = False
d.trainable = False
model = Sequential()
model.add(g)
model.add(d)
model.compile....
I would expect it to work, but it doesn't (Keras 2.0.6). Only if setting
model.trainable = False
model.compile....
then it works.
A less elegant solution to this problem is adding the non-trainable layers one by one (though you must set them non-trainable separately with l.trainable = False, which was also a bit unexpected for me):
model = Sequential()
for l in g.layers + d.layers:
    l.trainable = False
    model.add(l)
Something is definitely going on here... though my ignorance might be the main reason for my adventures. Could someone explain what's what here: bug, misconception, or ignorance?
We need a 'setTrainable' method per layer (and not per compiled model) that is cheap to call.
I got an error when I set some layers of resnet to be frozen, like this:
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))
for layer in base_model.layers[:-26]:
    layer.trainable = False
The error is as follows:
ValueError: Cannot feed value of shape (128,) for Tensor 'Placeholder_72:0', which has shape '(3, 3, 128, 128)'
There are 174 layers in resnet50, how many layers can I freeze with resnet50 when I try to do fine-tuning?
Want to add a +1 here. I'm running code from here. Github repo here.
Code is roughly:
def discriminator(self):
    self.D = Sequential()
    ...
    self.D.add(Conv2D(depth*1, 5, strides=2, input_shape=input_shape,\
               padding='same'))
    ...

def generator(self):
    self.G = Sequential()
    ...

def discriminator_model(self):
    optimizer = RMSprop(lr=0.0002, decay=6e-8)
    self.DM = Sequential()
    self.DM.add(self.D)
    self.DM.compile(loss='binary_crossentropy', optimizer=optimizer,\
                    metrics=['accuracy'])
    return self.DM

def adversarial_model(self):
    if self.AM:
        return self.AM
    optimizer = RMSprop(lr=0.0001, decay=3e-8)
    self.AM = Sequential()
    self.AM.add(self.G)
    self.D.trainable = False
    self.AM.add(self.D)
    self.AM.compile(loss='binary_crossentropy', optimizer=optimizer,\
                    metrics=['accuracy'])
    return self.AM
...
I added the self.D.trainable = False line myself, since the original is missing it, and yet D still trains. This, OTOH, does work:
self.AM = Sequential()
self.AM.add(self.G)
# Instead of self.D.trainable = False; self.AM.add(self.D)
for l in self.D.layers:
    l.trainable = False
    self.AM.add(l)
self.AM.compile(loss='binary_crossentropy', optimizer=optimizer,\
                metrics=['accuracy'])
@fchollet this seems unexpected per your comment, no?
Does setting the trainable property on models work or not? :)
I also have a problem with this, but after reading a few posts I noticed that layer.trainable=True does not work, while trainable_model.layers[1].trainable=True does.
I'm not sure if this is a good direction, but I think this example works:
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

data = np.random.randint(60, size=(10, 5))
labels = np.random.randint(60, size=(10, 10))

x = Input(shape=(5,))
layer = Dense(10, activation='relu', trainable=False)(x)
y = Dense(10, activation='softmax', trainable=False)(layer)

frozen_model = Model(x, y)
frozen_model.compile(optimizer='rmsprop', loss='mse')

trainable_model = Model(x, y)
trainable_model.layers[1].trainable = True
trainable_model.layers[2].trainable = True
## for layer in trainable_model.layers:
##     layer.trainable = True
## with this model the weights of the layer will be updated during training
## (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')

frozen_model.fit(data, labels)     # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights of `layer`
my output:
Using TensorFlow backend.
Epoch 1/10
10/10 [==============================] - 0s - loss: 1367.3923
Epoch 2/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 3/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 4/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 5/10
10/10 [==============================] - 0s - loss: 1367.3921
Epoch 6/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 7/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 8/10
10/10 [==============================] - 0s - loss: 1367.3923
Epoch 9/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 10/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 1/10
10/10 [==============================] - 0s - loss: 1367.3922
Epoch 2/10
10/10 [==============================] - 0s - loss: 1367.2385
Epoch 3/10
10/10 [==============================] - 0s - loss: 1367.1526
Epoch 4/10
10/10 [==============================] - 0s - loss: 1367.0762
Epoch 5/10
10/10 [==============================] - 0s - loss: 1367.0391
Epoch 6/10
10/10 [==============================] - 0s - loss: 1367.0096
Epoch 7/10
10/10 [==============================] - 0s - loss: 1366.9846
Epoch 8/10
10/10 [==============================] - 0s - loss: 1366.9617
Epoch 9/10
10/10 [==============================] - 0s - loss: 1366.9392
Epoch 10/10
10/10 [==============================] - 0s - loss: 1366.9155
These were helpful for me:
https://github.com/fchollet/keras/issues/4471
https://github.com/fchollet/keras/issues/2506
https://keras.io/getting-started/faq/#how-can-i-freeze-keras-layers
The interesting part is that keras.models.Model.trainable affects both whether the weights are actually frozen and the non-trainable parameter count, while setting trainable on individual layers does not.
Setting the layers' trainable attributes seems not to work at all after several trials with the code below:
from keras.layers import Input, Dense, Add, Flatten
from keras.models import Model
from keras.utils import plot_model
import numpy as np

inputs = Input(shape=(3,))
inputs.trainable = False
x = Dense(64, activation='tanh', name='l1')(inputs)
x.trainable = False
x = Dense(64, activation='tanh', name='l2')(x)
x.trainable = False
outputs = Dense(3, name='l3')(x)
outputs.trainable = False
model = Model(inputs=inputs, outputs=outputs)
model.trainable = False
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
print(model.summary())
print(model.predict(np.reshape([1,1,1], (1,3))))
model.fit(np.reshape([1,2,3],(1,3)), np.reshape([0.7, 4.6, 2.3],(1,3)))
print(model.predict(np.reshape([1,1,1], (1,3))))
model.fit(np.reshape([2,4,-3],(1,3)), np.reshape([0.15, 1, .8],(1,3)))
print(model.predict(np.reshape([1,1,1], (1,3))))
model.fit(np.reshape([0.5,1.2,2],(1,3)), np.reshape([1.2, 2.2, 8.9],(1,3)))
print(model.predict(np.reshape([1,1,1], (1,3))))
plot_model(model, to_file='test.png', show_shapes=True)
This prints the prediction for the same input several times; I modify the trainable options to observe whether freezing is working or not.
Is there a way to block training of a shared layer only in some areas of the network? @showaykerker
Hi, any updates on this?
I'm implementing an adversarial autoencoder and running:
for layer in model.layers:
    layer.trainable = False
model.trainable = False
does not freeze the model as one would expect.
My architecture is a bit complex so it would not be easy to post the code to replicate the issue, but it's a conceptual variation of this example.
Any pointers?
Cheers
I think this is the same problem as calling model.trainable = False directly. The flag is only applied to the model itself (the highest level of abstraction). If you look into the graph, the individual layers, and the weights inside them, will still show trainable = True (so you really have to set the flag "manually" on everything in your graph that has a trainable attribute in order to set every flag to False). If you encapsulate layers in complex architectures, this behavior is really annoying.
Imho there should be something like the --recursive option many terminal commands have. That way one could call model.trainable = False with the expected behavior: the model and all of its child layers would recursively have their trainable flags set to False.
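Until something like that exists, a small recursive helper gets close. A sketch, assuming nested models expose their children through a .layers attribute and that `model` is the model you want to freeze:
```python
def freeze_recursively(layer_or_model):
    # Set trainable = False on this object and on everything nested inside it.
    layer_or_model.trainable = False
    for child in getattr(layer_or_model, 'layers', []):
        freeze_recursively(child)

freeze_recursively(model)
model.compile(optimizer='rmsprop', loss='mse')  # flags only take effect after recompiling
```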
Just a follow up with a quick 'n' dirty solution for adversarial settings.
The trick is to define a non-trainable clone of the discriminator and copy the weights from the discriminator to the clone before training the generator to fool the discriminator.
A solution sketch is as follows:
# Frozen discriminator
frozen_disc = Sequential(name='frozen_discriminator')
# Same exact definition as discriminator (with same layer names)
# ...
# ...
frozen_disc.add(Dense(1, activation='sigmoid', name='discriminator_layer_n'))
# Freeze layers
for layer in frozen_disc.layers:
    layer.trainable = False
frozen_disc.trainable = False
# Frozen discriminator + Generator
discriminator_gen = Model(inputs=generator.input, outputs=frozen_disc(generator.output))
# Compile models
# ...
# Fit models
generator.fit(...)
# ...
discriminator.fit(...)
# ...
# Copy weights
for l in ['discriminator_layer_1', '...', 'discriminator_layer_n']:
    to_set = discriminator.get_layer(l).get_weights()
    discriminator_gen.get_layer('frozen_discriminator').get_layer(l).set_weights(to_set)
# ...
discriminator_gen.fit(...)
Hope this helps.
Cheers,
DG
@danielegrattarola this looks like a helpful hack, indeed. However, I would like an easier, more intuitive solution. For example, what about using TensorFlow's scopes "under the hood"? In TF itself it is not necessary to actually freeze weights; you can also tell the optimizer (for example with a variable scope) which variables it is allowed to touch during a specific training iteration.
I wonder why this issue already got closed. Sure, it originally reported the default behavior of trainable = False as a bug, but I think it shifted very quickly to "this is unintuitive" and to the need for an easy, official way to achieve the requested behavior.
Is there an official feature request for this?
@Daniel451 I think the main problem with this is to tell the optimizer which weights have become frozen, hence the need to recompile after freezing the layers of an already compiled model.
The TF solution that you suggest might work fine for TF, but we have to consider that Keras offers the Theano and CNTK backends, too.
I don't think we would ever see the feature implemented, even if we were to request it, so we'll have to make do with hacks like the one I posted above.
Cheers
I think I got it to work. Just set trainable to False through the get_layer() function, not via D.trainable = False. I ran the GAN model for a few iterations and printed the weights of some layer in the D model and in the D model inside D_on_G. In each iteration, training D_on_G doesn't update the weights, while training D does. And the weights of D and of the D inside D_on_G are the same. So I think it works.
The code is something like this:
G = generator_model(..., name='generator')
D = discriminator_model(..., name='discriminator')
D_on_G = Sequential()
D_on_G.add(G)
D_on_G.add(D)
G.compile(...)
D.compile(...)
D_on_G.get_layer('discriminator').trainable = False
D_on_G.compile(...)
I am using Keras 2.0.5 with TensorFlow 1.3.0 as backend.