I am trying to freeze the pretrained VGG16's layers ('conv_base' below) and add new layers on top of them for feature extraction.
I expect to get the same prediction results from 'conv_base' before (ret1) and after (ret2) fitting the model, but they are not the same.
Is this the wrong way to check weight freezing?
import numpy as np
from keras import applications, models, layers

# load VGG16 and set it to untrainable
conv_base = applications.VGG16(weights='imagenet', include_top=False, input_shape=[150, 150, 3])
conv_base.trainable = False
# result before fitting the model
ret1 = conv_base.predict(np.ones([1, 150, 150, 3]))
# add layers on top of VGG16 and compile the model
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile('rmsprop', 'binary_crossentropy', ['accuracy'])
# fit the model
model.fit_generator(train_generator, steps_per_epoch=100, validation_data=validation_generator, validation_steps=50)
# result after fitting the model
ret2 = conv_base.predict(np.ones([1, 150, 150, 3]))
# hope this is True, but it is not
np.equal(ret1, ret2).all()
Some of the weights can't be frozen, most notably the running mean and variance of the BatchNormalization layers. This is a known behaviour that has been causing some confusion for a while.
Have a look on this discussion:
https://github.com/fchollet/keras/issues/4762#issuecomment-299606870
@datumbox Thanks for the reply.
Yes, I have checked that BatchNormalization layers are not frozen regardless of the trainable flag.
But VGG16 with include_top=False contains only Conv2D and MaxPooling2D layers (plus the Input), which as far as I know don't have BatchNorm-like internal updates.
+) I also checked with np.allclose() instead of np.equal(); the difference doesn't disappear.
Oops, I'm sorry, I didn't notice you said VGG16. Indeed, that specific architecture doesn't have any BatchNorm layers.
Can you write a loop over the layers, compare the models before and after fitting, and pinpoint which frozen layers have different weights?
Also, how big of a difference are we talking about here on the output? Could it be just rounding errors?
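For reference, such a before/after comparison might look like the sketch below; it reuses the variables from the question (conv_base, model, train_generator), and weights_before is a name introduced here purely for illustration.
import numpy as np

# snapshot the frozen base's weights before training
weights_before = [layer.get_weights() for layer in conv_base.layers]

model.fit_generator(train_generator, steps_per_epoch=100, validation_data=validation_generator, validation_steps=50)

# compare each layer's weights after training and report any that moved
for layer, before in zip(conv_base.layers, weights_before):
    for b, a in zip(before, layer.get_weights()):
        if not np.allclose(b, a):
            print(layer.name, 'changed, max abs diff:', np.abs(b - a).max())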
I've identified the issue. In short: when adding a Model or Sequential as the first layer in a Sequential model, the Sequential model will use the pre-existing input and output of that model/sequential without calling it on a new Input.
What that means is that conv_base.trainable = False is ineffective, because model doesn't see conv_base itself; it sees all of its inner layers instead.
The workaround is to set all inner layers of conv_base to non-trainable:
for layer in conv_base.layers:
    layer.trainable = False
This is kind of a strange behavior, so we will fix it. Presumably, adding a model/sequential as the first layer should still result in the model/sequential being called anew.
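As a quick sanity check (a sketch; the expected counts assume the exact model from the question, with two Dense layers on top of conv_base), the trainable weight lists should shrink once the workaround is applied:
for layer in conv_base.layers:
    layer.trainable = False
# recompile so the freeze takes effect for training
model.compile('rmsprop', 'binary_crossentropy', ['accuracy'])
print(len(conv_base.trainable_weights))  # expect 0 once the inner layers are frozen
print(len(model.trainable_weights))      # expect 4: kernel + bias for each of the two Dense layers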
If you set model.trainable = False, should it not make layer.trainable False for all of its layers?
conv_base_model = VGG16(weights='imagenet', input_shape=(150, 150, 3), include_top=False)
conv_base_model.trainable = False
for layer in conv_base_model.layers:
    print(layer.name, layer.trainable)
I am still getting True for all layers.
Am I missing something?
Yes, you need to set trainable on each inner layer explicitly:
for layer in conv_base.layers:
    layer.trainable = False
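With that loop in place, rerunning the check from the earlier comment should print False for every layer (a sketch using the same conv_base_model as above):
for layer in conv_base_model.layers:
    layer.trainable = False
for layer in conv_base_model.layers:
    print(layer.name, layer.trainable)  # now prints False for each layer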
I fixed it: https://github.com/fchollet/keras/commit/c25fa38deb4efc5445f64af3ec17eae0eb660d2f