Keras: Gradients w.r.t. VGG conv block are None when attaching a custom head to the base model

Created on 17 Jul 2018 · 7 comments · Source: keras-team/keras

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps

  • [x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.

  • [x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps

  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

As demonstrated by @fchollet in the DL with Python notebook "Visualizing what convnets learn", we can get the gradients of the output w.r.t. an arbitrary conv layer in the pretrained model (minimal example below).

However, when we attach a custom head to the model and try to get the gradients w.r.t. a layer of the included VGG model, the gradients are None (minimal example below).

Is this a bug or expected behavior? And if it is expected, is there an alternative way to get the gradients in this case?
Many thanks in advance!

Works:

from keras.applications.vgg16 import VGG16
from keras import backend as K

model = VGG16(weights='imagenet')
last_conv_layer = model.get_layer('block5_conv3')
grads = K.gradients(model.output, last_conv_layer.output)[0]
print(grads)
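
(As an aside, here is a minimal sketch of how the resulting symbolic gradient tensor can be evaluated numerically; the random image is just a placeholder, not a real photo:)

import numpy as np

# Compile a backend function mapping the input image to the gradient values
iterate = K.function([model.input], [grads])
dummy_img = np.random.random((1, 224, 224, 3)).astype('float32')
grads_value = iterate([dummy_img])[0]
print(grads_value.shape)  # (1, 14, 14, 512), matching block5_conv3's output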

Gradients are None:

from keras.applications.vgg16 import VGG16
from keras import backend as K
from keras import models
from keras import layers


conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(224, 224, 3))

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

conv_layer = model.layers[0].get_layer("block5_conv3")
grads = K.gradients(model.output, conv_layer.output)[0]
print(grads)

All 7 comments

Please see the documentation for K.gradients.
https://keras.io/backend/#gradients

Hi!
This issue isn't related to a bug/enhancement/feature request or other accepted types of issue.

To ask questions, please see the following resources:

Thanks!

If you think I made a mistake, please re-open this issue.

Hi,

I apologize but I would like to try one more time :-)

I assume you are referring to this:

loss: Scalar tensor to minimize.

Sorry, I neglected this in the code examples above (still, it is surprising to me that even though I was passing non-scalar tensors in the first position, the first example works but the second doesn't).
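
For instance, the scalar-loss contract can be seen in isolation (a minimal self-contained sketch, independent of VGG):

from keras import backend as K

# K.gradients(loss, variables): `loss` is expected to be a scalar tensor,
# and a list with one gradient tensor per entry of `variables` is returned.
x = K.placeholder(shape=(None, 3))
loss = K.sum(K.square(x))          # scalar loss built from x
grads = K.gradients(loss, [x])[0]
print(grads)                       # a real tensor, since loss depends on x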

If I adapt the examples so that they completely match the code used in https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.4-visualizing-what-convnets-learn.ipynb (the part I'm referring to is this):

# This is the "african elephant" entry in the prediction vector
african_elephant_output = model.output[:, 386]

# The is the output feature map of the `block5_conv3` layer,
# the last convolutional layer in VGG16
last_conv_layer = model.get_layer('block5_conv3')

# This is the gradient of the "african elephant" class with regard to
# the output feature map of `block5_conv3`
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]

I still get the same difference between examples (1) and (2).

Example 1, modified to have a scalar in the first argument position:

from keras.applications.vgg16 import VGG16
from keras import backend as K

model = VGG16(weights='imagenet')
last_conv_layer = model.get_layer('block5_conv3')
# first arg should be a scalar
grads = K.gradients(model.output[:, 386], last_conv_layer.output)[0]
print(grads)

returns a gradient tensor

Tensor("gradients_22/block5_pool_12/MaxPool_grad/MaxPoolGrad:0", shape=(?, 14, 14, 512), dtype=float32)

while example 2, modified in the same way,

from keras.applications.vgg16 import VGG16
from keras import backend as K
from keras import models
from keras import layers


conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(224, 224, 3))

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

conv_layer = model.layers[0].get_layer("block5_conv3")
grads = K.gradients(model.output[:, 0], conv_layer.output)[0]
print(grads)

still returns None.
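
What seems to be going on (a diagnostic sketch; the node bookkeeping may differ across Keras versions) is that conv_layer.output still refers to the node created when the standalone VGG16 graph was built, which is not connected to model.output. The conv base as a whole gets a second inbound node when it is called inside the Sequential model:

# The conv base now has two inbound nodes: one from its own construction
# and one from the call inside the Sequential model.
print(conv_layer.output)                  # lives in the standalone VGG16 graph
print(model.layers[0].get_output_at(-1))  # conv base output inside the Sequential graph

# Gradients w.r.t. the conv base's final output *as used by the Sequential
# model* are connected, so this should not be None (sketch):
grads_pool = K.gradients(model.output[:, 0],
                         model.layers[0].get_output_at(-1))[0]
print(grads_pool)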

BTW, here is an issue that may be connected (possibly also related to submodel inclusion, i.e. having a model inside a model): https://github.com/keras-team/keras/issues/9992

It would be great if you could take a look again.

Unfortunately I do not seem to be able to re-open the issue.

This probably has to do with the Sequential API.
This works:

from keras.applications.vgg16 import VGG16
from keras import backend as K
from keras import models
from keras import layers

inp = layers.Input([224, 224, 3])
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_tensor=inp)

x = layers.Flatten()(conv_base.output)
x = layers.Dense(256)(x)
x = layers.Dense(1)(x)
model = models.Model(inp,x)
conv_layer = model.get_layer("block5_conv3")

grads = K.gradients(model.output[:, 0], conv_layer.output)[0]
print(grads)

OK me again :-)

It works when I switch to the functional API, inspired by

https://github.com/keras-team/keras/issues/4040

Working code:

from keras.applications.vgg16 import VGG16
from keras import backend as K
from keras.models import Sequential, Model
from keras.layers import Flatten, Dense, Input, GlobalAveragePooling2D

# create the base pre-trained model
base_model = VGG16(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

conv_layer = model.layers[1]
grads = K.gradients(model.output[:, 0], conv_layer.output)[0]
print(grads)
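
Building on this, a rough sketch of turning these gradients into a class-activation heatmap, following the pooling approach from the 5.4 notebook (the input image is again just a random placeholder):

import numpy as np

# Average the gradient over the batch and spatial axes: one weight per filter
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input], [pooled_grads, conv_layer.output[0]])

img = np.random.random((1, 224, 224, 3)).astype('float32')
pooled_grads_value, conv_output_value = iterate([img])

# Weight each feature-map channel by its pooled gradient, then average
for i in range(conv_output_value.shape[-1]):
    conv_output_value[:, :, i] *= pooled_grads_value[i]
heatmap = np.mean(conv_output_value, axis=-1)
print(heatmap.shape)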

In the functional API, VGG is not represented as a nested model but as a flat list of layers (at least that is how it looks when comparing the respective outputs of summary()).
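
A quick way to see that difference (sketch; exact layer counts depend on the Keras version):

# Functional build: every VGG layer is a direct member of the model
print(len(model.layers))     # e.g. 22: input + 13 conv + 5 pool + GAP + 2 Dense
print(model.layers[1].name)  # 'block1_conv1'
# Sequential build from earlier: the whole conv base is a single layer,
# so its layers list would look like [vgg16 (Model), Flatten, Dense, Dense].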

Oh, I see you answered at about the same time I was typing in what I found :-)
Thank you! Then it seems doubly confirmed that the functional API is required for this.

Thank you so much @skeydan, you saved my day!
