Keras: How to obtain the gradient of each parameter in the last epoch of training

Created on 7 Apr 2016 · 36 Comments · Source: keras-team/keras

I want to obtain the gradient of each parameter in the last epoch of training. Is there a way to do so in Keras?

Thanks,
Ming

All 36 comments

You can get the outputs of a particular layer as described here:
http://keras.io/faq/#how-can-i-visualize-the-output-of-an-intermediate-layer

The parameters (weights and so on) are easily retrieved in your model object.

To compute the gradient you can use this code:

import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)  # specify the right placeholder for your input
Y = K.sum(K.square(X))  # loss function
fn = K.function([X], K.gradients(Y, [X]))  # function that returns the gradient of Y w.r.t. X

That's a partial answer. I hope this helps.

How do you call fn? I tried fn(model.layers[k].input) where model.layers[k] is a layers.core.Dense.

Here is an example where you can call the function on a 2x2 matrix. I hope this helps.

import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.square(X))  # note: wrapping X in K.round would give all-zero gradients, since round is not differentiable
fn = K.function([X], K.gradients(Y, [X]))
print(fn([np.ones((2, 2), dtype=np.float32)]))
# ==> [array([[ 2.,  2.],
#             [ 2.,  2.]], dtype=float32)]

@gideonite If you can make it work on a toy example, please let me know

@philipperemy I wound up printing out the weight of my model during training to debug and have since moved on. Thank you, I appreciate the help.

@philipperemy and @gideonite
I'm facing a very similar issue. I wish to compute the derivatives w.r.t. the parameters of the network so I can write the update equations outside the Keras loop. I'm currently using the TensorFlow backend, but can switch if the other one has some special functionality that will help. A rough code snippet is below:

from keras.models import Sequential
import keras.backend as K
import tensorflow as tf

model = Sequential()
model.add(...)  # add a couple of layers here
x = tf.placeholder(tf.float32, shape=(None, out_dim))  # out_dim = input dimensionality
y = model(x)
loss = K.sum(K.square(y - target))  # just think of any standard loss fn; `target` holds the labels

I need the gradient of the loss w.r.t. each parameter in the neural network for a targeted reinforcement learning application, so the model.fit-style functions are not useful for me. Roughly, what I want to accomplish is:
param_grad = tf.gradients(loss, model_params)

The problem is, I am unable to get the symbolic model_params. If I do model.get_weights or something, it gets me the numeric weights, not symbolic ones. I would appreciate some help.

From what I know, it's very hard to do it in Keras.

In your case, I strongly advise you to use TensorFlow only, WITHOUT Keras. It's much easier.

If you're interested in the final layer and you use the MSE, you can always reverse-engineer the backpropagation update to find the gradient, but that's a very specific case:

w[i+1]-w[i] = - learning_rate x dE/dW

Then your gradient would be:

dE/dW = (w[i] - w[i+1])/learning_rate
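
A minimal sketch of that trick (assuming plain SGD with no momentum or decay; model, X_batch, and y_batch are hypothetical names for your compiled model and one batch of data):

import numpy as np

lr = 0.01  # must match the optimizer's learning rate

w_before = [w.copy() for w in model.get_weights()]  # w[i]
model.train_on_batch(X_batch, y_batch)              # one SGD step
w_after = model.get_weights()                       # w[i+1]

# dE/dW = (w[i] - w[i+1]) / learning_rate, per weight tensor
grads = [(before - after) / lr for before, after in zip(w_before, w_after)]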

When you do model.get_weights(), your weights are evaluated. If you want the symbolic ones, do something like:

for layer in model.layers:
    layer.W  # symbolic weight tensor
    layer.b  # symbolic bias tensor

Hope this helps. Let me know if you can make it in Keras!

Hi @philipperemy.
I tried to do the same thing. But weights do not change! I posted the whole thing here.

Here's how I did it:

def get_gradients(model):
    """Return the gradient of every trainable weight in model

    Parameters
    -----------
    model : a keras model instance

    First, find all tensors which are trainable in the model. Surprisingly,
    `model.trainable_weights` will return tensors for which
    trainable=False has been set on their layer (last time I checked), hence the extra check.
    Next, get the gradients of the loss with respect to the weights.

    """
    weights = [tensor for tensor in model.trainable_weights if model.get_layer(tensor.name[:-2]).trainable]
    optimizer = model.optimizer

    return optimizer.get_gradients(model.total_loss, weights)

@ebanner What is model.total_loss? My model (Theano 0.9.0dev2) object has no such attribute - it only seems to have a .loss attribute and that is just the string name (e.g. "mse").

@davidljung model.total_loss is a tensor containing the loss, which is determined by the type of loss you are using. I'm guessing you did not compile your model first? Here's a minimal example using categorical crossentropy loss:

from keras.layers import Input, Dense
from keras.models import Model

input = Input(shape=(2,))
probs = Dense(2, activation='softmax', name='probs')(input)

model = Model(input=input, output=probs)
model.compile(optimizer='sgd', loss='categorical_crossentropy')

model.total_loss
# ==> Elemwise{mul,no_inplace}.0

I am also using theano 0.9.0dev2 for the record.

@ebanner I have tried your method, but I do not obtain the values; instead I get things like: [Elemwise{add,no_inplace}.0, GpuFromHost.0, GpuFromHost.0, GpuFromHost.0].
Could you help me?

@ebanner I have tried your method too but got the same result as @jf003320018. What can we do with this?

OK here's a full working example from start to finish. Hopefully this will clear things up. What you do with the gradient tensors is define a keras function to evaluate those tensors for a particular setting of the model's inputs. Then call the function on a particular setting of the inputs!

Define model

from keras.layers import Input, Dense
from keras.models import Model

input = Input(shape=[2])
probs = Dense(1, activation='sigmoid')(input)

model = Model(input=input, output=probs)
model.compile(optimizer='sgd', loss='binary_crossentropy')

Get gradient tensors

weights = model.trainable_weights  # weight tensors
weights = [weight for weight in weights if model.get_layer(weight.name[:-2]).trainable]  # filter the weight tensors down to only the trainable ones
gradients = model.optimizer.get_gradients(model.total_loss, weights)  # gradient tensors

print(weights)
# ==> [dense_1_W, dense_1_b]

Define keras function to return gradients

import keras.backend as K

input_tensors = [model.inputs[0],          # input data
                 model.sample_weights[0],  # how much to weight each sample by
                 model.targets[0],         # labels
                 K.learning_phase()]       # train or test mode

get_gradients = K.function(inputs=input_tensors, outputs=gradients)

Get gradients of weights for particular (X, sample_weight, y, learning_phase) tuple

inputs = [[[1, 2]],  # X
          [1],       # sample weights
          [[1]],     # y
          0]         # learning phase in TEST mode

print(list(zip(weights, get_gradients(inputs))))
# ==> [(dense_1_W, array([[-0.42342907],
#                         [-0.84685814]], dtype=float32)),
#      (dense_1_b, array([-0.42342907], dtype=float32))]

@ebanner Thank you for your answer. It really works. But could you tell me how to calculate the gradients for two or more samples simultaneously? Because I do not know the meaning of 'model.sample_weights', I cannot modify the code. Thank you very much.

sample_weight is documented here:

sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample...

Passing a value of 1 for each sample gives all samples equal importance in the eyes of the optimizer.
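
Concretely, a sample's contribution to the gradient scales linearly with its weight. A quick sketch using the same get_gradients function (hypothetical single-sample inputs):

g1 = get_gradients([[[1, 2]], [1], [[1]], 0])  # sample weight 1
g2 = get_gradients([[[1, 2]], [2], [[1]], 0])  # sample weight 2: each gradient is doubled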

As for scaling up the example to an arbitrary number of samples, see this example (using the same function defined in my previous post):

Get gradients of weights for particular (X, sample_weight, y, learning_phase) tuple

import numpy as np

nb_sample = 10

inputs = [np.random.randn(nb_sample, 2),              # X
          np.ones(nb_sample),                         # sample weights
          np.random.randint(2, size=[nb_sample, 1]),  # y
          0]                                          # learning phase in TEST mode

print(list(zip(weights, get_gradients(inputs))))
# ==> [(dense_2_W, array([[-0.1869444 ],
#                         [ 0.34009627]], dtype=float32)),
#      (dense_2_b, array([ 0.17382634], dtype=float32))]

@ebanner The problem is solved. Thank you very much.

How would I apply the gradients that I retrieve using this onto a separate model with the same parameters?

model1.layers[i].set_weights([w + g for w, g in zip(model1.layers[i].get_weights(), model2_layer_i__gradients)])  # get_weights() returns a list of arrays, so combine element-wise
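
A fuller sketch of one way to do this (assuming both models share the same architecture, plain SGD, and that grads is the list of arrays returned by get_gradients on the other model; lr is a hypothetical learning rate):

lr = 0.01  # hypothetical learning rate

# one manual SGD step on model1, using gradients computed from model2
new_weights = [w - lr * g for w, g in zip(model1.get_weights(), grads)]
model1.set_weights(new_weights)

Note that model.get_weights() also returns non-trainable weights, so the ordering only lines up with the gradient list if every weight in the model is trainable.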

In case anyone is trying to do this with the Sequential model and seeing errors like:

AttributeError: 'Sequential' object has no attribute 'total_loss'

you can use model.model in place of model in such places, e.g.

gradients = model.optimizer.get_gradients(model.model.total_loss, weights)
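
For instance, a minimal sketch with a Sequential model (hypothetical two-feature input):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_shape=(2,)))
model.compile(optimizer='sgd', loss='binary_crossentropy')

weights = model.trainable_weights
# Sequential wraps an inner functional Model; total_loss lives on model.model
gradients = model.optimizer.get_gradients(model.model.total_loss, weights)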

I sent a PR to visualize grads via TensorBoard, see https://github.com/fchollet/keras/pull/6313.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@ebanner I have a query regarding the learning phase (TEST/TRAIN mode) in the inputs. What is its effect? I get the same set of gradients in both.

@shaifugpt Here's the documentation for learning phase. https://keras.io/backend/

learning_phase()
Returns the learning phase flag.

The learning phase flag is a bool tensor (0 = test, 1 = train) to be passed as input to any Keras function that uses a different behavior at train time and test time.

Returns

Learning phase (scalar integer tensor or Python integer).

For instance, when learning_phase=1 a dropout layer will actually perform dropout (i.e. zero out each input activation with some probability). Whereas if learning_phase=0 a dropout layer passes its input through unchanged (Keras uses inverted dropout, so the rescaling happens at train time rather than at test time).

You're getting the same gradients in both cases because none of the layers you are using depend on the learning phase.
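
A minimal sketch that makes the difference visible (a hypothetical model with a Dropout layer, reusing the get_gradients pattern from above):

from keras.layers import Input, Dense, Dropout
from keras.models import Model
import keras.backend as K
import numpy as np

inp = Input(shape=(2,))
h = Dropout(0.5)(Dense(4, activation='relu')(inp))
out = Dense(1, activation='sigmoid')(h)
model = Model(input=inp, output=out)
model.compile(optimizer='sgd', loss='binary_crossentropy')

grads = model.optimizer.get_gradients(model.total_loss, model.trainable_weights)
get_gradients = K.function(inputs=[model.inputs[0], model.sample_weights[0],
                                   model.targets[0], K.learning_phase()],
                           outputs=grads)

X, y = np.ones((1, 2)), np.ones((1, 1))
g_test = get_gradients([X, [1], y, 0])   # test mode: dropout is a no-op
g_train = get_gradients([X, [1], y, 1])  # train mode: random units are dropped, so gradients differ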

@ebanner If the first layer in the model is a merge layer, we need to pass two sets of inputs, one for each branch. I am passing them as:

inputs = [[[trainX[len(trainX)-40:len(trainX)],trainX[len(trainX)-40:len(trainX)]]], # X
      [1], # sample weights
      [trainY[len(trainX)-40:len(trainX)]], # y
      1 # learning phase in Train mode
]

Then,

grads = get_gradients(inputs)

gives the error TypeError: unhashable type: 'list'.

What is the correct way of passing them?

I got the following error when running the code. I use Keras 2.0.6 with Theano 0.9.0. How can I solve this? Thanks. @ebanner

weights = [weight for weight in weights if model.get_layer(weight.name[:-2]).trainable]
AttributeError: 'NoneType' object has no attribute 'trainable'

Thanks, it is useful @ebanner. In Keras 2.0.6 the weight names have changed, e.g. to 'conv2d_1/kernel:0'. @RyanCV

@shaifugpt

I just ran into the same error for the same reason (multiple inputs) and passing them sequentially seems to fix it.

I.e.:

input_tensors = [model.inputs[0],
                 model.inputs[1],          # etc.
                 model.sample_weights[0],  # how much to weight each sample by
                 model.targets[0],         # labels
                 K.learning_phase()]       # train or test mode

Then:

inputs = [inputs[0],
          inputs[1],
          [1],     # sample weights
          trainY,  # y
          0]       # learning phase in TEST mode

For anyone looking for this with Keras 2.0 onwards this is the syntax:

import keras.backend as K
import numpy as np

weights = model.trainable_weights  # weight tensors
gradients = model.optimizer.get_gradients(model.total_loss, weights)  # gradient tensors
input_tensors = model.inputs + model.sample_weights + model.targets + [K.learning_phase()]
get_gradients = K.function(inputs=input_tensors, outputs=gradients)
inputs = [x, x_off, np.ones(len(x)), y, 0]  # x, x_off: the model's two input arrays; y: labels
grads = get_gradients(inputs)

Filtering the weight list is no longer necessary, and it gave me an error: weights = [weight for weight in weights if model.get_layer(weight.name[:-2]).trainable]

Also note that my model had two X variables, hence why I have: x, x_off.

@sachinruk Absolutely perfect!

Thank you

@sachinruk Is it possible to calculate the gradients using "sub_losses", too? By sub losses I mean having 2 or more outputs (and losses: total_loss = loss_1 + loss_2) and then doing something like
model.optimizer.get_gradients(model.LOSS_1, weights) ?
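
There is no model.LOSS_1 attribute, but one approach that should work is to rebuild the sub-loss tensor yourself from the corresponding output and target. A sketch, assuming a Keras 2 model whose first output was compiled with binary crossentropy:

import keras.backend as K
from keras.losses import binary_crossentropy

# loss_1: the loss term for output/target pair 0, rebuilt symbolically
loss_1 = K.mean(binary_crossentropy(model.targets[0], model.outputs[0]))
grads_loss_1 = model.optimizer.get_gradients(loss_1, model.trainable_weights)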

@ebanner (re: the multi-sample gradient example quoted above)

When I do what you did, I get InvalidArgumentError: transpose expects a vector of size 4. But input(1) is a vector of size 3. Is it because my training input data is 3D, since I am using word embeddings?

What if there are multiple outputs, so model.total_loss consists of multiple losses and model.targets holds multiple labels, but we are only interested in one target? Specifying model.targets[0][0] to select the first target does not work.

@shaifugpt @mathieumb did you figure out how to work with multiple inputs? I also got the same error.

@michelleowen Try something like:

input_tensors = [model1.inputs[0],
                 model1.inputs[1],
                 model1.sample_weights[0],  # how much to weight each sample by
                 model1.targets[0],         # labels
                 K.learning_phase()]

@ebanner Is it possible to compute the gradient with respect to a specific weight connection of a layer, rather than all the weights?
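
One simple approach (a sketch, assuming the get_gradients function from the example above): the backend computes gradients per weight tensor, not per scalar entry, so evaluate the gradient for the whole tensor and then index the connection you care about.

# grads: list of numpy arrays from get_gradients(inputs), aligned with `weights`
grads = get_gradients(inputs)

i, j = 0, 0  # hypothetical indices: input unit i -> output unit j of the first Dense layer
grad_single_connection = grads[0][i, j]  # gradient of the loss w.r.t. that one weight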
