Keras: How can I use a Keras optimizer to backprop on my own loss functions?

Created on 16 Dec 2016 · 13 Comments · Source: keras-team/keras

I am working on guided backprop for activation maximization. Instead of implementing RMSprop, Adam, etc. myself, I want to reuse the optimizers defined in Keras.



You should check out the optimizer API as defined in keras/optimizers.py.

I did. Normally I would compute the loss and gradients as:
grads_fn = K.gradients(loss_fn, input_tensor)[0]
loss_grads_fn = K.function([input_tensor], [loss_fn, grads_fn])

My backprop step would then be:
loss, grads = loss_grads_fn([numpy_array])
numpy_array -= grads * lr
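For comparison, the manual loop described above (compute loss and gradient, then step the array by `grads * lr` on the host) can be sketched framework-independently in NumPy. The quadratic toy loss and its analytic gradient are assumptions standing in for the compiled `loss_grads_fn`; this is an illustration of the pattern, not the keras-vis code.

```python
import numpy as np


def loss_grads_fn(x):
    """Hypothetical stand-in for the compiled Keras function:
    returns (loss, gradient) for the toy loss sum((x - 3)**2)."""
    diff = x - 3.0
    return float(np.sum(diff ** 2)), 2.0 * diff


numpy_array = np.zeros((1, 4), dtype=np.float32)
lr = 0.1
for step in range(100):
    loss, grads = loss_grads_fn(numpy_array)
    numpy_array -= grads * lr  # plain gradient-descent step, done in NumPy

print(loss, numpy_array)
```

Every iteration here round-trips data between the framework and NumPy, which is the overhead the replies below suggest avoiding.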

get_gradients (https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L61) seems to be called by get_updates() in Adam. Do I just call get_updates() once to build the update function? I am not sure how to use that function either. Specifically, I am confused about which parts build a symbolic function and which are functions I can pass my numpy array to in order to compute updates.

This is the relevant portion: https://github.com/raghakot/keras-vis/blob/master/vis/optimizer.py#L163
Instead of rolling my own custom RMSprop, it would be nicer to use the Keras optimizers. I would appreciate it if you could look through that code and advise. It is a Keras visualization library :)

You can use Keras optimizers outside of Keras if you really can't do whatever you're doing within Keras.

Yes, it is important to call get_updates() once and only once and hang on to the returned updates. For example, the Adam optimizer creates its momentum variables locally inside get_updates(). Calling get_updates() multiple times for the same set of parameters creates a new set of those variables each time and will cause chaos.
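A framework-independent NumPy sketch of why that state matters: Adam keeps first and second moment estimates per parameter plus a step counter, allocated once and then updated on every step. Re-allocating them (as a second get_updates() call does for its own copy) would reset the moments and the bias-correction counter. The class and method names below are hypothetical; the update rule is the standard Adam formulation with commonly used default hyperparameters, not a copy of the Keras source.

```python
import numpy as np


class NumpyAdam:
    """Hypothetical minimal Adam: state is allocated once per parameter list."""

    def __init__(self, params, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-8):
        self.params = params
        self.lr, self.beta_1, self.beta_2, self.eps = lr, beta_1, beta_2, eps
        self.t = 0
        # One pair of moment buffers per parameter, created exactly once.
        self.m = [np.zeros_like(p) for p in params]
        self.v = [np.zeros_like(p) for p in params]

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[...] = self.beta_1 * m + (1 - self.beta_1) * g
            v[...] = self.beta_2 * v + (1 - self.beta_2) * g * g
            m_hat = m / (1 - self.beta_1 ** self.t)  # bias-corrected moments
            v_hat = v / (1 - self.beta_2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place update


w = np.array([5.0, -5.0])
opt = NumpyAdam([w], lr=0.1)
for _ in range(500):
    opt.step([2.0 * w])  # gradient of sum(w**2)

print(w, opt.t)
```

Building the optimizer (and its state) once and reusing it for every step mirrors calling get_updates() once and reusing the returned updates.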

If you have some custom loss function and a list of shared variables:

updates = opt.get_updates(params, constraints, loss)
fun = K.function([input],[], updates=updates)

You're better off doing backprop on the GPU instead of going back and forth with numpy. Store your weights as GPU variables and update them with functions. When you need the weights in numpy, use K.get_value and K.set_value.

Cheers,
Ben

Thanks. The input (model.input) has shape (?, channels, rows, cols). When I try to create the update function using:

updates = opt.get_updates([input], [], [loss_fn])

it complains about None. Any ideas on how to handle that?

Please always post a stack trace or something if you have specific issues.

I put together a Gist showing how to use Keras optimizers. It should teach you the basic style of how everything goes together.

https://gist.github.com/bstriner/e1e011652b297d13b3ac3f99fd11b2bc

The standard in Keras is that model parameters are variables that live on the GPU and inputs and targets are placeholders that get passed in for each batch.

A training function is created with inputs: batch inputs, batch targets; and outputs: loss, accuracy, other metrics. The function also performs updates on the model parameters on the GPU each time it is executed.

To train, you just pass batch inputs and batch targets to the training function and print out the current loss.

At the end, if you want to get the trained parameters, use K.get_value.

from keras.optimizers import Adam
from keras import backend as K
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras.metrics import categorical_accuracy
import numpy as np

# inputs and targets are placeholders
x = K.placeholder(name="x", shape=(None, 28*28))
ytrue = K.placeholder(name="y", shape=(None, 10))

# model parameters are variables
W = K.variable(np.random.random((28*28, 10)).astype(np.float32))
b = K.variable(np.random.random((10,)).astype(np.float32))
params = [W, b]

# single layer model: softmax(xW+b)
ypred = K.softmax(K.dot(x, W) + b)

# categorical cross entropy loss ((target, output) argument order, as in Keras 2.x)
loss = K.mean(K.categorical_crossentropy(ytrue, ypred), axis=None)

# categorical accuracy, averaged over the batch
accuracy = K.mean(categorical_accuracy(ytrue, ypred))

# Train function: call get_updates() once and reuse the returned updates
# (positional (params, constraints, loss) signature of Keras 1.x/2.0.x)
opt = Adam()
updates = opt.get_updates(params, [], loss)
train = K.function([x, ytrue], [loss, accuracy], updates=updates)

# Train the network
((xtrain, ytrain), (xtest, ytest)) = mnist.load_data()
xtrain = xtrain.reshape((-1, 28*28)).astype(np.float32) / 255.  # flatten and scale pixels to [0, 1]
ytrain = to_categorical(ytrain, 10)
for epoch in range(500):
    loss_value, acc_value = train([xtrain, ytrain])
    print("Epoch: {}, Loss: {}, Accuracy: {}".format(epoch, loss_value, acc_value))
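The loop above feeds the whole training set to `train` on every iteration (full-batch training). Because the placeholders have a `None` batch dimension, the same `train` function can also be called once per minibatch. A minimal NumPy sketch of a shuffled minibatch iterator follows; the `iterate_minibatches` helper and the batch size are assumptions, not part of the Gist.

```python
import numpy as np


def iterate_minibatches(x, y, batch_size, rng=None):
    """Hypothetical helper: yield shuffled (x_batch, y_batch) pairs covering the data once."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


# Usage with the training function above (not executed here):
# for epoch in range(10):
#     for xb, yb in iterate_minibatches(xtrain, ytrain, 128):
#         outputs = train([xb, yb])

x = np.arange(10).reshape(10, 1)
y = np.arange(10)
batches = list(iterate_minibatches(x, y, batch_size=4, rng=np.random.default_rng(0)))
print([len(xb) for xb, _ in batches])
```

Indexing inputs and targets with the same permutation keeps each sample paired with its label; the last batch is simply smaller when the dataset size is not a multiple of the batch size.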


Thanks. The example and gist are awesome. You should perhaps add them to, or reference them from, the Keras docs/examples for others.

Here is a minimal example of what's happening in my case.

from keras import backend as K
from keras.optimizers import Adam

x = K.placeholder(shape=(None, 224, 224, 3))
opt = Adam()

# Some contrived example
loss = K.square(x)

updates = opt.get_updates([x], [], [loss])
iterate = K.function([x], [], updates=updates)

This gives me TypeError: int() argument must be a string or a number, not 'NoneType', because x has None as its batch dimension.
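The TypeError is consistent with how Adam sets up its state: as far as I can tell from keras/optimizers.py, get_updates() allocates moment buffers with the same shape as each parameter, and a placeholder's None batch dimension has no concrete size to allocate (a placeholder is also not something an update can assign to; optimizers update variables). The NumPy sketch below reproduces the same failure mode in isolation. `allocate_moments` is a hypothetical helper. The fix it points to, hedged, is to hold the image being optimized in a variable with a fully known shape such as (1, 224, 224, 3).

```python
import numpy as np


def allocate_moments(shape):
    """Hypothetical: allocate an optimizer state buffer shaped like a parameter."""
    return np.zeros(shape, dtype=np.float32)


# A parameter with a fully known shape works.
ok = allocate_moments((1, 224, 224, 3))
print(ok.shape)

# A placeholder-style shape with an unknown batch dimension does not.
try:
    allocate_moments((None, 224, 224, 3))
except TypeError as err:
    print("TypeError:", err)
```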

Also, how do I add a placeholder on top of model.input? Basically, I am trying to add a proxy input placeholder on top of the pretrained Keras model so that I can perform certain input transformations on the GPU before feeding the result into model.input. I tried:

proxy = K.placeholder(shape=K.int_shape(model.input))
# This was my futile attempt to connect to existing model graph
proxy = model.input + K.variable(0.)

@bstriner I am new to Keras. How can I modify your example to get the model's parameters if I have a network (e.g. VGG16) loaded through load_model()? Thanks

@mongoose54 That's kind of unrelated to the OP. If you have a model, you can inspect model.layers, model.layers[2].kernel, etc. You can also just use model.weights to get all the weights.

That will give you the variable tensors, which also tell you the variable names. You can get the actual value of a variable with import keras.backend as K; value = K.get_value(my_variable).

Cheers

@bstriner Sorry for placing it here.

However I have a question related to this topic:

Let's say I have the losses explicitly defined in a numpy array: losses = [0.23 0.432 2.23 ...]. How can I backpropagate them to update the network's parameters?

@bstriner Thanks for this example, but I have a weird problem.

The only relevant difference from your example is:
updates = self.opt.get_updates(model.trainable_weights, [], loss_out)

The model is actually learning: the loss is going down, validation accuracy is increasing (up to 100% in some iterations), and I can save and load the model, etc.

But something is wrong: the learning rate never changes.

(I changed these values just to make the change easier to see, but no luck.)

self.opt = SGD(lr=1.0, decay= 1e-3, momentum=0.5, nesterov=False)

K.get_value(m.opt.lr) always outputs 1.0 in each loop; it doesn't change.
(Each call to "get_updates" (train step, not test) should change it, but it doesn't.)

Any ideas, anyone?

Edit: I just added opt.lr to the function outputs directly; still no change.

Edit 2: adding "self.lr = lr" after the following statement in get_updates fixes this issue:
if self.initial_decay > 0:
    lr = lr * (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))

Edit 3: since I use TensorFlow as the backend, it probably works as it builds up a graph, but some dependencies might not behave as expected since opt.lr is not updated correctly.

Is this a bug?

What am I missing here?
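Reading the expression quoted in edit 2, the decayed rate appears to be computed as a new local tensor inside get_updates() from the base `lr`, `decay`, and `iterations`, so `opt.lr` itself is probably meant to stay at its initial value while only `iterations` advances. A plain-Python sketch of the same time-based decay schedule, useful for logging the effective rate, follows. The function name is hypothetical; to plug in live values you could read `lr`, `decay`, and `iterations` off the optimizer with K.get_value, assuming they are backend variables as in the Keras version quoted above.

```python
def effective_lr(lr, decay, iterations):
    """Time-based decay matching the quoted expression:
    lr * 1 / (1 + decay * iterations), applied only when decay > 0."""
    if decay > 0:
        return lr * (1.0 / (1.0 + decay * iterations))
    return lr


# Values from the question: lr=1.0, decay=1e-3
for it in (0, 1000, 9000):
    print(it, effective_lr(1.0, 1e-3, it))
```

With those settings the effective rate halves after 1000 updates, even though K.get_value(opt.lr) keeps reporting 1.0.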

Hi @bstriner, a small question for you. Suppose I add another output head to your network above: what else would need adjusting?

I have a very similar network, but as soon as I add an extra head (output) to it, I get the "An operation has None for gradient. Please make sure ..." error.

Everything works fine before adding the extra output.

Working code:

from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import SGD

# ENTROPY_BETA and LR are constants defined elsewhere in the original code
class NN():
    ...
    def _build_nn(self):
        inputs = Input(shape=(self.obs_size,))
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l2')(inputs)
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l3')(x)
        actions_probs = Dense(units=self.n_actions, activation='softmax', use_bias=False, name='actions_probs')(x)
        self.nn = Model(inputs=inputs, outputs=[actions_probs])

    def _build_train(self):
        actions_probs = self.nn.output
        actions_1hot = K.placeholder(shape=(None, self.n_actions), name='actions_1hot')
        actions_scales = K.placeholder(shape=(None,), name='actions_scales')
        # probability of the action actually taken
        actions_probs = K.sum(actions_probs * actions_1hot, axis=1)
        log_actions_probs = K.log(actions_probs)
        policy_loss = -1 * actions_scales * log_actions_probs
        policy_loss = K.mean(policy_loss)
        entropy = K.mean(-(actions_probs * log_actions_probs))
        entropy_loss = -ENTROPY_BETA * entropy
        loss = policy_loss + entropy_loss
        optim = SGD(lr=LR, decay=1e-6, momentum=0.9, nesterov=True)
        updates = optim.get_updates(params=self.nn.trainable_weights, loss=loss)
        self.custom_train = K.function(inputs=[self.nn.input, actions_1hot, actions_scales], outputs=[loss], updates=updates)

Non-working code:

from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import SGD

# ENTROPY_BETA and LR are constants defined elsewhere in the original code
class NN():
    ...
    def _build_nn(self):
        inputs = Input(shape=(self.obs_size,))
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l2')(inputs)
        x = Dense(units=self.hidden_units, activation='relu', use_bias=True, name='l3')(x)
        actions_probs = Dense(units=self.n_actions, activation='softmax', use_bias=False, name='actions_probs')(x)
        extra_head = Dense(units=1, activation='linear', use_bias=False, name='extra_head')(x)  # new head
        self.nn = Model(inputs=inputs, outputs=[actions_probs, extra_head])  # two outputs now

    def _build_train(self):
        actions_probs, _ = self.nn.output  # extra_head output is ignored here
        actions_1hot = K.placeholder(shape=(None, self.n_actions), name='actions_1hot')
        actions_scales = K.placeholder(shape=(None,), name='actions_scales')
        actions_probs = K.sum(actions_probs * actions_1hot, axis=1)
        log_actions_probs = K.log(actions_probs)
        policy_loss = -1 * actions_scales * log_actions_probs
        policy_loss = K.mean(policy_loss)
        entropy = K.mean(-(actions_probs * log_actions_probs))
        entropy_loss = -ENTROPY_BETA * entropy
        loss = policy_loss + entropy_loss
        optim = SGD(lr=LR, decay=1e-6, momentum=0.9, nesterov=True)
        updates = optim.get_updates(params=self.nn.trainable_weights, loss=loss)
        self.custom_train = K.function(inputs=[self.nn.input, actions_1hot, actions_scales], outputs=[loss], updates=updates)

Additional info: when debugging in VS Code, I can see the contents of loss being (apparently correctly) constructed and passed in, but I can't see inside params.
Any ideas?

Good morning @bstriner. Re-reading my own question, maybe I should feed the "_" output into the call to K.function, since the model now needs two different y_true values and actions_1hot covers only one of the outputs...
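One likely cause, hedged because the thread never resolves it: the loss depends only on `actions_probs`, so the `extra_head` kernel has no path to the loss, `K.gradients` returns None for it, and the optimizer's gradient check raises during get_updates(). Since that happens while the updates are being built, feeding another input to K.function would likely not help. Two options are to add a loss term that uses the extra head's output, or to pass the optimizer only the weights that actually receive a gradient. The filtering step is sketched below in plain Python; the helper name and the weight labels are hypothetical, and in Keras the `grads` list would come from something like `K.gradients(loss, self.nn.trainable_weights)`.

```python
def params_with_gradients(params, grads):
    """Hypothetical helper: keep only (param, grad) pairs whose gradient exists.
    A None gradient means the parameter has no path to the loss."""
    kept = [(p, g) for p, g in zip(params, grads) if g is not None]
    return [p for p, _ in kept], [g for _, g in kept]


# Stand-ins for the weights and gradients of the two-head model.
params = ["l2/kernel", "l2/bias", "l3/kernel", "l3/bias",
          "actions_probs/kernel", "extra_head/kernel"]
grads = ["g0", "g1", "g2", "g3", "g4", None]  # extra_head is unused by the loss
kept_params, kept_grads = params_with_gradients(params, grads)
print(kept_params)
```

The filtered list would then go to `optim.get_updates(params=kept_params, loss=loss)` in place of `self.nn.trainable_weights`.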
