Hi, I'm trying to define custom loss functions for my model but have come across this error several times during development:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I understand the need for differentiable loss functions, but I was wondering if there is any documentation on which functions in Keras are differentiable and which functions are not. My first instinct was that any of the Keras backend functions are differentiable and hence usable in my loss functions, but clearly (as seen in the error message) that is not the case.
I feel like it would be very helpful to have a list I can refer to, instead of discovering by trial and error as I have been doing so far. Does such a resource already exist, and if not, can we make one?
@JunShern Did you figure out how to verify whether a function is differentiable or not?
Hi @KrishnanParameswaran, I couldn't figure out a perfect way to do it, but I was able to get what I needed done through trial and error, using a simple example like the following:
```python
from keras.layers import Input, Dense, Reshape
from keras.models import Model

# Loss function
def custom_loss(input_x, output_x):
    loss = output_x - input_x
    return loss

# Dummy model
input_layer = Input(shape=INPUT_SHAPE)
hidden_layer = Dense(10)(input_layer)
output_layer = Reshape(OUTPUT_SHAPE)(hidden_layer)
model = Model(input_layer, output_layer)
model.compile(optimizer='sgd', loss=custom_loss)

# Train the model
model.fit(train_x, train_y)
```
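(Not part of the original comment, but as an illustration of the same trial-and-error approach: swapping a non-differentiable op such as `K.round`, one of the ops named in the error message, into the same dummy setup reproduces the error from the top of the thread, assuming the TF 1.x graph-mode backend used here.)

```python
import keras.backend as K

# Same dummy model as above, but the loss now passes through K.round,
# which has no gradient defined.
def rounded_loss(y_true, y_pred):
    return K.mean(K.round(y_pred) - y_true)

model.compile(optimizer='sgd', loss=rounded_loss)
model.fit(train_x, train_y)  # expected: ValueError: An operation has `None` for gradient.
```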
Using this sort of trial and error, I took note of which ops I found usable and which were problematic:
Differentiable ops
Non-differentiable ops
Also note that it's not just the op itself which determines differentiability. It's about how you chain the ops. For example, `K.mean(y_true)` has no gradient and will error. On the other hand, `K.mean(y_true - y_pred)` does have a gradient and can train without issue.
But once you have a simple example to test on, it's actually not too difficult to get the loss you want. I managed to work out a number of custom losses by reformulating my objectives using common arithmetic operations which can actually be quite descriptive when used creatively.
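(Editor's illustration, not from the original comment: one way to check a candidate loss without running a full training loop, assuming the TF 1.x graph-mode backend this thread uses, is to ask for the gradients directly with `K.gradients` and look for `None`.)

```python
import keras.backend as K
from keras.layers import Input, Dense

inp = Input(shape=(4,))
out = Dense(1)(inp)
y_true = K.placeholder(shape=(None, 1))

# K.mean(y_true) never touches the model output, so there is no path from
# the loss back through the network -> gradient is None.
bad_loss = K.mean(y_true)
print(K.gradients(bad_loss, out))   # [None]

# K.mean(y_true - y_pred) depends on the model output, so a gradient exists.
good_loss = K.mean(y_true - out)
print(K.gradients(good_loss, out))  # [<tf.Tensor ...>]
```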
Can we close this?
Yes, I don't see a need to have this open for now.
Thanks for your response @JunShern . I'm trying to use model weights and a C library (which needs to be compiled after every batch) in the loss function and wondering if there is a possibility at all.
@KrishnanParameswaran As I understand it, I don't think this is possible since the Keras optimizers need to know how to propagate gradients through your loss function, which wouldn't be possible with your compiled C code. But I'm no expert on the library, perhaps the Keras developers may have a better answer for you.
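(Not from the original thread, but a minimal sketch of why this fails, assuming the external C code would be wrapped with something like `tf.py_func` under TF 1.x: TensorFlow only sees an opaque Python call, no gradient is registered for it, so the chain of gradients is broken.)

```python
import numpy as np
import tensorflow as tf
import keras.backend as K

def external_c_call(x):
    # Stand-in for a call into a compiled C library; TensorFlow cannot see
    # inside this function, so it cannot differentiate through it.
    return np.asarray(x, dtype=np.float32) * 2.0

def wrapped_loss(y_true, y_pred):
    return K.mean(tf.py_func(external_c_call, [y_pred - y_true], tf.float32))

# Asking for gradients of this loss w.r.t. y_pred yields [None], which is
# exactly what triggers the "An operation has `None` for gradient" error.
```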
I would like to reopen this issue. I understand @JunShern got his problem solved, but in general this information should be in the backend documentation.
Besides, it would be wonderful if the error message could say exactly which operation is the offending one.
I am now facing the same error message, but I am not using any of the operations in the 'blacklist' above. I do use concatenate, though, but that is used in many layers in Keras.
@JunShern could you reopen until this gets added to the documentation?
This is the list of gradients. There are a few `None`s creeping in.

For me, the model trained fine. But when I tried load_model, it started complaining about differentiability. Surely if it trained it must have been differentiable, but no: some magic happens during load_model, and suddenly it has problems with differentiability.
Edit: It looks to be caused by the model and not by the loss function (cross-entropy). Debugging continues.
Edit: It is related to this https://github.com/keras-team/keras/issues/9992
Edit, fixed: Yes, it's fixed. Apparently the overall U-Net model design was faulty: one block of layers (in the VGG part) wasn't connected to anything in the decoder block, so those dangling layers had no gradient, presumably because no path connected their weights to the loss. A full day wasted, damn. Popping all the dangling, unused layers fixed it.
Hey,
Is `K.cast` from float64 to float32 differentiable?
I'm trying to implement the following custom loss function; however, I'm getting `ValueError: An operation has None for gradient.` I understand `K.cast` might be the offending operation per @JunShern's post; however, if I remove `kappa = K.cast(kappa, 'float32')` I get the error `TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float64 of argument 'x'`. Any ideas on how I can resolve this dilemma?
```python
import tensorflow as tf
import keras.backend as K

def _cohen_kappa(y_true, y_pred, num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    kappa, update_op = tf.contrib.metrics.cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    kappa = K.cast(kappa, 'float32')
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([update_op]):
        kappa = tf.identity(kappa)
    return kappa

def cohen_kappa_loss(num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    def cohen_kappa(y_true, y_pred):
        y_true = K.cast(y_true, 'int32')
        y_pred = K.cast(y_pred + 0.5, 'int32')
        y_true = K.sum(y_true, axis=1)
        y_pred = K.sum(y_pred, axis=1)
        return -_cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    return cohen_kappa
```
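(Editor's note, not part of the original question: a quick way to narrow down the culprit, assuming TF 1.x graph mode as in the snippet above, is to test each suspect op in isolation with `tf.gradients`. A float64-to-float32 cast on its own returns a non-None gradient, which suggests the problem lies with the other ops in the loss, such as the integer casts or the metric itself, rather than with that cast.)

```python
import tensorflow as tf

x = tf.placeholder(tf.float64, shape=(None,))
y = tf.cast(x, tf.float32) * 2.0

# A float-to-float cast has a registered gradient, so this prints a tensor
# rather than [None].
print(tf.gradients(y, x))
```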
@phao5814 did you ever resolve this? I am having a similar problem trying to implement Cohen's kappa in Keras.