Hi, I'm trying to define custom loss functions for my model but have come across this error several times during development:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I understand the need for differentiable loss functions, but I was wondering if there is any documentation on which functions in Keras are differentiable and which functions are not. My first instinct was that any of the Keras backend functions are differentiable and hence usable in my loss functions, but clearly (as seen in the error message) that is not the case.
I feel like it would be very helpful to have a list I can refer to, instead of discovering by trial and error as I have been doing so far. Does such a resource already exist, and if not, can we make one?
@JunShern Did you figure out how to verify whether a function is differentiable or not?
Hi @KrishnanParameswaran, I couldn't figure out a perfect way to do it, but I was able to get what I needed done through trial and error, using a simple example like the following:
```python
from keras.layers import Input, Dense, Reshape
from keras.models import Model

# Loss function
def custom_loss(input_x, output_x):
    loss = output_x - input_x
    return loss

# Dummy model
input_layer = Input(shape=INPUT_SHAPE)
hidden_layer = Dense(10)(input_layer)
output_layer = Reshape(OUTPUT_SHAPE)(hidden_layer)
model = Model(input_layer, output_layer)
model.compile(optimizer='sgd', loss=custom_loss)

# Train the model
model.fit(train_x, train_y)
```
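(Not part of the original comment, but as an illustration of the same trial-and-error approach: swapping a non-differentiable op such as `K.round`, one of the ops named in the error message, into the same dummy setup reproduces the error from the top of the thread, assuming the TF 1.x graph-mode backend used here.)

```python
import keras.backend as K

# Same dummy model as above, but the loss now passes through K.round,
# which has no gradient defined.
def rounded_loss(y_true, y_pred):
    return K.mean(K.round(y_pred) - y_true)

model.compile(optimizer='sgd', loss=rounded_loss)
model.fit(train_x, train_y)  # expected: ValueError: An operation has `None` for gradient.
```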
Using this sort of trial and error, I took note of which ops I found usable and which were problematic:
Differentiable ops
Non-differentiable ops
Also note that it's not just the op itself which determines differentiability. It's about how you chain the ops. For example, `K.mean(y_true)` has no gradient and will error. On the other hand, `K.mean(y_true - y_pred)` does have a gradient and can train without issue.
But once you have a simple example to test on, it's actually not too difficult to get the loss you want. I managed to work out a number of custom losses by reformulating my objectives using common arithmetic operations which can actually be quite descriptive when used creatively.
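(Editor's illustration, not from the original comment: one way to check a candidate loss without running a full training loop, assuming the TF 1.x graph-mode backend this thread uses, is to ask for the gradients directly with `K.gradients` and look for `None`.)

```python
import keras.backend as K
from keras.layers import Input, Dense

inp = Input(shape=(4,))
out = Dense(1)(inp)
y_true = K.placeholder(shape=(None, 1))

# K.mean(y_true) never touches the model output, so there is no path from
# the loss back through the network -> gradient is None.
bad_loss = K.mean(y_true)
print(K.gradients(bad_loss, out))   # [None]

# K.mean(y_true - y_pred) depends on the model output, so a gradient exists.
good_loss = K.mean(y_true - out)
print(K.gradients(good_loss, out))  # [<tf.Tensor ...>]
```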
Can we close this?
Yes, I don't see a need to have this open for now.
Thanks for your response @JunShern . I'm trying to use model weights and a C library (which needs to be compiled after every batch) in the loss function and wondering if there is a possibility at all.
@KrishnanParameswaran As I understand it, I don't think this is possible since the Keras optimizers need to know how to propagate gradients through your loss function, which wouldn't be possible with your compiled C code. But I'm no expert on the library, perhaps the Keras developers may have a better answer for you.
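(Not from the original thread, but a minimal sketch of why this fails, assuming the external C code would be wrapped with something like `tf.py_func` under TF 1.x: TensorFlow only sees an opaque Python call, no gradient is registered for it, so the chain of gradients is broken.)

```python
import numpy as np
import tensorflow as tf
import keras.backend as K

def external_c_call(x):
    # Stand-in for a call into a compiled C library; TensorFlow cannot see
    # inside this function, so it cannot differentiate through it.
    return np.asarray(x, dtype=np.float32) * 2.0

def wrapped_loss(y_true, y_pred):
    return K.mean(tf.py_func(external_c_call, [y_pred - y_true], tf.float32))

# Asking for gradients of this loss w.r.t. y_pred yields [None], which is
# exactly what triggers the "An operation has `None` for gradient" error.
```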
I would like to reopen this issue. I understand @JunShern got his problem solved, but in general this information should be in the backend documentation.
Besides, it would be wonderful if the error message could say exactly which operation is the offending one.
I am now facing the same error message, but I am not using any of the operations in the 'blacklist' above. I do use concatenate, though, but that is used in many layers in Keras.
@JunShern could you reopen until this gets added to the documentation?
This is the list of gradients. There are a few `None`s creeping in.

For me, the model trained fine. But when I tried load_model, it started complaining about differentiability. Surely if it trained it must have been differentiable, but no: some magic happens during load_model, and suddenly it has problems with differentiability.
Edit: It looks to be caused by the model and not by the loss function (cross-entropy). Debugging continues.
Edit: It is related to this https://github.com/keras-team/keras/issues/9992
Edit, fixed: Yes, it's fixed. Apparently the overall U-Net model design was faulty: one block of layers (in the VGG part) wasn't connected to anything in the decoder block, so those dangling layers had no gradient, presumably because no path connected their weights to the loss. A full day wasted, damn. Popping all the dangling, unused layers fixed it.
Hey,
Is `K.cast` from float64 to float32 differentiable?
I'm trying to implement the following custom loss function; however, I'm getting `ValueError: An operation has None for gradient.` I understand `K.cast` might be the offending operation per @JunShern's post; however, if I remove `kappa = K.cast(kappa, 'float32')` I get the error `TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float64 of argument 'x'`. Any ideas on how I can resolve this dilemma?
```python
import tensorflow as tf
import keras.backend as K

def _cohen_kappa(y_true, y_pred, num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    kappa, update_op = tf.contrib.metrics.cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    kappa = K.cast(kappa, 'float32')
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([update_op]):
        kappa = tf.identity(kappa)
    return kappa

def cohen_kappa_loss(num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    def cohen_kappa(y_true, y_pred):
        y_true = K.cast(y_true, 'int32')
        y_pred = K.cast(y_pred + 0.5, 'int32')
        y_true = K.sum(y_true, axis=1)
        y_pred = K.sum(y_pred, axis=1)
        return -_cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    return cohen_kappa
```
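(Editor's note, not part of the original question: a quick way to narrow down the culprit, assuming TF 1.x graph mode as in the snippet above, is to test each suspect op in isolation with `tf.gradients`. A float64-to-float32 cast on its own returns a non-None gradient, which suggests the problem lies with the other ops in the loss, such as the integer casts or the metric itself, rather than with that cast.)

```python
import tensorflow as tf

x = tf.placeholder(tf.float64, shape=(None,))
y = tf.cast(x, tf.float32) * 2.0

# A float-to-float cast has a registered gradient, so this prints a tensor
# rather than [None].
print(tf.gradients(y, x))
```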
@phao5814 did you ever resolve this? I am having a similar problem trying to implement Cohen's kappa in Keras.