Keras: custom RMSE loss returns nan

Created on 16 May 2017 · 9 comments · Source: keras-team/keras

Some info:

  • Keras version: 2.0.4
  • Backend: tensorflow
  • Tensorflow version: 1.1.0
  • os: windows
  • gpu or cpu: cpu

I define an RMSE loss function:

from keras import backend as K
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1)) 

and then use it in my model, but after some iterations the loss becomes 'nan'. :(
Why does this happen? Thanks.
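
For reference, one possible (unconfirmed) explanation for this pattern: the gradient of sqrt(x) is 1/(2*sqrt(x)), which is infinite at x = 0, so a batch whose per-sample MSE hits exactly zero can push inf/nan into the weights. A minimal sketch that guards against this by adding K.epsilon() under the root (root_mean_squared_error_stable is just an illustrative name):

from keras import backend as K

def root_mean_squared_error_stable(y_true, y_pred):
    # K.epsilon() keeps the argument of sqrt strictly positive, so the
    # gradient 1 / (2 * sqrt(x)) stays finite even when the per-sample
    # MSE is exactly zero.
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1) + K.epsilon())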

stale

All 9 comments

Can you post a full code snippet that replicates your problem? Without seeing the data it is not possible to figure out where your problem might lie.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

I have the same problem. For about a second I get a normal loss value reported, then it becomes inf and after that nan.

I have the following model:

def get(width=256, height=256):
    m = Sequential()

    m.add(Conv2D(96, 3, input_shape=(height, width, 3), padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 5, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 10, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 15, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(3, 15, padding='same'))
    m.add(Activation('tanh'))

    m.compile(optimizer='adadelta',
              loss=_custom_loss)
    return m

My loss function is as follows:

from keras.backend.tensorflow_backend import sum as tf_sum
from keras.backend.tensorflow_backend import abs as tf_abs

def _custom_loss(y_true, y_pred):
    x = tf_sum((((y_true[:, :, :]+1) - (y_pred[:, :, :]+1)) / (y_true[:, :, :]+1)), axis=-1) / 3.0
    return tf_abs(x)

y_true and y_pred have the shape (?, 256, 256, 3).
Could this be related to the fact that y_true and y_pred can also have the shape (256, 256, 3)?

This problem does not occur when I use MSE as the loss function.

The channels in my image data range from -1 to 1 and were computed as channel / 127.5 - 1.

I am on Ubuntu 16.04, using the Tensorflow backend with GPU enabled.
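
A possibly relevant observation (added for reference, not a reply from the original thread): with channels scaled to [-1, 1] via channel / 127.5 - 1, any pure-black pixel makes y_true + 1 exactly zero, so the division in _custom_loss produces inf there, which turns into nan once it propagates through the optimizer. A hedged sketch that keeps the denominator away from zero (the 1e-6 floor and the function name are illustrative choices only):

from keras import backend as K

def _custom_loss_guarded(y_true, y_pred):
    # Clamp the denominator so black pixels (y_true == -1 after scaling)
    # cannot cause a division by zero.
    denom = K.maximum(y_true + 1.0, 1e-6)
    # (y_true + 1) - (y_pred + 1) simplifies to y_true - y_pred
    x = K.sum((y_true - y_pred) / denom, axis=-1) / 3.0
    return K.abs(x)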

I made a full script that reproduces the problem and attached the two images I used:

from keras.models import Sequential
from keras.layers import Conv2D, Activation
from keras.layers.advanced_activations import LeakyReLU
from keras import metrics
from keras.backend.tensorflow_backend import sum as tf_sum
from keras.backend.tensorflow_backend import abs as tf_abs

import numpy as np
from scipy.misc import imread

IMAGE_I_PATH = "source.png"
IMAGE_II_PATH = "watermark_source.png"

def generator():
    while 1:
        image_I = imread(IMAGE_I_PATH) / 127.5 - 1
        image_II = imread(IMAGE_II_PATH) / 127.5 - 1
        yield np.array([image_I]), np.array([image_II])


def get(width=256, height=256):
    m = Sequential()

    m.add(Conv2D(96, 3, input_shape=(height, width, 3), padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 5, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 10, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 15, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(3, 15, padding='same'))
    m.add(Activation('tanh'))

    m.compile(optimizer='adadelta',
              loss=_custom_loss)
    return m


def _custom_loss(y_true, y_pred):
    x = tf_sum((((y_true[:, :, :]+1) - (y_pred[:, :, :]+1)) / (y_true[:, :, :]+1)), axis=-1) / 3.0
    return tf_abs(x)

m = get()

m.fit_generator(generator=generator(),
                steps_per_epoch=100,
                epochs=3)

(Attached images: source.png and watermark_source.png)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

I wonder if this problem was ever addressed? I ran into the same problem when using a custom RMSE loss.

I encountered a similar problem in Keras v2.2.3 with a custom RMSE function used as loss and metric. I haven't tested it in Keras v2.2.4 yet.
MSE is always fine and works as expected as a loss and as a metric:
K.mean(K.square(y_pred - y_true), axis=-1)

However, RMSE,
K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
does not give the correct results.

I usually have MSE and RMSE running as either loss or metric, and RMSE is not the sqrt of MSE!

K.sqrt(K.mean(K.square(y_pred - y_true), axis=None)) is closer to sqrt(MSE), but still not exactly equal.

Any ideas why this happens or how to debug this further?

I also noticed that none of the standard loss functions use K.sqrt().
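
A small check that may explain the loss/metric mismatch (an illustration added for reference, not from the thread): when a loss returns per-sample values (axis=-1), Keras averages those per-sample values over the batch afterwards, and the mean of per-sample square roots is in general not the square root of the overall mean (Jensen's inequality). A NumPy sketch:

import numpy as np

np.random.seed(0)
y_true = np.random.rand(4, 3)   # batch of 4 samples, 3 outputs each
y_pred = np.random.rand(4, 3)

per_sample_mse = np.mean((y_pred - y_true) ** 2, axis=-1)   # shape (4,)
mean_of_rmse = np.mean(np.sqrt(per_sample_mse))   # what an axis=-1 RMSE reports
sqrt_of_mse = np.sqrt(np.mean(per_sample_mse))    # sqrt of the overall MSE

print(mean_of_rmse, sqrt_of_mse)   # the two values differ in general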

Hmm, same issue here with plain MSE (not even a sqrt). Interestingly, from the official docs at https://keras.io/api/losses/:

def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)  # Note the `axis=-1`

model.compile(optimizer='adam', loss=my_loss_fn)

which results in nan for me after ~50 epochs, whereas

model.compile(optimizer='adam', loss='mse')

works without nan. Definitely something odd.

I wonder if it helps to replace y_true - y_pred with tf.subtract(y_true, y_pred).
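
For anyone trying to pin down where the nan first appears, one hedged debugging approach (instrumentation only, not a fix) is to wrap the custom loss in tf.debugging.check_numerics and stop training early with the built-in TerminateOnNaN callback; model, x_train and y_train below stand in for whatever model and data are already being trained:

import tensorflow as tf
from tensorflow.keras.callbacks import TerminateOnNaN

def my_loss_fn_checked(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    loss = tf.reduce_mean(squared_difference, axis=-1)
    # Raises an error as soon as the loss contains inf or nan,
    # instead of silently poisoning the weights.
    return tf.debugging.check_numerics(loss, "loss contains inf or nan")

model.compile(optimizer='adam', loss=my_loss_fn_checked)
model.fit(x_train, y_train, callbacks=[TerminateOnNaN()])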
