Keras: custom RMSE loss returns nan

Created on 16 May 2017 · 9 comments · Source: keras-team/keras

Some info:

  • Keras version: 2.0.4
  • Backend: tensorflow
  • Tensorflow version: 1.1.0
  • os: windows
  • gpu or cpu: cpu

I define an RMSE loss function:

from keras import backend as K
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1)) 

and then use it in my model, but after some iterations the loss becomes 'nan'. :(
Why does this happen? Thanks.
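
For reference, one possible (unconfirmed) explanation for this pattern: the gradient of sqrt(x) is 1/(2*sqrt(x)), which is infinite at x = 0, so a batch whose per-sample MSE hits exactly zero can push inf/nan into the weights. A minimal sketch that guards against this by adding K.epsilon() under the root (root_mean_squared_error_stable is just an illustrative name):

from keras import backend as K

def root_mean_squared_error_stable(y_true, y_pred):
    # K.epsilon() keeps the argument of sqrt strictly positive, so the
    # gradient 1 / (2 * sqrt(x)) stays finite even when the per-sample
    # MSE is exactly zero.
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1) + K.epsilon())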

stale

All 9 comments

Can you post a full code snippet that replicates your problem? Without seeing the data it is not possible to figure out where your problem might lie.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

I have the same problem. For about a second I get a normal loss value reported, then it becomes inf and after that nan.

I have the following model:

def get(width=256, height=256):
    m = Sequential()

    m.add(Conv2D(96, 3, input_shape=(height, width, 3), padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 5, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 10, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 15, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(3, 15, padding='same'))
    m.add(Activation('tanh'))

    m.compile(optimizer='adadelta',
              loss=_custom_loss)
    return m

My loss function is as follows:

from keras.backend.tensorflow_backend import sum as tf_sum
from keras.backend.tensorflow_backend import abs as tf_abs

def _custom_loss(y_true, y_pred):
    x = tf_sum((((y_true[:, :, :]+1) - (y_pred[:, :, :]+1)) / (y_true[:, :, :]+1)), axis=-1) / 3.0
    return tf_abs(x)

y_true and y_pred have the shape (?, 256, 256, 3).
Could this be related to the fact that y_true and y_pred can also have the shape (256, 256, 3)?

This problem does not occur when I use MSE as the loss function.

The channels in my image data range from -1 to 1 and were computed as channel / 127.5 - 1.

I am on Ubuntu 16.04, using the Tensorflow backend with GPU enabled.
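
A possibly relevant observation (added for reference, not a reply from the original thread): with channels scaled to [-1, 1] via channel / 127.5 - 1, any pure-black pixel makes y_true + 1 exactly zero, so the division in _custom_loss produces inf there, which turns into nan once it propagates through the optimizer. A hedged sketch that keeps the denominator away from zero (the 1e-6 floor and the function name are illustrative choices only):

from keras import backend as K

def _custom_loss_guarded(y_true, y_pred):
    # Clamp the denominator so black pixels (y_true == -1 after scaling)
    # cannot cause a division by zero.
    denom = K.maximum(y_true + 1.0, 1e-6)
    # (y_true + 1) - (y_pred + 1) simplifies to y_true - y_pred
    x = K.sum((y_true - y_pred) / denom, axis=-1) / 3.0
    return K.abs(x)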

I made a full script that reproduces the problem and attached the two images I used:

from keras.models import Sequential
from keras.layers import Conv2D, Activation
from keras.layers.advanced_activations import LeakyReLU
from keras import metrics
from keras.backend.tensorflow_backend import sum as tf_sum
from keras.backend.tensorflow_backend import abs as tf_abs

import numpy as np
from scipy.misc import imread

IMAGE_I_PATH = "source.png"
IMAGE_II_PATH = "watermark_source.png"

def generator():
    while 1:
        image_I = imread(IMAGE_I_PATH) / 127.5 - 1
        image_II = imread(IMAGE_II_PATH) / 127.5 - 1
        yield np.array([image_I]), np.array([image_II])


def get(width=256, height=256):
    m = Sequential()

    m.add(Conv2D(96, 3, input_shape=(height, width, 3), padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 3, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 5, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 10, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(96, 15, padding='same'))
    m.add(LeakyReLU())

    m.add(Conv2D(3, 15, padding='same'))
    m.add(Activation('tanh'))

    m.compile(optimizer='adadelta',
              loss=_custom_loss)
    return m


def _custom_loss(y_true, y_pred):
    x = tf_sum((((y_true[:, :, :]+1) - (y_pred[:, :, :]+1)) / (y_true[:, :, :]+1)), axis=-1) / 3.0
    return tf_abs(x)

m = get()

m.fit_generator(generator=generator(),
                steps_per_epoch=100,
                epochs=3)

(Attached images: source.png and watermark_source.png)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

I wonder if this problem was ever addressed? I ran into the same problem when using a custom RMSE loss.

I encountered a similar problem in Keras v2.2.3 with a custom RMSE function used as loss and metric. I haven't tested it in Keras v2.2.4 yet.
MSE is always fine and works as expected as a loss and as a metric:
K.mean(K.square(y_pred - y_true), axis=-1)

However, RMSE,
K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
does not give the correct results.

I usually have MSE and RMSE running as either loss or metric, and RMSE is not the sqrt of MSE!

K.sqrt(K.mean(K.square(y_pred - y_true), axis=None)) is closer to sqrt(MSE), but still not exactly equal.

Any ideas why this happens or how to debug this further?

I also noticed that none of the standard loss functions use K.sqrt().
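
A small check that may explain the loss/metric mismatch (an illustration added for reference, not from the thread): when a loss returns per-sample values (axis=-1), Keras averages those per-sample values over the batch afterwards, and the mean of per-sample square roots is in general not the square root of the overall mean (Jensen's inequality). A NumPy sketch:

import numpy as np

np.random.seed(0)
y_true = np.random.rand(4, 3)   # batch of 4 samples, 3 outputs each
y_pred = np.random.rand(4, 3)

per_sample_mse = np.mean((y_pred - y_true) ** 2, axis=-1)   # shape (4,)
mean_of_rmse = np.mean(np.sqrt(per_sample_mse))   # what an axis=-1 RMSE reports
sqrt_of_mse = np.sqrt(np.mean(per_sample_mse))    # sqrt of the overall MSE

print(mean_of_rmse, sqrt_of_mse)   # the two values differ in general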

Hmm, same issue here with plain MSE (not even a sqrt). Interestingly, from the official docs at https://keras.io/api/losses/:

def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)  # Note the `axis=-1`

model.compile(optimizer='adam', loss=my_loss_fn)

which results in nan for me after ~50 epochs, whereas

model.compile(optimizer='adam', loss='mse')

works without nan. Definitely something odd.

I wonder if it helps to replace y_true - y_pred with tf.subtract(y_true, y_pred).
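
For anyone trying to pin down where the nan first appears, one hedged debugging approach (instrumentation only, not a fix) is to wrap the custom loss in tf.debugging.check_numerics and stop training early with the built-in TerminateOnNaN callback; model, x_train and y_train below stand in for whatever model and data are already being trained:

import tensorflow as tf
from tensorflow.keras.callbacks import TerminateOnNaN

def my_loss_fn_checked(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    loss = tf.reduce_mean(squared_difference, axis=-1)
    # Raises an error as soon as the loss contains inf or nan,
    # instead of silently poisoning the weights.
    return tf.debugging.check_numerics(loss, "loss contains inf or nan")

model.compile(optimizer='adam', loss=my_loss_fn_checked)
model.fit(x_train, y_train, callbacks=[TerminateOnNaN()])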
