Keras: Custom simple loss function generates unexpected results

Created on 5 Feb 2018  ·  5 comments  ·  Source: keras-team/keras

So I implemented an RMSE loss function, which is simply the square root of the built-in mean_squared_error loss, as follows (I just copied the source code for mean_squared_error):

from keras import backend as K

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

def rmse(y_true, y_pred):
    return K.sqrt(mean_squared_error(y_true, y_pred))
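As a sanity check outside of Keras, the same two functions written in NumPy (a sketch, not the poster's code) do satisfy rmse² = mse on any single batch:

```python
import numpy as np

def np_mse(y_true, y_pred):
    # Mean of squared errors over the last axis, mirroring the Keras version.
    return np.mean(np.square(y_pred - y_true), axis=-1)

def np_rmse(y_true, y_pred):
    return np.sqrt(np_mse(y_true, y_pred))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

# On one batch the relationship holds exactly.
assert np.isclose(np_rmse(y_true, y_pred) ** 2, np_mse(y_true, y_pred))
```

So the discrepancy in the training log cannot come from the functions themselves; it has to come from how the values are reported.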

Then when I train my model using the following code:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, input_shape=(9,), activation='relu'))
model.add(Dense(1))
model.compile('adam', loss=mean_squared_error, metrics=[rmse])
model.fit(x_tr, y_tr, batch_size=10, epochs=2)

I get the following result:

Epoch 1/2
2356/2356 [==============================] - 2s 884us/step - loss: 2.0065 - rmse: 0.9320
Epoch 2/2
2356/2356 [==============================] - 1s 553us/step - loss: 1.6754 - rmse: 0.8899

As you can see, the reported loss (mean_squared_error) is not the square of the reported rmse. Is this a bug, or am I missing something?


All 5 comments

The progress bar averages losses and metrics across batches.

Say batch 1 has (mse, rmse) = (4, 2).
Say batch 2 has (mse, rmse) = (1, 1).

After 2 batches the progress bar would show (2.5, 1.5), which violates the squaring relationship you are expecting but is nonetheless accurate.
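To make the averaging concrete, here is a minimal NumPy sketch (not Keras internals) of what the progress bar computes from those two hypothetical batches:

```python
import numpy as np

# Hypothetical per-batch values from the example above.
batch_mse = np.array([4.0, 1.0])
batch_rmse = np.sqrt(batch_mse)    # [2.0, 1.0]

# The progress bar keeps a running mean of each quantity separately.
shown_mse = batch_mse.mean()       # 2.5
shown_rmse = batch_rmse.mean()     # 1.5

# The averaged rmse squared no longer equals the averaged mse.
print(shown_rmse ** 2, shown_mse)  # 2.25 vs 2.5
```

Because each quantity is averaged independently, any non-linear relationship between loss and metric is lost after the first batch.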

Ok, that seems like a good explanation, but shouldn't this issue still be resolved? The most reasonable outcome is the one I was expecting. Wouldn't it be better to change the reporting system so that it does not average the per-batch losses, and instead computes the loss over the ensemble of batches used for training? The current system happens to work for MSE because of the math, but it will produce unexpected results for custom loss functions.

The progress bar doesn't show the results of the current batch; it was designed to show the average performance over the epoch so far.

Even if you ensemble the losses as you propose, sqrt((x_1^2 + x_2^2 + ... + x_n^2)/n) != (x_1 + x_2 + ... + x_n)/n.

I am not looking for the property you state here:

Even if you ensemble the losses as you propose, sqrt((x_1^2 + x_2^2 + ... + x_n^2)/n) != (x_1 + x_2 + ... + x_n)/n

I am proposing to concatenate all the batches in the epoch and then calculate the loss over the whole thing, rather than averaging the per-batch losses, because the square root (or any other non-linear function) ruins the result.
My suggestion makes no difference for mse or binary_crossentropy if you do the math, but it certainly makes a difference for RMSE or other custom losses that apply a non-linear function at the end of the calculation.
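A small NumPy sketch (with illustrative data, not from the thread) of the difference between the two reporting schemes, using two batches of two samples each:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

# Illustrative data: the first batch is off by 2 everywhere, the second is exact.
y_true = np.array([0.0, 0.0, 0.0, 0.0])
y_pred = np.array([2.0, 2.0, 0.0, 0.0])

# Current behavior: average the per-batch RMSEs.
averaged = np.mean([rmse(y_true[:2], y_pred[:2]),
                    rmse(y_true[2:], y_pred[2:])])  # (2.0 + 0.0) / 2 = 1.0

# Proposed behavior: RMSE over the concatenated epoch.
whole = rmse(y_true, y_pred)                        # sqrt(2) ~ 1.414

print(averaged, whole)
```

The two values agree only when the final step of the loss is linear (as with plain MSE); the square root makes them diverge.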

@brge17 I hope that means the loss is calculated per batch (only the current batch) and the metric is averaged per epoch (only the current epoch, not an average over previous epochs), and that
if batch size = training set size, then loss = mse and the rmse metric = sqrt(loss) must give consistent values. Please confirm.
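A quick NumPy check of that single-batch case (a sketch, not Keras internals): with one batch per epoch there is nothing to average, so the displayed rmse should be exactly the square root of the displayed loss:

```python
import numpy as np

y_true = np.array([0.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.5, 0.5, 0.5, 0.5])

mse = np.mean((y_pred - y_true) ** 2)  # 0.25
rmse = np.sqrt(mse)                    # 0.5

# With a single batch, the averaged metric is just this one value,
# so the squaring relationship holds exactly.
assert np.isclose(rmse ** 2, mse)
```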
