When we define our own loss function in Keras 2, such as:
import tensorflow as tf
from keras import backend as K

def l1_smooth_loss(y_true, y_pred):
    x = K.abs(y_true - y_pred)
    # Smooth L1: quadratic for small errors (< 1), linear for large ones
    x = tf.where(tf.less(x, 1.0), 0.5 * x ** 2, x - 0.5)
    return K.sum(x, axis=-1)
Does the loss function return a scalar, or a tensor of shape (batch_size,) with the per-sample dimension reduced away?
In Keras's losses.py the defined losses return a tensor, but in the documentation examples the defined losses return a scalar.
Hi. This might answer your question, if I understand it correctly. From this file:
https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py
def sum(x, axis=None, keepdims=False):
    """Sum of the values in a tensor, alongside the specified axis.

    # Arguments
        x: A tensor or variable.
        axis: An integer, the axis to sum over.
        keepdims: A boolean, whether to keep the dimensions or not.
            If keepdims is False, the rank of the tensor is reduced
            by 1. If keepdims is True,
            the reduced dimension is retained with length 1.

    # Returns
        A tensor with sum of x.
    """
    axis = _normalize_axis(axis, ndim(x))
    return tf.reduce_sum(x, reduction_indices=axis, keep_dims=keepdims)
Hope that helps. Thanks.
@td2014
Thank you for your reply!
What you have explained is exactly where the issue lies. With K.mean or tf.reduce_mean, the loss results are the same whether I set axis=-1 or axis=None. It's strange.
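(For a 1-D tensor the two settings do coincide; a minimal NumPy sketch with made-up data:)

import numpy as np

v = np.array([1.0, 2.0, 3.0])  # 1-D: the only axis is the last axis
print(np.mean(v, axis=-1))     # 2.0
print(np.mean(v, axis=None))   # 2.0 -- identical for 1-D input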
The following two functions produce different results when it comes to K.sum, which is reasonable:
def l1_smooth_loss(y_true, y_pred):
    x = K.abs(y_true - y_pred)
    x = tf.where(tf.less(x, 1.0), 0.5 * x ** 2, x - 0.5)
    return K.sum(x)  # no axis: sums over every element, returns a scalar

def l1_smooth_loss(y_true, y_pred):
    x = K.abs(y_true - y_pred)
    x = tf.where(tf.less(x, 1.0), 0.5 * x ** 2, x - 0.5)
    return K.sum(x, axis=-1)  # sums over the last axis only, returns a tensor
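A NumPy analogue, with made-up data, showing the shapes the two versions produce:

import numpy as np

y_true = np.zeros((2, 3))
y_pred = np.array([[0.5, 2.0, 0.2], [3.0, 0.1, 0.4]])
x = np.abs(y_true - y_pred)
x = np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)
print(np.sum(x))           # 4.23 -- one scalar over everything
print(np.sum(x, axis=-1))  # [1.645 2.585] -- shape (2,), one value per sample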
Hence, I still can't figure out the problem. For example, from keras/losses.py:
def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)
This loss returns a tensor because of the axis=-1 setting, which differs from your explanation, if I understand correctly.
I did a quick bit of checking, and it seems that the TF sum and mean functions behave like NumPy's. So I did a quick code-up:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.sum(a)                  # sum over all entries
b_m1 = np.sum(a, axis=-1)      # sum over the last axis
b_None = np.sum(a, axis=None)  # same as np.sum(a)
b_p1 = np.sum(a, axis=1)       # same as axis=-1 for a 2-D array
c = np.mean(a)
c_m1 = np.mean(a, axis=-1)
c_None = np.mean(a, axis=None)
c_p1 = np.mean(a, axis=1)
NumPy treats -1 the same as +1 for a 2-D array. Therefore, if you specify None (or nothing), you get a sum over all the entries. If you use -1, you get a sum over dimension 1, the same as axis=1. When axis=-1 you should get back an array, and when axis=None you should get back a single number. Is that what you are seeing? Thanks.
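Printing the values from the snippet above makes the difference concrete (expected output in comments):

print(b, b_None)   # 21 21
print(b_m1, b_p1)  # [ 6 15] [ 6 15]
print(c, c_None)   # 3.5 3.5
print(c_m1, c_p1)  # [2. 5.] [2. 5.]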
Thank you very much.
I understand the function of axis=-1 in sum and mean. My issue is this:
There is a function called before the axis gets passed to TensorFlow, axis = _normalize_axis(axis, ndim(x)), which is in the backend interface https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py
def _normalize_axis(axis, ndim):
    """Converts negative axes to positive values.

    # Arguments
        axis: Integer axis (possibly negative).
        ndim: Rank of the tensor considered.

    # Returns
        Positive integer axis.
    """
    if isinstance(axis, tuple):
        axis = list(axis)
    if isinstance(axis, list):
        for i, a in enumerate(axis):
            if a is not None and a < 0:
                axis[i] = a % ndim
    else:
        if axis is not None and axis < 0:
            axis %= ndim
    return axis
If you look at the very last "if" statement, it does a modulo operation with ndim. So, if ndim=1, then -1%1 = 0, which means the operation will be over the entire batch and give you a single number, if I understand correctly.
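A quick check of that modulo behavior in plain Python:

print(-1 % 1)  # 0 -> for a 1-D tensor, axis=-1 normalizes to axis 0 (the batch axis)
print(-1 % 2)  # 1 -> for a 2-D tensor, axis=-1 normalizes to axis 1 (the feature axis)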
Thank you for your reply, but I'm afraid I still don't understand. Take the mean_absolute_error loss function in https://github.com/fchollet/keras/blob/master/keras/losses.py for example:
def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)
Supposing y_pred has shape (batch_size, dimension_per_sample), axis=-1 is passed to _normalize_axis(axis, ndim(x)), which returns 1 (-1 % 2 = 1, since ndim(y_pred) = 2).
So, in the end, K.mean(K.abs(y_pred - y_true), axis=-1) computes over axis=1 (the per-sample dimension), not over the whole batch.
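Mirroring that in NumPy with a made-up batch shows the result is per-sample, not a scalar:

import numpy as np

y_true = np.zeros((4, 3))  # hypothetical batch of 4 samples, 3 outputs each
y_pred = np.ones((4, 3))
per_sample = np.mean(np.abs(y_pred - y_true), axis=-1)
print(per_sample.shape)    # (4,) -- one loss per sample, the batch is not reduced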
Or can the 'mse' loss defined in Keras only handle a single-output situation, rather than the multi-output case of a regression network?
OK, we can find the mechanism in https://github.com/fchollet/keras/blob/master/keras/engine/training.py, in the model.compile docstring:
sample_weight_mode: if you need to do timestep-wise
    sample weighting (2D weights), set this to "temporal".
    None defaults to sample-wise weights (1D).
    If the model has multiple outputs, you can use a different
    sample_weight_mode on each output by passing a
    dictionary or a list of modes.
So, in my opinion, the engine will compute over the whole batch for us, although I'm not quite sure.
Yes, I agree. I did some more exploration, and if you look at the details of what model.compile does during set-up for the total loss calculation, it ends up calling the function _weighted_masked_objective(fn) in training.py.
The last two lines are:

    return K.mean(score_array)
    return weighted

(the first returns from the inner weighted function, the second returns that inner function from the wrapper). Therefore, this function will compute the mean loss over score_array (which is one-dimensional, just the batch dimension) and return that. Thanks.
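A rough sketch, not Keras's actual code (names simplified and the weighting/masking logic omitted), of that wrapping pattern:

import numpy as np

def weighted_masked_objective(fn):
    # fn returns a per-sample loss tensor of shape (batch_size,)
    def weighted(y_true, y_pred):
        score_array = fn(y_true, y_pred)
        return np.mean(score_array)  # reduce over the batch to a scalar
    return weighted

mae = weighted_masked_objective(
    lambda t, p: np.mean(np.abs(p - t), axis=-1))
print(mae(np.zeros((4, 3)), np.ones((4, 3))))  # 1.0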
Thank you for your help.
Best wishes! : )
Thanks for bringing up an interesting exercise to work through. Best wishes to you too!
Good discussion. It solved my puzzle.
Thanks for your discussion. Both of you saved my life, bravo! @td2014 @hellojialee