When we define our own loss function in Keras 2, such as:
import tensorflow as tf
from keras import backend as K

def l1_smooth_loss(y_true, y_pred):
    x = K.abs(y_true - y_pred)
    # Smooth L1: quadratic for small errors (< 1), linear for large ones
    x = tf.where(tf.less(x, 1.0), 0.5 * x ** 2, x - 0.5)
    return K.sum(x, axis=-1)
Does the loss function return a scalar, or a tensor of shape (batch_size,) with the per-sample dimension reduced away?
In Keras's losses.py the defined losses return a tensor, but in the documentation examples the defined losses return a scalar.
Hi. This might answer your question, if I understand it correctly. From this file:
https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py
def sum(x, axis=None, keepdims=False):
    """Sum of the values in a tensor, alongside the specified axis.

    # Arguments
        x: A tensor or variable.
        axis: An integer, the axis to sum over.
        keepdims: A boolean, whether to keep the dimensions or not.
            If keepdims is False, the rank of the tensor is reduced
            by 1. If keepdims is True,
            the reduced dimension is retained with length 1.

    # Returns
        A tensor with sum of x.
    """
    axis = _normalize_axis(axis, ndim(x))
    return tf.reduce_sum(x, reduction_indices=axis, keep_dims=keepdims)
Hope that helps. Thanks.
@td2014
Thank you for your reply!
What you have explained is exactly where the issue lies. With K.mean or tf.reduce_mean, the loss results are the same whether I set axis=-1 or axis=None. It's strange.
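(For a 1-D tensor the two settings do coincide; a minimal NumPy sketch with made-up data:)

import numpy as np

v = np.array([1.0, 2.0, 3.0])  # 1-D: the only axis is the last axis
print(np.mean(v, axis=-1))     # 2.0
print(np.mean(v, axis=None))   # 2.0 -- identical for 1-D input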
The following two functions produce different results when it comes to K.sum, which is reasonable:
def l1_smooth_loss(y_true, y_pred):
    x = K.abs(y_true - y_pred)
    x = tf.where(tf.less(x, 1.0), 0.5 * x ** 2, x - 0.5)
    return K.sum(x)  # no axis: sums over every element, returns a scalar

def l1_smooth_loss(y_true, y_pred):
    x = K.abs(y_true - y_pred)
    x = tf.where(tf.less(x, 1.0), 0.5 * x ** 2, x - 0.5)
    return K.sum(x, axis=-1)  # sums over the last axis only, returns a tensor
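A NumPy analogue, with made-up data, showing the shapes the two versions produce:

import numpy as np

y_true = np.zeros((2, 3))
y_pred = np.array([[0.5, 2.0, 0.2], [3.0, 0.1, 0.4]])
x = np.abs(y_true - y_pred)
x = np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)
print(np.sum(x))           # 4.23 -- one scalar over everything
print(np.sum(x, axis=-1))  # [1.645 2.585] -- shape (2,), one value per sample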
Hence, I still can't figure out the problem. For example, from keras/losses.py:
def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)
This loss returns a tensor because of the axis=-1 setting, which differs from your explanation, if I understand correctly.
I did a quick bit of checking, and it seems that the TF sum and mean functions behave like NumPy's. So I did a quick code-up:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.sum(a)                  # sum over all entries
b_m1 = np.sum(a, axis=-1)      # sum over the last axis
b_None = np.sum(a, axis=None)  # same as np.sum(a)
b_p1 = np.sum(a, axis=1)       # same as axis=-1 for a 2-D array
c = np.mean(a)
c_m1 = np.mean(a, axis=-1)
c_None = np.mean(a, axis=None)
c_p1 = np.mean(a, axis=1)
NumPy treats -1 the same as +1 for a 2-D array. Therefore, if you specify None (or nothing), you get a sum over all the entries. If you use -1, you get a sum over dimension 1, the same as axis=1. When axis=-1 you should get back an array, and when axis=None you should get back a single number. Is that what you are seeing? Thanks.
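Printing the values from the snippet above makes the difference concrete (expected output in comments):

print(b, b_None)   # 21 21
print(b_m1, b_p1)  # [ 6 15] [ 6 15]
print(c, c_None)   # 3.5 3.5
print(c_m1, c_p1)  # [2. 5.] [2. 5.]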
Thank you very much.
I understand the function of axis=-1 in sum and mean. My issue is this:
There is a function called before the axis gets passed to TensorFlow, axis = _normalize_axis(axis, ndim(x)), which is in the backend interface https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py
def _normalize_axis(axis, ndim):
    """Converts negative axes to positive values.

    # Arguments
        axis: Integer axis (possibly negative).
        ndim: Rank of the tensor considered.

    # Returns
        Positive integer axis.
    """
    if isinstance(axis, tuple):
        axis = list(axis)
    if isinstance(axis, list):
        for i, a in enumerate(axis):
            if a is not None and a < 0:
                axis[i] = a % ndim
    else:
        if axis is not None and axis < 0:
            axis %= ndim
    return axis
If you look at the very last "if" statement, it does a modulo operation with ndim. So, if ndim=1, then -1%1 = 0, which means the operation will be over the entire batch and give you a single number, if I understand correctly.
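A quick check of that modulo behavior in plain Python:

print(-1 % 1)  # 0 -> for a 1-D tensor, axis=-1 normalizes to axis 0 (the batch axis)
print(-1 % 2)  # 1 -> for a 2-D tensor, axis=-1 normalizes to axis 1 (the feature axis)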
Thank you for your reply, but I'm afraid I still don't understand. Take the mean_absolute_error loss function in https://github.com/fchollet/keras/blob/master/keras/losses.py for example:
def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)
Supposing y_pred has shape (batch_size, dimension_per_sample), axis=-1 is passed to _normalize_axis(axis, ndim(x)), which returns 1 (-1 % 2 = 1, since ndim(y_pred) = 2).
So, in the end, K.mean(K.abs(y_pred - y_true), axis=-1) computes over axis=1 (the per-sample dimension), not over the whole batch.
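Mirroring that in NumPy with a made-up batch shows the result is per-sample, not a scalar:

import numpy as np

y_true = np.zeros((4, 3))  # hypothetical batch of 4 samples, 3 outputs each
y_pred = np.ones((4, 3))
per_sample = np.mean(np.abs(y_pred - y_true), axis=-1)
print(per_sample.shape)    # (4,) -- one loss per sample, the batch is not reduced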
Or can the 'mse' loss defined in Keras only handle a single-output situation, rather than the multi-output case of a regression network?
OK, we can find the mechanism in https://github.com/fchollet/keras/blob/master/keras/engine/training.py, in the model.compile docstring:
sample_weight_mode: if you need to do timestep-wise
    sample weighting (2D weights), set this to "temporal".
    None defaults to sample-wise weights (1D).
    If the model has multiple outputs, you can use a different
    sample_weight_mode on each output by passing a
    dictionary or a list of modes.
So, in my opinion, the engine will compute over the whole batch for us, although I'm not quite sure.
Yes, I agree. I did some more exploration, and if you look at the details of what model.compile does during set-up for the total loss calculation, it ends up calling the function _weighted_masked_objective(fn) in training.py.
The last two lines are:

    return K.mean(score_array)
    return weighted

(the first returns from the inner weighted function, the second returns that inner function from the wrapper). Therefore, this function will compute the mean loss over score_array (which is one-dimensional, just the batch dimension) and return that. Thanks.
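A rough sketch, not Keras's actual code (names simplified and the weighting/masking logic omitted), of that wrapping pattern:

import numpy as np

def weighted_masked_objective(fn):
    # fn returns a per-sample loss tensor of shape (batch_size,)
    def weighted(y_true, y_pred):
        score_array = fn(y_true, y_pred)
        return np.mean(score_array)  # reduce over the batch to a scalar
    return weighted

mae = weighted_masked_objective(
    lambda t, p: np.mean(np.abs(p - t), axis=-1))
print(mae(np.zeros((4, 3)), np.ones((4, 3))))  # 1.0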
Thank you for your help.
Best wishes! : )
Thanks for bringing up an interesting exercise to work through. Best wishes to you too!
Good discussion. It solved my puzzle.
Thanks for your discussion. Both of you saved my life, bravo! @td2014 @hellojialee