Keras: NaN loss when using Euclidean distance for a Siamese network (due to K.sqrt)

Created on 17 Jan 2018 · 4 comments · Source: keras-team/keras

While training a Siamese network, I came across the following code and used it in my program:

from keras import backend as K

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))
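For context, here is roughly how such a distance function is wired into a Siamese model via a Lambda layer; the layer sizes and names below are illustrative, not from my actual program:

from keras.layers import Input, Dense, Lambda
from keras.models import Model

# Shared embedding applied to both branches (weights are reused).
base = Dense(32, activation='relu')
input_a = Input(shape=(16,))
input_b = Input(shape=(16,))

# The Lambda layer receives [embedding_a, embedding_b] as `vects`.
distance = Lambda(euclidean_distance,
                  output_shape=lambda shapes: (shapes[0][0], 1))(
    [base(input_a), base(input_b)])
model = Model([input_a, input_b], distance)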

But it produces a NaN loss when the two inputs are identical. (Interestingly, this only shows up when the GPU is not used; with the GPU, the loss isn't NaN, but the network is not trainable.) However, when I add a small dummy bias, the network works:

def euclidean_distance(vects):
    x, y = vects
    # The small constant keeps the argument of K.sqrt away from zero.
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True) + 0.01)
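A gentler variant of the same workaround clamps the squared distance with the backend epsilon instead of adding a constant bias to every pair, so ordinary distances are left untouched (a sketch, not the code from my program):

def euclidean_distance_clamped(vects):
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    # K.maximum keeps the sqrt input strictly positive without
    # shifting the distance for normal, non-identical pairs.
    return K.sqrt(K.maximum(sum_square, K.epsilon()))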

I'm just wondering why K.sqrt() can't take a zero tensor as input.

Btw, I'm using Keras 2.1.2 and tensorflow-gpu 1.4.0.

Most helpful comment

Maybe it's because the derivative of sqrt(x) is infinite at 0? Does it happen during evaluate() or only during fit()?

All 4 comments

This is usually related to numerical stability. However, what is more interesting to me is why your Siamese network training would hit this strange case in the first place, i.e. two identical inputs.

I double-checked the source code of K.sqrt for both the TensorFlow and Theano implementations; both already clip the input to non-negative values, so K.sqrt should accept zero tensors as input.

def sqrt(x):
    """Element-wise square root.
    # Arguments
        x: Tensor or variable.
    # Returns
        A tensor.
    """
    zero = _to_tensor(0., x.dtype.base_dtype)
    inf = _to_tensor(np.inf, x.dtype.base_dtype)
    x = tf.clip_by_value(x, zero, inf)
    return tf.sqrt(x)


def sqrt(x):
    x = T.clip(x, 0., np.inf)
    return T.sqrt(x)

@rex-yue-wu Thank you for your reply! My dataset is generated from images through spatial transformations; I didn't realize it contained such cases until I ran into this problem.

Applying the sqrt function directly to a zero tensor, as you say, doesn't generate any problem, but the NaN still shows up during training... For now I've just removed these cases, since identical input images won't generate any loss anyway.

Maybe it's because the derivative of sqrt(x) is infinite at 0? Does it happen during evaluate() or only during fit()?

@ozabluda I think you are right! It happens only during fit(), so the problem is caused by an infinite gradient coming from this special case.
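For reference, the infinite gradient is easy to reproduce: K.sqrt clips the forward value to be non-negative, but the derivative 1/(2·sqrt(x)) is still evaluated at 0, so backpropagation produces inf (which typically becomes NaN once multiplied by a zero elsewhere in the chain). A minimal sketch using the TF 1.x graph API:

import tensorflow as tf

x = tf.constant(0.0)
# Same shape as K.sqrt: clip the forward value, then take the root.
y = tf.sqrt(tf.clip_by_value(x, 0.0, float('inf')))
# Backward pass: 1 / (2 * sqrt(0)) -> inf.
grad = tf.gradients(y, x)[0]

with tf.Session() as sess:
    print(sess.run([y, grad]))  # [0.0, inf]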

Thanks for all your help :")
