I was looking at the Siamese example and it seems to be missing a division by 2.0 in the contrastive loss function (see http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf), i.e.:
`return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0))) / 2.0`
instead of:
`return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0)))`
The 1/2 factor is convenient when differentiating the square, but it won't change the optimization results.
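For reference, a minimal sketch of the example's loss with the extra factor, assuming `y` is 1 for similar pairs and `d` is the predicted Euclidean distance (the function and variable names here are just illustrative):

```python
from keras import backend as K

def contrastive_loss(y, d):
    # y = 1 for similar pairs, 0 for dissimilar pairs (the example's convention);
    # d is the Euclidean distance between the two embeddings
    margin = 1
    loss = y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0))
    # the 1/2 matches the factors in Hadsell, Chopra & LeCun (2006); it only
    # rescales the loss and its gradient, so the optimum is unchanged
    return K.mean(loss) / 2.0
```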
I actually found something odd about the implementation of the contrastive loss function in the Siamese example. I corrected the implementation in my contrastive loss function according to the paper and https://github.com/fchollet/keras/issues/4980, so that it now reads like this:
```python
def contrastive_loss(y_true, y_pred):
    margin = 1
    # original version from the example:
    # return K.mean(y_true * K.square(y_pred) +
    #               (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
    return K.mean((1 - y_true) * 0.5 * K.square(y_pred) +
                  0.5 * y_true * K.square(K.maximum(margin - y_pred, 0)))
```
When I run training on my dataset (images, 10 classes, all loaded into memory before splitting into training/test set, like in the siamese example), this is what my logs look like:
```
4000/4000 [==============================] - 12s - loss: 0.1521 - acc: 0.4987 - val_loss: 0.2126 - val_acc: 0.5000
Epoch 2/20
4000/4000 [==============================] - 10s - loss: 0.1383 - acc: 0.5088 - val_loss: 0.1825 - val_acc: 0.5080
Epoch 3/20
4000/4000 [==============================] - 9s - loss: 0.1358 - acc: 0.5252 - val_loss: 0.1333 - val_acc: 0.5760
Epoch 4/20
4000/4000 [==============================] - 9s - loss: 0.1202 - acc: 0.6042 - val_loss: 0.1108 - val_acc: 0.6470
Epoch 5/20
4000/4000 [==============================] - 9s - loss: 0.1097 - acc: 0.6480 - val_loss: 0.1059 - val_acc: 0.6720
Epoch 6/20
4000/4000 [==============================] - 9s - loss: 0.1003 - acc: 0.6878 - val_loss: 0.0961 - val_acc: 0.6630
Epoch 7/20
4000/4000 [==============================] - 10s - loss: 0.0968 - acc: 0.6930 - val_loss: 0.1095 - val_acc: 0.6300
Epoch 8/20
4000/4000 [==============================] - 10s - loss: 0.0870 - acc: 0.7342 - val_loss: 0.1082 - val_acc: 0.6270
Epoch 9/20
4000/4000 [==============================] - 10s - loss: 0.0822 - acc: 0.7355 - val_loss: 0.0990 - val_acc: 0.6980
Epoch 10/20
4000/4000 [==============================] - 9s - loss: 0.0771 - acc: 0.7445 - val_loss: 0.0940 - val_acc: 0.6780
Epoch 11/20
4000/4000 [==============================] - 9s - loss: 0.0698 - acc: 0.7598 - val_loss: 0.0988 - val_acc: 0.6990
Epoch 12/20
4000/4000 [==============================] - 9s - loss: 0.0732 - acc: 0.7405 - val_loss: 0.0918 - val_acc: 0.7130
Epoch 13/20
4000/4000 [==============================] - 9s - loss: 0.0657 - acc: 0.7555 - val_loss: 0.1091 - val_acc: 0.6550
Epoch 14/20
4000/4000 [==============================] - 9s - loss: 0.0636 - acc: 0.7545 - val_loss: 0.0917 - val_acc: 0.6920
Epoch 15/20
4000/4000 [==============================] - 10s - loss: 0.0603 - acc: 0.7553 - val_loss: 0.0867 - val_acc: 0.7170
Epoch 16/20
4000/4000 [==============================] - 10s - loss: 0.0585 - acc: 0.7708 - val_loss: 0.1120 - val_acc: 0.6220
Epoch 17/20
4000/4000 [==============================] - 9s - loss: 0.0571 - acc: 0.7627 - val_loss: 0.0868 - val_acc: 0.7280
Epoch 18/20
4000/4000 [==============================] - 9s - loss: 0.0554 - acc: 0.7620 - val_loss: 0.0914 - val_acc: 0.6920
Epoch 19/20
4000/4000 [==============================] - 9s - loss: 0.0536 - acc: 0.7715 - val_loss: 0.0854 - val_acc: 0.7230
Epoch 20/20
4000/4000 [==============================] - 8s - loss: 0.0516 - acc: 0.7737 - val_loss: 0.0947 - val_acc: 0.6960
```
As you can see, the loss decreases nicely, and the accuracy increases as well, but the final values are pretty awful in terms of accuracy metrics.
To check against the original contrastive loss (the commented-out code in my function definition above is now the one that runs), my logs look like this:
```
4000/4000 [==============================] - 12s - loss: 0.2582 - acc: 0.5107 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 2/20
4000/4000 [==============================] - 10s - loss: 0.2534 - acc: 0.5100 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 3/20
4000/4000 [==============================] - 10s - loss: 0.2531 - acc: 0.5048 - val_loss: 0.2509 - val_acc: 0.5000
Epoch 4/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4962 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 5/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4993 - val_loss: 0.2505 - val_acc: 0.5000
Epoch 6/20
4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.4907 - val_loss: 0.2528 - val_acc: 0.5000
Epoch 7/20
4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.5062 - val_loss: 0.2507 - val_acc: 0.5000
Epoch 8/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.5110 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 9/20
4000/4000 [==============================] - 10s - loss: 0.2516 - acc: 0.5012 - val_loss: 0.2498 - val_acc: 0.5080
Epoch 10/20
4000/4000 [==============================] - 10s - loss: 0.2452 - acc: 0.4632 - val_loss: 0.2343 - val_acc: 0.4270
Epoch 11/20
4000/4000 [==============================] - 10s - loss: 0.2420 - acc: 0.4465 - val_loss: 0.2375 - val_acc: 0.4370
Epoch 12/20
4000/4000 [==============================] - 10s - loss: 0.2297 - acc: 0.4138 - val_loss: 0.2311 - val_acc: 0.4100
Epoch 13/20
4000/4000 [==============================] - 10s - loss: 0.2203 - acc: 0.3795 - val_loss: 0.2248 - val_acc: 0.3850
Epoch 14/20
4000/4000 [==============================] - 10s - loss: 0.2100 - acc: 0.3472 - val_loss: 0.2172 - val_acc: 0.3320
Epoch 15/20
4000/4000 [==============================] - 10s - loss: 0.2015 - acc: 0.3197 - val_loss: 0.2110 - val_acc: 0.3420
Epoch 16/20
4000/4000 [==============================] - 10s - loss: 0.1880 - acc: 0.2850 - val_loss: 0.2219 - val_acc: 0.3260
Epoch 17/20
4000/4000 [==============================] - 10s - loss: 0.1805 - acc: 0.2715 - val_loss: 0.2003 - val_acc: 0.2960
Epoch 18/20
4000/4000 [==============================] - 10s - loss: 0.1695 - acc: 0.2440 - val_loss: 0.1979 - val_acc: 0.3010
Epoch 19/20
4000/4000 [==============================] - 10s - loss: 0.1610 - acc: 0.2320 - val_loss: 0.2021 - val_acc: 0.2760
Epoch 20/20
4000/4000 [==============================] - 10s - loss: 0.1554 - acc: 0.2175 - val_loss: 0.1855 - val_acc: 0.2740
```
Neither the loss nor the accuracy looks as good as in the previous setup, but the final performance here is much better. Does anyone have an explanation for this?
The Keras default accuracy function expects that if the label is 1, the predicted output should also be 1.
However, this example works the other way around: if the label is 1 (a similar pair), the predicted distance should be low (near zero). That's why the Keras accuracy and the custom accuracy function return inverse values.
I use this custom accuracy function during training:

```python
from keras import backend as K

def acc(y_true, y_pred):
    # y_pred is a distance, and label 1 means "similar" (small distance),
    # so round the distance to 0/1 and invert it before comparing to the label
    ones = K.ones_like(y_pred)
    return K.mean(K.equal(y_true, ones - K.clip(K.round(y_pred), 0, 1)), axis=-1)
```

`model.compile(.... metrics=[acc])`
I have a doubt:
Should it be `return K.mean((1 - y) * K.square(d) + y * K.square(K.maximum(margin - d, 0))) / 2.0` instead of `return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0))) / 2.0`? (Interchanging the y and 1 - y, as in the [paper](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf).)
No, because in that paper the label is 0 when two items are similar and 1 when they are not.
In the example it's the opposite: the label is 1 when two items are similar, so the y and (1 - y) terms are swapped.
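To make the two conventions concrete, here is a small side-by-side sketch (assuming `d` is the predicted distance and `margin = 1`; the names are illustrative):

```python
from keras import backend as K

margin = 1

# Paper's convention (Hadsell et al., 2006): y = 0 for similar, 1 for dissimilar
def contrastive_loss_paper(y, d):
    return K.mean((1 - y) * K.square(d) + y * K.square(K.maximum(margin - d, 0)))

# Keras example's convention: y = 1 for similar, 0 for dissimilar
def contrastive_loss_example(y, d):
    return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0)))
```

Both push similar pairs toward distance 0 and dissimilar pairs beyond the margin; only the meaning of the label is flipped.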
Aah! Okay! That makes sense now! Thanks :)
Shouldn't the loss be defined without the encapsulating K.mean(), so that it would support the use of class/sample weights?
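A minimal sketch of that idea, assuming Keras applies sample weights to whatever per-sample values the loss function returns before averaging (this is an assumption about `fit`, not something stated in the example):

```python
from keras import backend as K

def contrastive_loss_per_sample(y_true, y_pred):
    margin = 1
    # no outer K.mean over the batch: return one loss value per pair so that
    # sample_weight / class_weight can be applied before Keras averages
    return (y_true * K.square(y_pred) +
            (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
```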
@gewoonrik @pvskand
Thanks. Can I use the contrastive loss function in a classification task instead of categorical_crossentropy?
Contrastive loss is used to bring representations of the same label close together and representations of different labels far apart. So if the classification task is binary, yes, you can use it instead of categorical_crossentropy (since the margin decides whether the label is 0/1); otherwise, you can additionally use the categorical_crossentropy loss along with the contrastive loss.
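A rough, self-contained sketch of one way to combine the two losses with a two-output model; the architecture, layer names such as `distance` and `class_probs`, and the loss weights are all illustrative assumptions, not the example's setup:

```python
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K

def contrastive_loss(y_true, y_pred):
    margin = 1
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))

# shared embedding network
inp = Input(shape=(784,))
base = Model(inp, Dense(32, activation='relu')(inp))

input_a = Input(shape=(784,))
input_b = Input(shape=(784,))
emb_a, emb_b = base(input_a), base(input_b)

# output 1: distance between embeddings, trained with the contrastive loss
distance = Lambda(euclidean_distance, name='distance')([emb_a, emb_b])
# output 2: class probabilities for the first input, trained with cross-entropy
class_probs = Dense(10, activation='softmax', name='class_probs')(emb_a)

model = Model([input_a, input_b], [distance, class_probs])
model.compile(optimizer='rmsprop',
              loss={'distance': contrastive_loss,
                    'class_probs': 'categorical_crossentropy'},
              loss_weights={'distance': 1.0, 'class_probs': 1.0})
```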
@pvskand
Thanks for your reply.
I will try categorical_crossentropy + contrastive loss.
Hi @pvskand,
I have one question about accuracy. My loss function is the following:
```python
def myloss(y_true, y_pred, e=0.1):
    margin = 1
    loss_1 = K.categorical_crossentropy(y_true, y_pred)
    loss_0 = K.mean(y_true * K.square(y_pred) +
                    (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
    loss = loss_0 + loss_1
    return loss
```
Should I use this acc function?
```python
def acc(y_true, y_pred):
    ones = K.ones_like(y_pred)
    return K.mean(K.equal(y_true, ones - K.clip(K.round(y_pred), 0, 1)), axis=-1)
```
Thanks.
@alyato It depends on which loss you want the accuracy to reflect, the cross-entropy or the contrastive loss. You define your accuracy based on that loss function.
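For instance, assuming the two-output model and the distance-based `acc` function sketched earlier, you could attach one metric to each output; the dict keys below are the hypothetical layer names from that sketch:

```python
model.compile(optimizer='rmsprop',
              loss={'distance': contrastive_loss,
                    'class_probs': 'categorical_crossentropy'},
              metrics={'distance': [acc], 'class_probs': ['accuracy']})
```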
Closing as this is resolved