I was looking at the Siamese example and it seems to be missing a division by 2.0 in the contrastive loss function (see http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf), i.e.:
`return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0))) / 2.0`
instead of:
`return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0)))`
The 1/2 factor is convenient when differentiating the square, but it won't change the optimization results.
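For reference, a minimal sketch of the example's loss with the extra factor, assuming `y` is 1 for similar pairs and `d` is the predicted Euclidean distance (the function and variable names here are just illustrative):

```python
from keras import backend as K

def contrastive_loss(y, d):
    # y = 1 for similar pairs, 0 for dissimilar pairs (the example's convention);
    # d is the Euclidean distance between the two embeddings
    margin = 1
    loss = y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0))
    # the 1/2 matches the factors in Hadsell, Chopra & LeCun (2006); it only
    # rescales the loss and its gradient, so the optimum is unchanged
    return K.mean(loss) / 2.0
```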
I actually found something odd about the implementation of the contrastive loss function in the Siamese example. I corrected the implementation in my contrastive loss function according to the paper and https://github.com/fchollet/keras/issues/4980, so that it now reads like this:
```python
def contrastive_loss(y_true, y_pred):
    margin = 1
    # original version from the example:
    # return K.mean(y_true * K.square(y_pred) +
    #               (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
    return K.mean((1 - y_true) * 0.5 * K.square(y_pred) +
                  0.5 * y_true * K.square(K.maximum(margin - y_pred, 0)))
```
When I run training on my dataset (images, 10 classes, all loaded into memory before splitting into training/test set, like in the siamese example), this is what my logs look like:
```
4000/4000 [==============================] - 12s - loss: 0.1521 - acc: 0.4987 - val_loss: 0.2126 - val_acc: 0.5000
Epoch 2/20
4000/4000 [==============================] - 10s - loss: 0.1383 - acc: 0.5088 - val_loss: 0.1825 - val_acc: 0.5080
Epoch 3/20
4000/4000 [==============================] - 9s - loss: 0.1358 - acc: 0.5252 - val_loss: 0.1333 - val_acc: 0.5760
Epoch 4/20
4000/4000 [==============================] - 9s - loss: 0.1202 - acc: 0.6042 - val_loss: 0.1108 - val_acc: 0.6470
Epoch 5/20
4000/4000 [==============================] - 9s - loss: 0.1097 - acc: 0.6480 - val_loss: 0.1059 - val_acc: 0.6720
Epoch 6/20
4000/4000 [==============================] - 9s - loss: 0.1003 - acc: 0.6878 - val_loss: 0.0961 - val_acc: 0.6630
Epoch 7/20
4000/4000 [==============================] - 10s - loss: 0.0968 - acc: 0.6930 - val_loss: 0.1095 - val_acc: 0.6300
Epoch 8/20
4000/4000 [==============================] - 10s - loss: 0.0870 - acc: 0.7342 - val_loss: 0.1082 - val_acc: 0.6270
Epoch 9/20
4000/4000 [==============================] - 10s - loss: 0.0822 - acc: 0.7355 - val_loss: 0.0990 - val_acc: 0.6980
Epoch 10/20
4000/4000 [==============================] - 9s - loss: 0.0771 - acc: 0.7445 - val_loss: 0.0940 - val_acc: 0.6780
Epoch 11/20
4000/4000 [==============================] - 9s - loss: 0.0698 - acc: 0.7598 - val_loss: 0.0988 - val_acc: 0.6990
Epoch 12/20
4000/4000 [==============================] - 9s - loss: 0.0732 - acc: 0.7405 - val_loss: 0.0918 - val_acc: 0.7130
Epoch 13/20
4000/4000 [==============================] - 9s - loss: 0.0657 - acc: 0.7555 - val_loss: 0.1091 - val_acc: 0.6550
Epoch 14/20
4000/4000 [==============================] - 9s - loss: 0.0636 - acc: 0.7545 - val_loss: 0.0917 - val_acc: 0.6920
Epoch 15/20
4000/4000 [==============================] - 10s - loss: 0.0603 - acc: 0.7553 - val_loss: 0.0867 - val_acc: 0.7170
Epoch 16/20
4000/4000 [==============================] - 10s - loss: 0.0585 - acc: 0.7708 - val_loss: 0.1120 - val_acc: 0.6220
Epoch 17/20
4000/4000 [==============================] - 9s - loss: 0.0571 - acc: 0.7627 - val_loss: 0.0868 - val_acc: 0.7280
Epoch 18/20
4000/4000 [==============================] - 9s - loss: 0.0554 - acc: 0.7620 - val_loss: 0.0914 - val_acc: 0.6920
Epoch 19/20
4000/4000 [==============================] - 9s - loss: 0.0536 - acc: 0.7715 - val_loss: 0.0854 - val_acc: 0.7230
Epoch 20/20
4000/4000 [==============================] - 8s - loss: 0.0516 - acc: 0.7737 - val_loss: 0.0947 - val_acc: 0.6960
```
As you can see, the loss decreases nicely, and the accuracy increases as well, but the final values are pretty awful in terms of accuracy metrics.
To check against the original contrastive loss (the commented-out code in my function definition above is now the one that runs), my logs look like this:
```
4000/4000 [==============================] - 12s - loss: 0.2582 - acc: 0.5107 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 2/20
4000/4000 [==============================] - 10s - loss: 0.2534 - acc: 0.5100 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 3/20
4000/4000 [==============================] - 10s - loss: 0.2531 - acc: 0.5048 - val_loss: 0.2509 - val_acc: 0.5000
Epoch 4/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4962 - val_loss: 0.2500 - val_acc: 0.5000
Epoch 5/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4993 - val_loss: 0.2505 - val_acc: 0.5000
Epoch 6/20
4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.4907 - val_loss: 0.2528 - val_acc: 0.5000
Epoch 7/20
4000/4000 [==============================] - 10s - loss: 0.2519 - acc: 0.5062 - val_loss: 0.2507 - val_acc: 0.5000
Epoch 8/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.5110 - val_loss: 0.2503 - val_acc: 0.5000
Epoch 9/20
4000/4000 [==============================] - 10s - loss: 0.2516 - acc: 0.5012 - val_loss: 0.2498 - val_acc: 0.5080
Epoch 10/20
4000/4000 [==============================] - 10s - loss: 0.2452 - acc: 0.4632 - val_loss: 0.2343 - val_acc: 0.4270
Epoch 11/20
4000/4000 [==============================] - 10s - loss: 0.2420 - acc: 0.4465 - val_loss: 0.2375 - val_acc: 0.4370
Epoch 12/20
4000/4000 [==============================] - 10s - loss: 0.2297 - acc: 0.4138 - val_loss: 0.2311 - val_acc: 0.4100
Epoch 13/20
4000/4000 [==============================] - 10s - loss: 0.2203 - acc: 0.3795 - val_loss: 0.2248 - val_acc: 0.3850
Epoch 14/20
4000/4000 [==============================] - 10s - loss: 0.2100 - acc: 0.3472 - val_loss: 0.2172 - val_acc: 0.3320
Epoch 15/20
4000/4000 [==============================] - 10s - loss: 0.2015 - acc: 0.3197 - val_loss: 0.2110 - val_acc: 0.3420
Epoch 16/20
4000/4000 [==============================] - 10s - loss: 0.1880 - acc: 0.2850 - val_loss: 0.2219 - val_acc: 0.3260
Epoch 17/20
4000/4000 [==============================] - 10s - loss: 0.1805 - acc: 0.2715 - val_loss: 0.2003 - val_acc: 0.2960
Epoch 18/20
4000/4000 [==============================] - 10s - loss: 0.1695 - acc: 0.2440 - val_loss: 0.1979 - val_acc: 0.3010
Epoch 19/20
4000/4000 [==============================] - 10s - loss: 0.1610 - acc: 0.2320 - val_loss: 0.2021 - val_acc: 0.2760
Epoch 20/20
4000/4000 [==============================] - 10s - loss: 0.1554 - acc: 0.2175 - val_loss: 0.1855 - val_acc: 0.2740
```
Neither the loss nor the accuracy looks as good as in the previous setup, but the final performance here is much better. Does anyone have an explanation for this?
The Keras default accuracy function expects that if the label is 1, the predicted output should also be 1.
However, this example works the other way around: if the label is 1 (a similar pair), the predicted distance should be low (near zero). That's why the Keras accuracy and the custom accuracy function return inverse values.
I use this custom accuracy function during training:

```python
from keras import backend as K

def acc(y_true, y_pred):
    # y_pred is a distance, and label 1 means "similar" (small distance),
    # so round the distance to 0/1 and invert it before comparing to the label
    ones = K.ones_like(y_pred)
    return K.mean(K.equal(y_true, ones - K.clip(K.round(y_pred), 0, 1)), axis=-1)
```

`model.compile(.... metrics=[acc])`
I have a doubt:
Should it be `return K.mean((1 - y) * K.square(d) + y * K.square(K.maximum(margin - d, 0))) / 2.0` instead of `return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0))) / 2.0`? (Interchanging the y and 1 - y, as in the [paper](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf).)
No, because in that paper the label is 0 when two items are similar and 1 when they are not.
In the example it's the opposite: the label is 1 when two items are similar, so the y and (1 - y) terms are swapped.
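To make the two conventions concrete, here is a small side-by-side sketch (assuming `d` is the predicted distance and `margin = 1`; the names are illustrative):

```python
from keras import backend as K

margin = 1

# Paper's convention (Hadsell et al., 2006): y = 0 for similar, 1 for dissimilar
def contrastive_loss_paper(y, d):
    return K.mean((1 - y) * K.square(d) + y * K.square(K.maximum(margin - d, 0)))

# Keras example's convention: y = 1 for similar, 0 for dissimilar
def contrastive_loss_example(y, d):
    return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0)))
```

Both push similar pairs toward distance 0 and dissimilar pairs beyond the margin; only the meaning of the label is flipped.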
Aah! Okay! That makes sense now! Thanks :)
Shouldn't the loss be defined without the encapsulating K.mean(), so that it would support the use of class/sample weights?
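A minimal sketch of that idea, assuming Keras applies sample weights to whatever per-sample values the loss function returns before averaging (this is an assumption about `fit`, not something stated in the example):

```python
from keras import backend as K

def contrastive_loss_per_sample(y_true, y_pred):
    margin = 1
    # no outer K.mean over the batch: return one loss value per pair so that
    # sample_weight / class_weight can be applied before Keras averages
    return (y_true * K.square(y_pred) +
            (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
```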
@gewoonrik @pvskand
Thanks. Can I use the contrastive loss function in a classification task instead of categorical_crossentropy?
Contrastive loss is used to bring representations of the same label close together and representations of different labels far apart. So if the classification task is binary, yes, you can use it instead of categorical_crossentropy (since the margin decides whether the label is 0/1); otherwise, you can additionally use the categorical_crossentropy loss along with the contrastive loss.
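A rough, self-contained sketch of one way to combine the two losses with a two-output model; the architecture, layer names such as `distance` and `class_probs`, and the loss weights are all illustrative assumptions, not the example's setup:

```python
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K

def contrastive_loss(y_true, y_pred):
    margin = 1
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))

# shared embedding network
inp = Input(shape=(784,))
base = Model(inp, Dense(32, activation='relu')(inp))

input_a = Input(shape=(784,))
input_b = Input(shape=(784,))
emb_a, emb_b = base(input_a), base(input_b)

# output 1: distance between embeddings, trained with the contrastive loss
distance = Lambda(euclidean_distance, name='distance')([emb_a, emb_b])
# output 2: class probabilities for the first input, trained with cross-entropy
class_probs = Dense(10, activation='softmax', name='class_probs')(emb_a)

model = Model([input_a, input_b], [distance, class_probs])
model.compile(optimizer='rmsprop',
              loss={'distance': contrastive_loss,
                    'class_probs': 'categorical_crossentropy'},
              loss_weights={'distance': 1.0, 'class_probs': 1.0})
```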
@pvskand
Thanks for your reply.
I will try categorical_crossentropy + contrastive loss.
Hi @pvskand,
I have one question about accuracy. My loss function is the following:
```python
def myloss(y_true, y_pred, e=0.1):
    margin = 1
    loss_1 = K.categorical_crossentropy(y_true, y_pred)
    loss_0 = K.mean(y_true * K.square(y_pred) +
                    (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
    loss = loss_0 + loss_1
    return loss
```
Should I use this acc function?
```python
def acc(y_true, y_pred):
    ones = K.ones_like(y_pred)
    return K.mean(K.equal(y_true, ones - K.clip(K.round(y_pred), 0, 1)), axis=-1)
```
Thanks.
@alyato It depends on which loss you want the accuracy to reflect, the cross-entropy or the contrastive loss. You define your accuracy based on that loss function.
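For instance, assuming the two-output model and the distance-based `acc` function sketched earlier, you could attach one metric to each output; the dict keys below are the hypothetical layer names from that sketch:

```python
model.compile(optimizer='rmsprop',
              loss={'distance': contrastive_loss,
                    'class_probs': 'categorical_crossentropy'},
              metrics={'distance': [acc], 'class_probs': ['accuracy']})
```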
Closing as this is resolved