Keras: How to implement a Siamese network architecture?

Created on 18 Jun 2015  ·  10 Comments  ·  Source: keras-team/keras

How do you implement the Siamese architecture from S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification", CVPR 2005?
I mean: how do you train two networks with the same weights using Keras?
Any help is appreciated! Thanks


All 10 comments

Since a Siamese architecture only has a single output, I think a Merge should be sufficient to implement it.

However, Merge does not currently work with repeated instances of the same model ("same weights"). We're discussing this problem in https://github.com/fchollet/keras/issues/224#issuecomment-114632044 and @pdermyer may have a solution (waiting for a PR).

In the recent past, I tried something similar with Keras, following up on this clever idea:
https://github.com/Lasagne/Lasagne/issues/168. Building on that, you can simply add a custom Siamese objective to objectives.py, such as:

def siamese_euclidean(y_true, y_pred):
    # Even rows hold the first item of each pair, odd rows the second
    a = y_pred[0::2]
    b = y_pred[1::2]
    # Squared Euclidean distance between the two representations of each pair
    diff = ((a - b) ** 2).sum(axis=1, keepdims=True)
    # Labels are duplicated per pair, so keep one copy per pair
    y_true = y_true[0::2]
    return ((diff - y_true) ** 2).mean()

This approach makes it easy to share the weights across the two items of each pair. Watch out when setting the batch_size, though: it must be an even number so that the pair indices are respected.
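To make the wiring concrete, here is a minimal sketch of how this could be used end to end, assuming the Keras 0.x Sequential API used elsewhere in this thread and that siamese_euclidean has been added to objectives.py; the layer sizes, optimizer and training settings are illustrative assumptions, not from this thread:

from keras.models import Sequential
from keras.layers.core import Dense

# Hypothetical encoder: 100-dim inputs mapped to 32-dim representations
# (Dense(input_dim, output_dim) is the Keras 0.x signature used in this thread)
model = Sequential()
model.add(Dense(100, 64, activation='relu'))
model.add(Dense(64, 32))

# Reference the custom objective by name, like any built-in loss
model.compile(loss='siamese_euclidean', optimizer='rmsprop')

# X_train interleaves the pairs (rows 0, 2, 4, ... hold the first items)
# and y_train repeats each pair label twice, as described further down;
# batch_size must be even so no pair is split across batches
model.fit(X_train, y_train, batch_size=32, nb_epoch=10)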

@mikekestemont I'm probably wrong, but this modified objective seems a little different from what people usually want from a Siamese network. Won't this network have more independent parameters to generate 2x as many outputs? I thought what's usually desired is a single model f_theta which is trained to optimize distance(f_theta(x), f_theta(y)) for training pairs (x, y). In your case, won't there be a separate theta_x and theta_y in the last layer of the network?

A `share_weight` parameter to choose whether to share weights between layers would be a better solution. This would be very helpful for other architectures as well.

@iskandr There is a single f_theta in this approach, which is uniformly applied to each instance during the forward pass. I applied this to authorship verification (i.e. checking whether two documents were written by the same author). Suppose that we have documents A and B by author 1, and documents C and D by author 2. This situation yields the following pairs as data:

(A, B) = 1 (same author)
(C, D) = 1
(A, C) = 0 (different author)
(A, D) = 0
(B, C) = 0
(B, D) = 0

My X_train would be a concatenation of these pairs, with the first item of each pair always at an even (0-based) index; the y_train would have a doubled version of the class labels:

X_train = [A, B, C, D, A, C, A, D, B, C, B, D]
y_train = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
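
For concreteness, here is one way this interleaved layout could be built with numpy (a sketch; A, B, C and D stand for whatever fixed-length feature vectors you extract per document):

import numpy as np

# Hypothetical pair list: (first item, second item, same-author label)
pairs = [(A, B, 1), (C, D, 1), (A, C, 0), (A, D, 0), (B, C, 0), (B, D, 0)]

# Interleave the pairs so the first item of each pair lands at an even index
X_train = np.array([doc for a, b, label in pairs for doc in (a, b)])
# Repeat each label twice so y_train lines up row-for-row with X_train
y_train = np.array([label for a, b, label in pairs for _ in (0, 1)])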

By applying a standard forward model (f_theta, parametrized by a single theta) to this data, you get an output representation for each document. To calculate the loss, you exploit the positional regularity of the even and odd indices, e.g.:

def siamese_euclidean(y_true, y_pred):
    a = y_pred[0::2]
    b = y_pred[1::2]
    diff = ((a - b) ** 2).sum(axis=1, keepdims=True)
    y_true = y_true[0::2]
    return ((diff - y_true)**2).mean()

I don't see how this is different from standard siamese networks?

@mikekestemont Thanks for spelling it out for me, I misunderstood how the indexing trick was working. This does seem to implement a Siamese network!

Sorry for commenting on a closed issue; it wasn't clear to me whether the solution is just adding the same model as the left and right leg of the merge layer. Is the following going to implement a Siamese network?

siamese_leg = Sequential()
siamese_leg.add(...)
siamese_leg.add(...)

model = Sequential()
model.add(Merge([siamese_leg, siamese_leg], mode='concat'))
model.add(Dense(50, 10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

Are the two "legs" going to share the same weights?

Thanks.

@gbcalsaverini Yes, the "legs" or "heads" of a Siamese network share weights (in practice there is only one model that you're training on pairs of inputs). The loss function you gave won't work for training based on the similarity of the two inputs, though: you're penalizing classification error across 10 categories, when what you want is to push similar inputs toward similar outputs.
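
For reference, a loss that does exactly that is the margin-based contrastive loss from Hadsell, Chopra and LeCun's work (the paper cited in this issue and its 2006 follow-up). Here is a sketch in the same interleaved-pair style as the siamese_euclidean objective above, assuming Theano tensors as in 2015-era Keras; margin is a hyperparameter you pick, not something from this thread:

import theano.tensor as T

def siamese_contrastive(y_true, y_pred, margin=1.0):
    # Even rows hold the first item of each pair, odd rows the second
    a = y_pred[0::2]
    b = y_pred[1::2]
    # Euclidean distance between the paired representations
    d = T.sqrt(((a - b) ** 2).sum(axis=1, keepdims=True))
    # 1 = similar pair, 0 = dissimilar pair (this thread's label convention)
    y = y_true[0::2]
    # Pull similar pairs together; push dissimilar ones at least margin apart
    return (y * d ** 2 + (1 - y) * T.maximum(margin - d, 0) ** 2).mean()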

Ah, don't mind the rest of the code; I just copied it from the Merge example and didn't bother to change it.

I couldn't find, though, any layer that would implement something like a Euclidean distance or dot product of the entries being merged. Is there one?

Shouldn't the output of the Siamese network be something like the Euclidean distance between the representation vectors?

Thanks.

@ghost, so you are not a ghost...

