What's the best way to implement a margin-based ranking loss like the one described in [1] in Keras? So far, I have used either the dot operation of the Merge layer or the siamese architecture described in #242 to calculate the similarity between two inputs. I am unsure how to extend these (or use another approach) to take a corrupted pair of inputs into consideration. Any help is greatly appreciated.
[1] Bordes, A., Usunier, N., Weston, J., & Yakhnenko, O. (2013). Translating Embeddings for Modeling Multi-Relational Data. Advances in NIPS, 26, 2787–2795.
Hi @sebastianruder There are no diagrams in that paper and I hate reading raw text. Could you please elaborate on the required model? What are your inputs and outputs? What do you mean by a corrupted pair?
My inputs are basically two pairs of documents (in the paper, these are two pairs of triples). One is a pair of similar documents _a_ and _b_, the other one is a pair of dissimilar documents _a_ and _c_. I want to minimize the dissimilarity between _a_ and _b_ while maximizing the dissimilarity between _a_ and _c_ with a margin (see equation 1 on page 3 of the paper). Is there a way to have a loss function that can take these pairs (or their similarity/dissimilarity scores as input), rather than just the predicted and gold labels?
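For reference, the margin objective described above (equation 1 in the paper) can be sketched in plain NumPy with similarities instead of distances; the function name and toy numbers here are my own, just to illustrate the hinge:

```python
import numpy as np

def margin_ranking_loss(sim_pos, sim_neg, margin=1.0):
    # Penalize whenever the similar pair (a, b) is not scored at least
    # `margin` higher than the dissimilar pair (a, c).
    return np.maximum(0.0, margin - sim_pos + sim_neg).mean()

# toy similarity scores for two (a, b) / (a, c) pairs
loss = margin_ranking_loss(np.array([0.9, 0.2]),
                           np.array([0.1, 0.3]),
                           margin=0.5)
```

The first pair already satisfies the margin (zero loss); the second violates it and contributes to the mean.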
In other words, what's the best way to minimize any kind of ranking criterion using keras (e.g. in [1], 4.2)?
[1] Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research, 12(Aug), 2493–2537. Retrieved from http://arxiv.org/abs/1103.0398
The simplest way to build a reranker in Keras is basically to classify the positive examples as 1 and the corrupted ones as 0. In practice, you train a classifier and use its predictions to rerank. It is not the best option at all, but in many cases it works really well ;) You can start from there. I thought about an implementation of a pairwise reranker, but I haven't reached any solution without disrupting Keras.
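To make the classify-then-rerank idea concrete, here is a minimal sketch (the document names and scores are made up): score each candidate with the binary classifier, then sort by predicted probability:

```python
# Hypothetical classifier outputs P(relevant | a, doc) for three candidates
candidates = ["doc_b", "doc_c", "doc_d"]
scores = [0.82, 0.15, 0.47]

# Rerank candidates by descending classifier score
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
```

The classifier never sees pairs of candidates, which is why this is only an approximation of a true pairwise ranking objective.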
Thanks for the tip! Yes, that's basically what I've been doing as well. I thought there was a way to more closely replicate the objective given in the paper. I guess I'll stick with my current objective then.
Is there any progress or any sample code for implementing the ranking loss now?
I find it's very common in NLP tasks. Many papers use this loss to optimize their results, but I haven't been able to implement it with Keras.
Thanks.
I've been attempting to accomplish this using a custom loss function. Essentially, after a

```python
Merge([doc_input_a, doc_input_b], mode='dot', dot_axes=-1)
```

I then compile with something like

```python
model.compile(loss=rank_svm_objective, optimizer='adam', class_mode='binary')
```
where `rank_svm_objective` is defined as

```python
def rank_svm_objective(y_true, y_pred, margin=1.0):
    # y_pred are the dot-product similarities, in interleaved form
    # (positive example, negative example, ...)
    # y_true is simply +1, -1, +1, -1, ...
    signed = y_pred * y_true  # done just so that y_true is part of the computational graph
    pos = signed[0::2]
    neg = signed[1::2]
    # negative samples are multiplied by -1, so that the sign in the
    # rankSVM objective is flipped below
    rank_hinge_loss = K.mean(K.relu(margin - pos - neg), axis=-1)
    return rank_hinge_loss
```
where y_pred is the output of the merged dot product, and I've fed the data in the minibatch in an interleaved ordering of positive, negative, etc., examples. Furthermore, y_true is just +1, -1, etc.
Unfortunately, this doesn't seem to work. Maybe there's something simple that I'm missing here, but I've gotten similar tricks (defining the objective over interleaved elements of the minibatch) to work with TensorFlow.
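For what it's worth, the interleaved arithmetic itself checks out in plain NumPy (toy numbers are mine), so the problem is more likely in how Keras reduces the loss over the minibatch than in the math:

```python
import numpy as np

margin = 1.0
y_pred = np.array([0.9, 0.1, 0.4, 0.6])   # interleaved: pos, neg, pos, neg
y_true = np.array([1.0, -1.0, 1.0, -1.0])

signed = y_pred * y_true
pos, neg = signed[0::2], signed[1::2]      # neg is already negated by y_true
loss = np.maximum(0.0, margin - pos - neg).mean()

# Should equal the standard hinge relu(margin - sim_pos + sim_neg)
expected = np.maximum(0.0,
                      margin - np.array([0.9, 0.4]) + np.array([0.1, 0.6])).mean()
```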
@sebastianruder I think the code from https://github.com/maciejkula/triplet_recommendations_keras is really a good example.
For pairwise loss, the problem with the code I posted above is that the objective expects a loss vector with the same number of entries as there are items in the minibatch. So, with a little craftiness in the way the data is composed in the minibatch, you can make your rank SVM objective like this:
```python
def rank_svm_objective(y_true, y_pred, margin=1.0):
    # margin=1.0 makes sense for normalized cosine similarity in [-1, 1]
    '''This only works when y_true and y_pred are stacked so that the
    positive examples take up the first n/2 rows, and the corresponding
    negative samples take up the last n/2 rows.
    y_pred corresponds to scores (e.g., inner products).
    y_true is a vector of +1s and -1s (denoting positive or negative sample).
    '''
    n = y_true.shape[0] // 2
    signed = y_pred * y_true  # make y_true part of the computational graph
    pos = signed[:n]
    neg = signed[n:]
    # negative samples are multiplied by -1, so that the sign in the
    # rankSVM objective is flipped
    hinge_loss = K.relu(margin - pos - neg)
    loss_vec = K.concatenate([hinge_loss, hinge_loss], axis=0)
    return loss_vec
```
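A quick NumPy sanity check of that stacking scheme (the mirror function and toy numbers are mine, not part of the Keras code above):

```python
import numpy as np

def rank_svm_objective_np(y_true, y_pred, margin=1.0):
    # NumPy mirror of the backend version, for checking by hand
    n = y_true.shape[0] // 2
    signed = y_pred * y_true
    pos, neg = signed[:n], signed[n:]
    hinge = np.maximum(0.0, margin - pos - neg)
    return np.concatenate([hinge, hinge], axis=0)

sims = np.array([0.8, 0.3, 0.2, 0.7])      # first half: positive pairs
labels = np.array([1.0, 1.0, -1.0, -1.0])  # +1 positives, -1 negatives
loss_vec = rank_svm_objective_np(labels, sims)
```

The loss vector has one entry per minibatch item, which is exactly what the objective interface expects.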
I think the Keras functional API (http://keras.io/models/model/) can solve the ranking loss perfectly. Keras 0.3.2 does not have the functional API; versions 1.0 and later do.
I re-implemented the ranking loss using the Keras Graph class.
See https://github.com/Kyung-Min/triplet_recommendations_keras
I hope the code works for you.
Some of the suggestions in this thread are not applicable with the latest Keras. I found a good solution by combining the Stack Overflow approach with the triplet loss.
Essentially, we have to use the Lambda method as described in the Stack Overflow post, with the triplet loss defined by @maciejkula.
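For anyone landing here, the quantity computed inside such a Lambda layer amounts to a triplet margin loss over (anchor, positive, negative) embeddings. This is a hedged NumPy sketch with dot-product similarity; the exact formulation in the linked repo may differ:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    # One loss value per row; embeddings are compared via dot product
    pos_sim = np.sum(anchor * positive, axis=-1)
    neg_sim = np.sum(anchor * negative, axis=-1)
    return np.maximum(0.0, margin - pos_sim + neg_sim)

a = np.array([[1.0, 0.0]])
p = np.array([[0.9, 0.1]])
n = np.array([[0.8, 0.6]])
loss = triplet_margin_loss(a, p, n)
```

In Keras, the three embeddings come from three `Input`s through a shared embedding branch, and the model is trained with a dummy target since the Lambda output already *is* the loss.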