What's the best way to implement a margin-based ranking loss like the one described in [1] in Keras? So far, I have used either the dot operation of the Merge layer or the siamese architecture described in #242 to calculate the similarity between two inputs. I am unsure how to extend these (or use another approach) to take a corrupted pair of inputs into consideration. Any help is greatly appreciated.
[1] Bordes, A., Usunier, N., Weston, J., & Yakhnenko, O. (2013). Translating Embeddings for Modeling Multi-Relational Data. Advances in NIPS, 26, 2787–2795.
Hi @sebastianruder There are no diagrams in that paper and I hate reading raw text. Could you please elaborate on the required model? What are your inputs and outputs? What do you mean by a corrupted pair?
My inputs are basically two pairs of documents (in the paper, these are two pairs of triples). One is a pair of similar documents _a_ and _b_, the other one is a pair of dissimilar documents _a_ and _c_. I want to minimize the dissimilarity between _a_ and _b_ while maximizing the dissimilarity between _a_ and _c_ with a margin (see equation 1 on page 3 of the paper). Is there a way to have a loss function that can take these pairs (or their similarity/dissimilarity scores as input), rather than just the predicted and gold labels?
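For reference, the margin objective described above (equation 1 in the paper) can be sketched in plain NumPy with similarities instead of distances; the function name and toy numbers here are my own, just to illustrate the hinge:

```python
import numpy as np

def margin_ranking_loss(sim_pos, sim_neg, margin=1.0):
    # Penalize whenever the similar pair (a, b) is not scored at least
    # `margin` higher than the dissimilar pair (a, c).
    return np.maximum(0.0, margin - sim_pos + sim_neg).mean()

# toy similarity scores for two (a, b) / (a, c) pairs
loss = margin_ranking_loss(np.array([0.9, 0.2]),
                           np.array([0.1, 0.3]),
                           margin=0.5)
```

The first pair already satisfies the margin (zero loss); the second violates it and contributes to the mean.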
In other words, what's the best way to minimize any kind of ranking criterion using keras (e.g. in [1], 4.2)?
[1] Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research, 12(Aug), 2493–2537. Retrieved from http://arxiv.org/abs/1103.0398
The simplest way to build a reranker in Keras is basically to classify the positive examples as 1 and the corrupted ones as 0. In practice, you train a classifier and use its predictions to rerank. It is not the best option at all, but in many cases it works really well ;) You can start from there. I thought about an implementation of a pairwise reranker, but I haven't reached any solution without disrupting Keras.
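To make the classify-then-rerank idea concrete, here is a minimal sketch (the document names and scores are made up): score each candidate with the binary classifier, then sort by predicted probability:

```python
# Hypothetical classifier outputs P(relevant | a, doc) for three candidates
candidates = ["doc_b", "doc_c", "doc_d"]
scores = [0.82, 0.15, 0.47]

# Rerank candidates by descending classifier score
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
```

The classifier never sees pairs of candidates, which is why this is only an approximation of a true pairwise ranking objective.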
Thanks for the tip! Yes, that's basically what I've been doing as well. I thought there was a way to more closely replicate the objective given in the paper. I guess I'll stick with my current objective then.
Is there any progress or any sample code for implementing the ranking loss now?
I find it's very common in NLP tasks. Many papers use this loss to optimize their results, but I haven't been able to implement it with Keras.
Thanks.
I've been attempting to accomplish this using a custom loss function. Essentially, after a

```python
Merge([doc_input_a, doc_input_b], mode='dot', dot_axes=-1)
```

I then compile with something like

```python
model.compile(loss=rank_svm_objective, optimizer='adam', class_mode='binary')
```
where `rank_svm_objective` is defined as

```python
def rank_svm_objective(y_true, y_pred, margin=1.0):
    # y_pred are the dot-product similarities, in interleaved form
    # (positive example, negative example, ...)
    # y_true is simply +1, -1, +1, -1, ...
    signed = y_pred * y_true  # done just so that y_true is part of the computational graph
    pos = signed[0::2]
    neg = signed[1::2]
    # negative samples are multiplied by -1, so that the sign in the
    # rankSVM objective is flipped below
    rank_hinge_loss = K.mean(K.relu(margin - pos - neg), axis=-1)
    return rank_hinge_loss
```
where y_pred is the output of the merged dot product, and I've fed the data in the minibatch in an interleaved ordering of positive, negative, etc., examples. Furthermore, y_true is just +1, -1, etc.
Unfortunately, this doesn't seem to work. Maybe there's something simple that I'm missing here, but I've gotten similar tricks (defining the objective over interleaved elements of the minibatch) to work with TensorFlow.
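For what it's worth, the interleaved arithmetic itself checks out in plain NumPy (toy numbers are mine), so the problem is more likely in how Keras reduces the loss over the minibatch than in the math:

```python
import numpy as np

margin = 1.0
y_pred = np.array([0.9, 0.1, 0.4, 0.6])   # interleaved: pos, neg, pos, neg
y_true = np.array([1.0, -1.0, 1.0, -1.0])

signed = y_pred * y_true
pos, neg = signed[0::2], signed[1::2]      # neg is already negated by y_true
loss = np.maximum(0.0, margin - pos - neg).mean()

# Should equal the standard hinge relu(margin - sim_pos + sim_neg)
expected = np.maximum(0.0,
                      margin - np.array([0.9, 0.4]) + np.array([0.1, 0.6])).mean()
```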
@sebastianruder I think the code from https://github.com/maciejkula/triplet_recommendations_keras is really a good example.
For pairwise loss, the problem with the code I posted above is that the objective expects a loss vector with the same number of entries as there are items in the minibatch. So, with a little craftiness in the way the data is composed in the minibatch, you can make your rank SVM objective like this:
```python
def rank_svm_objective(y_true, y_pred, margin=1.0):
    # margin=1.0 makes sense for normalized cosine similarity in [-1, 1]
    '''This only works when y_true and y_pred are stacked so that the
    positive examples take up the first n/2 rows, and the corresponding
    negative samples take up the last n/2 rows.
    y_pred corresponds to scores (e.g., inner products).
    y_true is a vector of +1s and -1s (denoting positive or negative sample).
    '''
    n = y_true.shape[0] // 2
    signed = y_pred * y_true  # make y_true part of the computational graph
    pos = signed[:n]
    neg = signed[n:]
    # negative samples are multiplied by -1, so that the sign in the
    # rankSVM objective is flipped
    hinge_loss = K.relu(margin - pos - neg)
    loss_vec = K.concatenate([hinge_loss, hinge_loss], axis=0)
    return loss_vec
```
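A quick NumPy sanity check of that stacking scheme (the mirror function and toy numbers are mine, not part of the Keras code above):

```python
import numpy as np

def rank_svm_objective_np(y_true, y_pred, margin=1.0):
    # NumPy mirror of the backend version, for checking by hand
    n = y_true.shape[0] // 2
    signed = y_pred * y_true
    pos, neg = signed[:n], signed[n:]
    hinge = np.maximum(0.0, margin - pos - neg)
    return np.concatenate([hinge, hinge], axis=0)

sims = np.array([0.8, 0.3, 0.2, 0.7])      # first half: positive pairs
labels = np.array([1.0, 1.0, -1.0, -1.0])  # +1 positives, -1 negatives
loss_vec = rank_svm_objective_np(labels, sims)
```

The loss vector has one entry per minibatch item, which is exactly what the objective interface expects.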
I think the Keras functional API (http://keras.io/models/model/) can solve the ranking loss perfectly. Keras 0.3.2 does not have the functional API; versions 1.0 and later do.
I re-implemented the ranking loss using the Keras Graph class.
See https://github.com/Kyung-Min/triplet_recommendations_keras
I hope the code works for you.
Some of the suggestions in this thread are not applicable with the latest Keras. I found a good solution by combining the Stack Overflow approach with the triplet loss.
Essentially, we have to use the Lambda method as described in the Stack Overflow post, with the triplet loss defined by @maciejkula.
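For anyone landing here, the quantity computed inside such a Lambda layer amounts to a triplet margin loss over (anchor, positive, negative) embeddings. This is a hedged NumPy sketch with dot-product similarity; the exact formulation in the linked repo may differ:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    # One loss value per row; embeddings are compared via dot product
    pos_sim = np.sum(anchor * positive, axis=-1)
    neg_sim = np.sum(anchor * negative, axis=-1)
    return np.maximum(0.0, margin - pos_sim + neg_sim)

a = np.array([[1.0, 0.0]])
p = np.array([[0.9, 0.1]])
n = np.array([[0.8, 0.6]])
loss = triplet_margin_loss(a, p, n)
```

In Keras, the three embeddings come from three `Input`s through a shared embedding branch, and the model is trained with a dummy target since the Lambda output already *is* the loss.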