Flair: How does the ranking loss function work for similarity learning?

Created on 26 Mar 2020 · 2 comments · Source: flairNLP/flair

Since there is little documentation for the similarity learning module, I was wondering how the flair.models.similarity_learning_model.RankingLoss class works. Could anyone illustrate this with a practical example, or at least point to a paper or other reference that uses the ranking loss as it is implemented here?

I know it takes a similarity matrix of the inputs (i.e. sentences) and a target matrix with 1.0 where two sentences are similar, right?

question

All 2 comments

We assume that positive (corresponding) pairs have label 1, while negative (non-corresponding) pairs have label 0. Say the pairs are sentences: the similarity model maps a sentence pair to a pair of embeddings, and a similarity measure outputs a similarity score. For a batch we get a matrix of all pairwise similarities between the sentences in the batch, and we also supply a label matrix of all pairwise labels. The similarity matrix is constructed in the similarity measure so that all corresponding pairs lie on the diagonal. In the loss function, for each row we subtract the diagonal element from all off-diagonal elements and add a margin.

Denote three sentence embeddings by x_i, x_j and x_k, where (x_i, x_j) is a corresponding pair and (x_i, x_k) is a non-corresponding pair, and write s_ij for the similarity between x_i and x_j, and s_ik for the similarity between x_i and x_k. The loss computed at each element of the loss matrix is then max(0, s_ik - s_ij + m), where m is the margin, a parameter of the loss function. This loss is sometimes called a triplet ranking loss, because the triplet (i, j, k) incurs a loss whenever the similarity of the non-corresponding pair plus the margin exceeds the similarity of the corresponding pair.
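The scheme above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not Flair's actual RankingLoss code; the function name, the choice of cosine similarity, and the default margin value are assumptions:

```python
import numpy as np

def ranking_loss(emb_a, emb_b, margin=0.15):
    # L2-normalise so the dot product is cosine similarity (assumed measure).
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T                       # sim[i, k] = s_ik; positives on the diagonal
    diag = np.diag(sim)                 # s_ii: similarities of corresponding pairs
    # Hinge at each element: max(0, s_ik - s_ii + margin).
    loss = np.maximum(0.0, sim - diag[:, None] + margin)
    np.fill_diagonal(loss, 0.0)         # corresponding pairs incur no loss
    return loss.sum() / emb_a.shape[0]
```

For a batch of N pairs this computes, for every row i, the hinge max(0, s_ik - s_ii + m) over all negatives k, matching the triplet formulation above: the loss is zero when every corresponding pair is more similar than every non-corresponding pair by at least the margin.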

Thank you very much for this detailed explanation.
