Hi all,
Based on this paper, do you think it's worth the effort to implement the relative cosine similarity measure?
https://ufal.mff.cuni.cz/pbml/105/art-leeuwenberg-et-al.pdf
Note that I'm not suggesting this as a potential contributor but as a grateful user.
Thank you,
Viktor
Thanks for the request @viplexke. I quickly looked at the article: IMO it doesn't look very useful to include in gensim. Also, the formula for relative cosine similarity is pretty simple (i.e. anyone who needs it can implement it themselves).
CC: @gojomo @piskvorky
It's interesting that it seems to help highlight synonyms, as opposed to other kinds of related words. If it could be done as a single short method in KeyedVectors, I think it'd be a good contribution. Even though it's not hard for others to implement, sometimes people only discover new techniques by browsing APIs. (Any implementation should cite this origin paper.)
@gojomo @menshikh-iv How exactly would I go about implementing this "as a single short method in KeyedVectors", as you say @gojomo? Excuse the question; I'm very new to Gensim. Thanks!
Section 3.5 of the referenced paper introduces the "relative cosine similarity" measure, which is essentially a measure of how much more similar word-B is to word-A, compared to the top-N other words most similar to word-A. Essentially, the authors observed that when one word was much "more similar" to a target word than the next N, it was especially likely to be a true synonym. (This "better than the others" signal was more reliable than any absolute cutoff on cosine similarity; see the paper for the details and full reasoning.)
So given two words and a top-n value, and a set of word-vectors, this new measure can be calculated. That naturally suggests a single new method on KeyedVectors with a signature like:
```python
def relative_cosine_similarity(self, word_a, word_b, topn=10):
    ...
```
A pull-request that implements this method, matching the definition in the paper, with an explanatory doc-comment (with link to the paper) and some tests (which manage to somewhat confirm expected behavior along the lines of that described in the paper) would be a useful contribution.
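For anyone picking this up, here is a minimal standalone sketch of the measure from Section 3.5, written against a plain dict of NumPy vectors rather than a real `KeyedVectors` instance (the function and helper names here are just illustrative, not gensim's API):

```python
import numpy as np

def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relative_cosine_similarity(vectors, word_a, word_b, topn=10):
    """rcs(a, b) = cos(a, b) divided by the summed cosine similarity
    between word_a and its topn nearest neighbours, per Section 3.5
    of the Leeuwenberg et al. paper."""
    target = vectors[word_a]
    # Cosine similarity from word_a to every other word in the vocabulary.
    sims = {w: cosine(target, v) for w, v in vectors.items() if w != word_a}
    # Normalizer: the summed similarity of the topn most similar words.
    top = sorted(sims.values(), reverse=True)[:topn]
    return sims[word_b] / sum(top)
```

By construction, the rcs values of the topn neighbours sum to 1, so a neighbour scoring well above 1/topn (e.g. above 0.10 for topn=10) is "better than the others" in the paper's sense. An actual PR would of course compute this against the model's vocabulary via the existing `most_similar` machinery rather than a brute-force loop.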
@gojomo Thanks so much! I got it working now.
Hi @ailsamm, I need to implement the same measure. Could you please share your code, since you got it working?
I really need it :(
@rawannasser Sure! How should I send it?
@ailsamm
Thanks so much!
I don't know if I can write my email here, but maybe you can upload it to your GitHub?
@ailsamm Can you submit your implementation as a pull request for potential integration into the project?
Hi,
Is anyone working on this?
If not, I would like to take this up.
@jenishah go ahead please :)
If no one is working on this, can I contribute?
feel free to contribute @rsdel2007
@rsdel2007 yes, please