Hi all,
Based on this paper, do you think it's worth the effort to implement the relative cosine similarity measure?
https://ufal.mff.cuni.cz/pbml/105/art-leeuwenberg-et-al.pdf
Note that I'm not suggesting this as a potential contributor but as a grateful user.
Thank you,
Viktor
Thanks for the request @viplexke. I quickly looked at the article: IMO it doesn't look very useful to include in gensim. Also, the formula for relative cosine similarity is pretty simple (i.e. anyone who needs it can implement it themselves).
CC: @gojomo @piskvorky
It's interesting that it seems to help highlight synonyms, as opposed to other kinds of related words. If it could be done as a single short method in KeyedVectors, I think it'd be a good contribution. Even though it's not hard for others to implement, sometimes people only discover new techniques by browsing APIs. (Any implementation should cite this origin paper.)
@gojomo @menshikh-iv How exactly would I go about implementing this "as a single short method in KeyedVectors", as you say @gojomo? Excuse the question; I'm very new to Gensim. Thanks!
Section 3.5 of the referenced paper introduces the "relative cosine similarity" measure, which is essentially a measure of how much more similar word-B is to word-A, compared to the top-N other words most similar to word-A. Essentially, the authors observed that when one word was much "more similar" to a target word than the next N, it was especially likely to be a true synonym. (This "better than the others" signal was more reliable than any absolute cutoff on cosine similarity; see the paper for the details and full reasoning.)
So given two words and a top-n value, and a set of word-vectors, this new measure can be calculated. That naturally suggests a single new method on KeyedVectors with a signature like:
```python
def relative_cosine_similarity(self, word_a, word_b, topn=10):
    ...
```
A pull-request that implements this method, matching the definition in the paper, with an explanatory doc-comment (with link to the paper) and some tests (which manage to somewhat confirm expected behavior along the lines of that described in the paper) would be a useful contribution.
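For anyone picking this up, here is a minimal standalone sketch of the measure from Section 3.5, written against a plain dict of NumPy vectors rather than a real `KeyedVectors` instance (the function and helper names here are just illustrative, not gensim's API):

```python
import numpy as np

def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relative_cosine_similarity(vectors, word_a, word_b, topn=10):
    """rcs(a, b) = cos(a, b) divided by the summed cosine similarity
    between word_a and its topn nearest neighbours, per Section 3.5
    of the Leeuwenberg et al. paper."""
    target = vectors[word_a]
    # Cosine similarity from word_a to every other word in the vocabulary.
    sims = {w: cosine(target, v) for w, v in vectors.items() if w != word_a}
    # Normalizer: the summed similarity of the topn most similar words.
    top = sorted(sims.values(), reverse=True)[:topn]
    return sims[word_b] / sum(top)
```

By construction, the rcs values of the topn neighbours sum to 1, so a neighbour scoring well above 1/topn (e.g. above 0.10 for topn=10) is "better than the others" in the paper's sense. An actual PR would of course compute this against the model's vocabulary via the existing `most_similar` machinery rather than a brute-force loop.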
@gojomo Thanks so much! I got it working now.
Hi @ailsamm, I need to implement the same measure. Could you please share your code, since you got it working?
I really need it :(
@rawannasser Sure! How should I send it?
@ailsamm
Thanks so much!
I don't know if I can write my email here, but maybe you can upload it to your GitHub?
@ailsamm Can you submit your implementation as a pull request for potential integration into the project?
Hi,
Is anyone working on this?
If not, I would like to take this up.
@jenishah go ahead please :)
If no one is working on this, can I contribute?
feel free to contribute @rsdel2007
@rsdel2007 yes, please