import spacy
nlp = spacy.load('en_core_web_md')
nlp('like this').similarity(nlp('hate this'))  # => 0.866
I would like only similar words and synonyms to get high scores. I assume a score as high as 0.866 arises because the words "like" and "hate" occur in similar contexts. Is my reasoning correct? Is there anything else that could be done about this?
For those who are interested, I was able to identify antonyms reasonably well with https://github.com/facebookresearch/InferSent. It would be awesome to have a sentence embedding model and to find antonyms in spacy. I will explore sending a PR regarding these.
Yes, the behavior is just as you described. Antonyms often appear in the same contexts; you can even literally replace one word with the other. For instance:
I hate flowers.
I love flowers.
She had a very beautiful sister.
She had a very ugly sister.
It depends on the task, indeed. For many semantic similarity tasks, and for our brains, these sentences are not semantically similar at all.
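To make the effect concrete, here is a minimal sketch with made-up numbers. It assumes the document vector is the average of the token vectors and that similarity is the cosine between those averages, which is how spaCy's default `Doc.similarity` behaves for models with word vectors. The three-dimensional vectors in `TOY_VECTORS` are hypothetical values chosen only for illustration; they are not taken from `en_core_web_md`.

```python
import math

# Hypothetical toy vectors, NOT real en_core_web_md values.
# "like" and "hate" are placed close together to mimic words
# that occur in similar contexts.
TOY_VECTORS = {
    "like": [0.9, 0.1, 0.3],
    "hate": [0.7, -0.2, 0.4],
    "this": [0.1, 0.8, 0.2],
}


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def doc_similarity(text_a, text_b, vectors=TOY_VECTORS):
    """Cosine similarity of the averaged word vectors of two texts."""
    vec_a = mean_vector([vectors[w] for w in text_a.split()])
    vec_b = mean_vector([vectors[w] for w in text_b.split()])
    return cosine(vec_a, vec_b)


word_score = cosine(TOY_VECTORS["like"], TOY_VECTORS["hate"])
doc_score = doc_similarity("like this", "hate this")
print(f"word-level: {word_score:.3f}, sentence-level: {doc_score:.3f}")
```

With these toy values the sentence-level score comes out higher than the word-level one, because the shared word "this" contributes the same vector to both averages. Two short sentences that differ in a single word start from a high baseline before the antonyms are even compared.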
I have a quick hack: if the sentence similarity score is high, you can add a second evaluation stage that uses WordNet to look for antonym verbs, adverbs and adjectives. WordNet is a word taxonomy based on semantics.
https://wordnet.princeton.edu/
https://stackoverflow.com/questions/24192979/how-to-generate-a-list-of-antonyms-for-adjectives-in-wordnet-using-python
http://www.nltk.org/howto/wordnet.html
Surely this approach cannot cover everything, but it works well for common verbs, adjectives and adverbs.
You can consider it a workaround until you come up with a statistical strategy.
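
A minimal sketch of this two-stage check, assuming NLTK is installed and its WordNet corpus has been downloaded with `nltk.download('wordnet')`. The function names, the `antonyms_of` parameter and the `0.8` threshold are hypothetical choices for illustration, not part of spaCy or NLTK.

```python
def wordnet_antonyms(word):
    """Return the antonym lemma names that WordNet lists for `word`."""
    # Imported lazily so the rest of the sketch works without NLTK.
    from nltk.corpus import wordnet as wn

    return {
        antonym.name().lower()
        for synset in wn.synsets(word)
        for lemma in synset.lemmas()
        for antonym in lemma.antonyms()
    }


def find_antonym_pairs(tokens_a, tokens_b, antonyms_of=wordnet_antonyms):
    """List (a, b) pairs where b is a listed antonym of a."""
    pairs = []
    for a in tokens_a:
        antonyms = antonyms_of(a.lower())
        for b in tokens_b:
            if b.lower() in antonyms:
                pairs.append((a, b))
    return pairs


def is_suspicious_match(score, tokens_a, tokens_b,
                        threshold=0.8, antonyms_of=wordnet_antonyms):
    """Second stage: flag a high similarity score that hides an antonym pair."""
    if score < threshold:
        return False
    return bool(find_antonym_pairs(tokens_a, tokens_b, antonyms_of))
```

For example, `is_suspicious_match(score, ["I", "love", "flowers"], ["I", "hate", "flowers"])` would pass the spaCy score through the WordNet check. The sketch compares raw tokens and does not filter by part of speech; lemmatizing first and restricting the lookup to verbs, adjectives and adverbs would be closer to the idea above. WordNet attaches antonym links to specific lemma pairs, so a given pair is not guaranteed to be found.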
Cheers,
Duygu.