spaCy: High similarity scores for antonyms

Created on 2 Apr 2018  ·  3 Comments  ·  Source: explosion/spaCy

import spacy
nlp = spacy.load('en_core_web_md')
nlp('like this').similarity(nlp('hate this'))  # => 0.866

I would like only similar words and synonyms to get high scores. I am assuming that a high score like 0.866 comes about because the words "like" and "hate" occur in similar contexts. Is my reasoning correct? Is there anything else that could be done about this?

My Environment

  • Platform: Darwin-15.3.0-x86_64-i386-64bit
  • Python version: 3.5.4
  • spaCy version: 2.0.5
  • Models: en_core_web_md
feat / vectors

Most helpful comment

Yes, the behavior is just as you described. Antonyms often appear in the same contexts; you can even literally replace one word with the other. For instance:

I hate flowers.
I love flowers.

She had a very beautiful sister.
She had a very ugly sister.

It does depend on the task. For many semantic similarity tasks, and for our brains, these sentences are not semantically similar at all.
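To make the "same context" point concrete, here is a toy sketch that builds simple co-occurrence count vectors from the four sentences above and compares them with cosine similarity. This is only a stand-in for how distributional vectors behave, not how the spaCy model's vectors were trained; the function names and the window size are my own assumptions.

```python
from collections import Counter
from math import sqrt

# The four example sentences from this comment, lowercased by hand.
SENTENCES = [
    "i hate flowers",
    "i love flowers",
    "she had a very beautiful sister",
    "she had a very ugly sister",
]

def context_vector(word, sentences, window=2):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Add every neighbour in the window except the word itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

love = context_vector("love", SENTENCES)
hate = context_vector("hate", SENTENCES)
beautiful = context_vector("beautiful", SENTENCES)

print(cosine(love, hate))       # antonyms that share exactly the same contexts
print(cosine(love, beautiful))  # words that share no context here
```

In this tiny corpus "love" and "hate" have identical neighbours, so their similarity comes out at (approximately) 1.0, while "love" and "beautiful" share none and score 0.0. Context alone cannot tell the antonyms apart.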

I have a quick hack: if the sentence similarity score is high, you can add a second stage of evaluation with WordNet and look for antonymous verbs, adverbs and adjectives. WordNet is a word taxonomy based on semantics.

https://wordnet.princeton.edu/
https://stackoverflow.com/questions/24192979/how-to-generate-a-list-of-antonyms-for-adjectives-in-wordnet-using-python
http://www.nltk.org/howto/wordnet.html
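A minimal sketch of that second stage could look like the following. The threshold of 0.8, the function names and the small hand-written antonym table are all assumptions for illustration. `wordnet_antonyms` shows how the lookup might be done with NLTK's WordNet interface (it needs `nltk` plus the WordNet corpus, and is not called here); it does not filter by part of speech, and whether WordNet actually lists a given pair as antonyms has to be checked case by case.

```python
# Toy antonym table, hand-written for illustration only (hypothetical data).
TOY_ANTONYMS = {
    "like": {"dislike", "hate"},
    "love": {"hate"},
    "hate": {"love", "like"},
    "beautiful": {"ugly"},
    "ugly": {"beautiful"},
}

def toy_antonyms(word):
    """Look up antonyms in the toy table above."""
    return TOY_ANTONYMS.get(word, set())

def wordnet_antonyms(word):
    """Collect antonyms from WordNet via NLTK.

    Assumes nltk is installed and nltk.download('wordnet') has been run.
    Imported lazily so the rest of the sketch runs without NLTK.
    """
    from nltk.corpus import wordnet as wn
    return {
        antonym.name().lower()
        for synset in wn.synsets(word)
        for lemma in synset.lemmas()
        for antonym in lemma.antonyms()
    }

def find_antonym_pairs(tokens_a, tokens_b, antonyms_of=toy_antonyms):
    """Return (a, b) pairs where one word is listed as an antonym of the other."""
    pairs = []
    for a in tokens_a:
        for b in tokens_b:
            if b in antonyms_of(a) or a in antonyms_of(b):
                pairs.append((a, b))
    return pairs

def second_stage(score, tokens_a, tokens_b, threshold=0.8, antonyms_of=toy_antonyms):
    """Flag a high-similarity pair if the two texts contain antonyms."""
    if score < threshold:
        # Low scores do not need the extra check.
        return False, []
    pairs = find_antonym_pairs(tokens_a, tokens_b, antonyms_of)
    return bool(pairs), pairs

# Score taken from the original question; tokens split by hand.
flagged, pairs = second_stage(0.866, ["like", "this"], ["hate", "this"])
print(flagged, pairs)
```

To try it against WordNet instead of the toy table, pass `antonyms_of=wordnet_antonyms`.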

Of course, this approach cannot cover everything, but it works well for common verbs, adjectives and adverbs.
You can consider it a workaround until you come up with a statistical strategy.
Cheers,
Duygu.

All 3 comments

For those who are interested, I was able to identify antonyms reasonably well with https://github.com/facebookresearch/InferSent. It would be awesome to have a sentence embedding model and a way to find antonyms in spaCy. I will look into sending a PR for these.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
