spaCy: most_similar() not working

Created on 15 Mar 2019 · 6 Comments · Source: explosion/spaCy

How to reproduce the behaviour

import spacy
nlp = spacy.load('en_core_web_lg')
nlp.vocab.vectors.most_similar(nlp('cats').vector.reshape(1,300))

This returns the following:
(array([], dtype=uint64), array([0], dtype=int32), array([nan], dtype=float32))

While computing the results, I get the following warnings:
/anaconda3/envs/deep/lib/python3.6/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.
/anaconda3/envs/deep/lib/python3.6/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in true_divide
  """Entry point for launching an IPython kernel.
/anaconda3/envs/deep/lib/python3.6/site-packages/numpy/core/_methods.py:28: RuntimeWarning: invalid value encountered in reduce
  return umr_maximum(a, axis, None, out, keepdims, initial)

Info about spaCy and Environment

  • spaCy version: 2.1.0a13
  • Platform: Darwin-18.2.0-x86_64-i386-64bit
  • Python version: 3.6.8
  • Models: en_core_web_lg, xx_ent_wiki_sm, xx
  • Environment Information: Anaconda
bug feat / vectors

Most helpful comment

The problem seems to be in nlp.vocab.vectors.data. The first row is all zeros, so normalizing the vectors to compute cosine similarity divides by zero.

In [1]: import spacy                                                                                                

In [2]: nlp = spacy.load('en_core_web_lg')                                                                          

In [3]: nlp.vocab.vectors.data                                                                                      
Out[3]: 
array([[ 0.       ,  0.       ,  0.       , ...,  0.       ,  0.       ,
         0.       ],
       [ 0.012001 ,  0.20751  , -0.12578  , ...,  0.13871  , -0.36049  ,
        -0.035    ],
       [-0.082752 ,  0.67204  , -0.14987  , ..., -0.1918   , -0.37846  ,
        -0.06589  ],
       ...,
       [ 0.42247  , -0.28522  , -0.38661  , ...,  0.27521  ,  0.23623  ,
        -0.72113  ],
       [ 0.47918  , -0.32734  , -0.23593  , ..., -0.19494  , -0.065226 ,
        -0.36282  ],
       [-0.63354  , -0.1503   , -0.36161  , ...,  0.26216  , -0.12094  ,
         0.0038262]], dtype=float32)

I think it might be the OOV vector?
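
To make the divide-by-zero concrete, here is a minimal NumPy-only sketch. It assumes a toy 3-row table whose first row is all zeros, standing in for nlp.vocab.vectors.data. The function name cosine_most_similar and the eps parameter are hypothetical and are not part of spaCy; this illustrates the failure mode and one possible guard, not how spaCy implements most_similar().

```python
import numpy as np

def cosine_most_similar(queries, table, eps=1e-12):
    """Hypothetical helper: best-matching table row per query by cosine
    similarity, ignoring all-zero rows instead of dividing by zero."""
    queries = np.asarray(queries, dtype="float32")
    table = np.asarray(table, dtype="float32")
    q_norm = np.linalg.norm(queries, axis=1, keepdims=True)
    t_norm = np.linalg.norm(table, axis=1, keepdims=True)
    # Clamp the norms away from zero so a zero row yields 0.0, not 0/0 = NaN.
    q_unit = queries / np.maximum(q_norm, eps)
    t_unit = table / np.maximum(t_norm, eps)
    scores = q_unit @ t_unit.T
    # Zero rows carry no direction, so exclude them from the ranking.
    scores[:, t_norm[:, 0] == 0] = -np.inf
    best = scores.argmax(axis=1)
    return best, scores[np.arange(len(best)), best]

# Toy stand-in for the vectors table: row 0 is all zeros, as in the issue.
table = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]], dtype="float32")
query = np.array([[0.9, 0.1, 0.0]], dtype="float32")

# Naive normalization: the zero row becomes NaN and poisons any later max().
with np.errstate(divide="ignore", invalid="ignore"):
    naive = table / np.linalg.norm(table, axis=1, keepdims=True)
has_nan = bool(np.isnan(naive).any())

best, score = cosine_most_similar(query, table)
print(has_nan, best, score)
```

With the guard in place, the zero row is skipped and the query matches row 1. Whether the zero row in en_core_web_lg really is the OOV vector is not confirmed in this thread.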

All 6 comments

Hi @ines, any idea what I should do to fix it? Is there a version you would recommend rolling back to in which this feature works?

@emiguevara But I am not using a GPU; it's a plain MacBook.

@ines Any kind of response would be appreciated.

Just a side note: it works fine with the en_core_web_md model.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
