spaCy: Test if a word is in spaCy's vocabulary

Created on 30 Dec 2016 · 5 Comments · Source: explosion/spaCy

Great job on spaCy, fantastic dependency parser!

Question: is there a way to test whether words are in the (English) vocabulary?

All 5 comments

It's actually really obvious:

s in nlp.vocab

Not working:

import spacy

nlp = spacy.load('en')
doc = nlp('I am sflmgmavknsaccasas')

for token in doc:
    print(token in nlp.vocab)

Error:

TypeError: an integer is required

Also, is_oov is broken:

for token in doc:
    print(token.is_oov)
True
True

Same issue here. I'm trying to use this method to keep only real words in scraped text. The "in nlp.vocab" approach throws an error, and is_oov is True for every real word I tested:

doc = nlp('I am sflmgmavknsaccasas dog cat bird bulbasaur')
[tok for tok in doc if tok in nlp.vocab]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "vocab.pyx", line 194, in spacy.vocab.Vocab.__contains__
TypeError: an integer is required



[tok.is_oov for tok in doc]
[True, True, True, True, True, True, True]
  • spaCy version: 2.0.9
  • Platform: osx 10.13.4
  • Python version: 3.6.4
  • Models: en
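
Both tracebacks above come from passing a Token object to the membership test; the original answer, s in nlp.vocab, presumably meant s to be a string. A minimal sketch of that reading follows. The helper name known_tokens and the stand-in tokens and vocabulary are hypothetical, used only so the snippet runs without a model; with spaCy you would pass a Doc and nlp.vocab instead.

```python
from types import SimpleNamespace


def known_tokens(tokens, vocab):
    """Return the tokens whose text has an entry in `vocab`.

    `tokens` is any iterable of objects with a `.text` attribute
    (for example a spaCy Doc); `vocab` is any container that supports
    `in` with strings (for example nlp.vocab).
    """
    # Look up the token's string, not the Token object itself.
    return [tok for tok in tokens if tok.text in vocab]


# Hypothetical stand-ins so the sketch runs without loading a model.
toks = [SimpleNamespace(text=w) for w in "I am sflmgmavknsaccasas".split()]
vocab = {"I", "am"}

result = [tok.text for tok in known_tokens(toks, vocab)]
print(result)  # ['I', 'am']
```

With a loaded pipeline the call would be known_tokens(doc, nlp.vocab). Depending on the spaCy version, processing a text can itself add entries to nlp.vocab, so this check may accept more words than expected.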

@ghonk here is a workaround:

for token in 'k8s sdjhsd horse hit'.split(' '):
    print(nlp.vocab.has_vector(token))

It only makes sense if the corpus you use includes word vectors.
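
The has_vector workaround can be wrapped in a small helper that splits a word list into words with and without a vector. This is a sketch only: split_by_vector and the _StubVocab class are hypothetical, and the stub merely imitates the has_vector method of nlp.vocab so the example runs without a vectors model.

```python
def split_by_vector(words, vocab):
    """Split `words` into (with_vector, without_vector).

    `vocab` is any object exposing has_vector(word) -> bool, such as
    nlp.vocab from a spaCy pipeline that ships word vectors.
    """
    with_vector, without_vector = [], []
    for word in words:
        if vocab.has_vector(word):
            with_vector.append(word)
        else:
            without_vector.append(word)
    return with_vector, without_vector


class _StubVocab:
    """Hypothetical stand-in for nlp.vocab; used only for this demo."""

    def __init__(self, words_with_vectors):
        self._words = set(words_with_vectors)

    def has_vector(self, word):
        return word in self._words


stub = _StubVocab({"horse", "hit"})
known, unknown = split_by_vector("k8s sdjhsd horse hit".split(" "), stub)
print(known)    # ['horse', 'hit']
print(unknown)  # ['k8s', 'sdjhsd']
```

With a real pipeline the call would be split_by_vector(words, nlp.vocab). For a pipeline that ships no vectors, every word would be expected to land in the second list, which is why the workaround depends on the corpus.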

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
