Spacy: `is_stop` depends on capitalisation

Created on 25 Jan 2018 · 3 Comments · Source: explosion/spaCy

spaCy's `token.is_stop` returns different results depending on the capitalization of a word. For example, "the" at the start of a sentence is not recognized as a stop word when it is capitalized ("The").

How to replicate:

import spacy

print(spacy.about.__version__)
nlp = spacy.load('en')
for s in ["The store", "the store"]:
    doc = nlp(s)
    print('\n{}'.format(s))
    # Print each token next to spaCy's stop-word flag for it
    for t in doc:
        print('{}\t{}'.format(t.text, t.is_stop))

Output is:

2.0.5

The store
The False
store   False

the store
the True
store   False

I suspect this is not intended behavior: words at the start of a sentence are usually capitalized, but capitalization doesn't change their meaning.
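The output is consistent with an exact-match lookup against a lowercase stop-word list. The pure-Python sketch below illustrates the difference between an exact-match check and one that lowercases the word first. It is not spaCy's implementation: `STOP_WORDS` here is a tiny hypothetical set, and both helper functions are hypothetical names used only for illustration.

```python
# Tiny hypothetical stop-word set for illustration (spaCy ships a much larger list).
STOP_WORDS = {"the", "a", "an", "and", "of"}


def is_stop_case_sensitive(word: str) -> bool:
    """Exact-match lookup: 'The' is not found because only 'the' is stored."""
    return word in STOP_WORDS


def is_stop_case_insensitive(word: str) -> bool:
    """Lowercase before the lookup, so 'The' and 'the' give the same answer."""
    return word.lower() in STOP_WORDS


for text in ["The store", "the store"]:
    print("\n" + text)
    for word in text.split():
        # columns: word, exact-match result, lowercased result
        print("{}\t{}\t{}".format(word, is_stop_case_sensitive(word),
                                  is_stop_case_insensitive(word)))
```

With the exact-match check, "The" and "the" disagree just as in the report above; lowercasing first makes both variants of the word count as stop words.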

My Environment

  • spaCy version: 2.0.5
  • Platform: Darwin-17.4.0-x86_64-i386-64bit
  • Python version: 3.6.4
  • Models: en_core_web_sm, en
  • spaCy installed via `conda install spacy` (conda version 4.4.7)
Labels: enhancement, lang / all

All 3 comments

You are right, this is not intended behavior. I will fix it.
Thanks for pointing out!

See #1891!

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
