spaCy's token.is_stop returns different results depending on the capitalization of the word. For example, "the" at the start of a sentence is not recognized as a stop word when it is capitalized.
How to replicate:
    import spacy

    print(spacy.about.__version__)
    nlp = spacy.load('en')
    for s in ["The store", "the store"]:
        doc = nlp(s)
        print('\n{}'.format(s))
        # Print each token next to its stop-word flag
        for t in doc:
            print('{}\t{}'.format(t.text, t.is_stop))
Output is:
    2.0.5

    The store
    The     False
    store   False

    the store
    the     True
    store   False
I suspect this is not intended behavior: words at the start of a sentence are usually capitalized, but that does not change their meaning.
Environment: spaCy installed via conda install spacy, with conda version 4.4.7.

You are right, this is not intended behavior. I will fix it.
Thanks for pointing this out!
See #1891!
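
Until the fix is available, one possible workaround is to lowercase the token text before checking it against the stop list. The sketch below is only an illustration, not the fix from the linked issue: the helper name is_stop_caseless is hypothetical, and the small fallback set is a stand-in used only when spaCy cannot be imported.

```python
# Workaround sketch: case-insensitive stop-word lookup.
# Assumption: the stop list is spaCy's English STOP_WORDS set; if spaCy is
# not installed, a tiny stand-in set is used so the example still runs.
try:
    from spacy.lang.en.stop_words import STOP_WORDS
except ImportError:
    STOP_WORDS = {"the", "a", "an", "of", "and"}  # stand-in, not the real list


def is_stop_caseless(text, stop_words=STOP_WORDS):
    """Return True if text is a stop word, ignoring capitalization."""
    # is_stop_caseless is a hypothetical helper, not part of spaCy.
    return text.lower() in stop_words


for word in ["The", "the", "store"]:
    print('{}\t{}'.format(word, is_stop_caseless(word)))
```

With a spaCy Doc, the same check could be applied per token by passing t.text (or t.lower_) to the helper instead of reading t.is_stop.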