spaCy: nlp.pipe() crashes after second empty string

Created on 25 Jun 2019 · 5 Comments · Source: explosion/spaCy

How to reproduce the behaviour

It is very easy to reproduce:

import spacy
nlp = spacy.load("de_core_news_sm")

texts = ["ich bin ein Satz", "", " ", "meow", " ", ""]

for doc in nlp.pipe(texts):
    print(doc)

It works if the list contains only one empty string.

Your Environment

  • spaCy version: 2.1.4
  • Platform: Windows-10-10.0.17763-SP0
  • Python version: 3.6.4

Labels: bug, feat / pipeline

All 5 comments

I did some research on this: the pipe does not crash after the "second empty string", but only when the last string in the list is empty.

So these texts work:

texts = ["ich bin ein Satz", "", "", "", " ", "meow", " ", "", "Noch ein Satz"]
texts = ["ich bin ein Satz", "", "", "", " ", "meow", " ", "", "Noch ein Satz", " "]

But this does not:

texts = ["ich bin ein Satz", "", "", "", " ", "meow", " ", "", "Noch ein Satz", ""]
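Until a fixed version is installed, one possible workaround is to keep empty strings away from the pipeline and put a placeholder back in their position. The sketch below is only an illustration: `pipe_skipping_empty` is a hypothetical helper, not part of spaCy, and it assumes that the callable passed as `pipe` (for example `nlp.pipe`) yields exactly one result per input text, in order. Whitespace-only strings such as " " are passed through, since they were reported to work.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def pipe_skipping_empty(
    pipe: Callable[[Iterable[str]], Iterable[T]],
    texts: Iterable[str],
) -> Iterator[Optional[T]]:
    """Hypothetical helper: run `pipe` on the non-empty texts only.

    Yields None in place of every empty string, so the output stays
    aligned with the input. Assumes `pipe` returns one result per text,
    in input order.
    """
    texts = list(texts)
    # Only strings that are not exactly "" are handed to the pipeline.
    results = iter(pipe(t for t in texts if t != ""))
    for text in texts:
        if text == "":
            yield None  # placeholder for the skipped empty string
        else:
            yield next(results)
```

With spaCy this could be used as `for doc in pipe_skipping_empty(nlp.pipe, texts): ...`, treating `None` as "no document".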

This also happens if the last element in a "minibatch" of nlp.pipe() is empty.
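To check whether a given input would hit the minibatch case, something like the following could help. It is a rough sketch under one assumption: that the texts are split into consecutive chunks of `batch_size` items. The function name `batches_ending_empty` is made up for this illustration, and the real batching inside spaCy may differ.

```python
from typing import List, Sequence


def batches_ending_empty(texts: Sequence[str], batch_size: int) -> List[int]:
    """Return the indices of the chunks whose last element is "".

    Assumes consecutive chunks of `batch_size` items; this is a guess at
    the batching scheme, not a description of spaCy internals.
    """
    hits = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        if batch[-1] == "":
            hits.append(start // batch_size)
    return hits


texts = ["a", "", "b", "c", "d", ""]
print(batches_ending_empty(texts, batch_size=2))  # chunks 0 and 2 end in ""
```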

I think this may be due to a quirk in thinc, but I'm not sure.
See https://github.com/explosion/thinc/pull/104

Added a test verifying the fix, which is now live in thinc v7.0.5 :tada:

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
