Spacy: Random AssertionError in thinc.neural.ops.Ops.unflatten

Created on 17 Apr 2019 · 10 comments · Source: explosion/spaCy

Hello,
I do not know what happened, but I randomly get this error:

Traceback (most recent call last):
  File "/home/damiano/lavoro/python/parser/parser/nlp/ner/trainer/learning.py", line 65, in <module>
    losses=losses,
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/spacy/language.py", line 452, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 418, in spacy.syntax.nn_parser.Parser.update
  File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
  File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/thinc/neural/_classes/feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/thinc/api.py", line 275, in begin_update
    return layer.ops.unflatten(X, lengths, pad=pad), finish_update
  File "ops.pyx", line 138, in thinc.neural.ops.Ops.unflatten
AssertionError

My code is very simple:

import random

import spacy
from spacy.util import minibatch, compounding

nlp = spacy.load('/home/damiano/model')

if "ner" not in nlp.pipe_names:
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner, last=True)
else:
    ner = nlp.get_pipe("ner")

# add labels
for label in ("NAME", "SURNAME", "ADDRESS", "BIRTHDATE", "CITIZENSHIP",
              "CITY", "DATE", "EMAIL", "PROFESSION", "REGION",
              "TELEPHONE", "URL", "ZIPCODE"):
    ner.add_label(label)

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]

with nlp.disable_pipes(*other_pipes):  # only train NER
    # reset and initialize the weights randomly – but only if we're
    # training a new model
    nlp.begin_training()

    for itn in range(N_ITER):
        random.shuffle(TRAIN_DATA)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))

        for batch in batches:

            texts, annotations = zip(*batch)

            nlp.update(
                texts,  # batch of texts
                annotations,  # batch of annotations
                drop=0.25,  # dropout - make it harder to memorise data
                losses=losses,
            )


It is basically the same code I found in the docs.

Your Environment

spaCy version: 2.1.3
Location: /home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/spacy
Platform: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.7
Models: it

bug 🔮 thinc


All 10 comments

Hello, I can confirm this happens when the document has 0 characters.
Maybe we should skip such documents instead of raising an AssertionError?
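Until this is fixed upstream, a simple workaround is to drop zero-length texts before they ever reach nlp.update(). A minimal sketch, assuming the (text, annotations) tuple format used in the training snippet above (the sample data here is made up for illustration):

```python
# Hypothetical training data in the (text, annotations) format from the
# snippet above; the empty entries are the kind that trip Ops.unflatten.
TRAIN_DATA = [
    ("John lives in Rome", {"entities": [(0, 4, "NAME"), (14, 18, "CITY")]}),
    ("", {"entities": []}),      # zero-length text that triggers the bug
    ("   ", {"entities": []}),   # whitespace-only text is effectively empty too
]

# Keep only examples whose text contains at least one non-whitespace character.
clean_data = [
    (text, annots)
    for text, annots in TRAIN_DATA
    if text and text.strip()
]
print(len(clean_data))  # 1
```

Filtering once up front is cheaper than catching the AssertionError per batch, and it also avoids wasting updates on examples the model can learn nothing from.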

I've just run into this issue also. The strange thing is that it doesn't happen for every zero-length document. My pipeline gets through about 2k zero-length sentences found within the longer iterable of texts before throwing this error. It happens at the same location in the list of sentences, but if I shuffle the list of texts, I get the assertion error at different places.

I've experienced this error, too. I tend to agree with @damianoporta, since I solved it by replacing null/NaN strings with 'nan' strings.

Anyhow, it is pretty interesting that although I had null strings all across the Series I streamed, the exception was thrown only after having processed several other null documents, and while processing an arbitrary non-null document. Not sure what's going on under the hood!
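The fix described in this comment can be sketched in plain Python (the helper name fill_missing and the sample data are illustrative, not part of spaCy or pandas):

```python
import math

def fill_missing(texts, placeholder="nan"):
    """Replace None or float NaN entries with a placeholder string, so
    every document handed to spaCy has at least one character."""
    cleaned = []
    for t in texts:
        if t is None or (isinstance(t, float) and math.isnan(t)):
            cleaned.append(placeholder)
        else:
            cleaned.append(t)
    return cleaned

print(fill_missing(["hello", None, float("nan"), "world"]))
# ['hello', 'nan', 'nan', 'world']
```

With a pandas Series, Series.fillna('nan') achieves the same thing in one call.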

I've been facing the same issue when trying to train a model in Portuguese for NER. The interesting thing is that the error seems to appear in different iterations of the training script, as you can see below.

Loaded model 'pt_core_news_sm'
0
Losses {'ner': 6539.10400141068}
1
Losses {'ner': 6198.809629522756}
2
Losses {'ner': 6379.065996479177}
3
Traceback (most recent call last):
  File "trainingLoopV2.py", line 3084, in <module>
    plac.call(main)
  File "C:\Anaconda\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "C:\Anaconda\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "trainingLoopV2.py", line 3056, in main
    losses=losses,
  File "C:\Anaconda\lib\site-packages\spacy\language.py", line 452, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 418, in spacy.syntax.nn_parser.Parser.update
  File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
  File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
  File "C:\Anaconda\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "C:\Anaconda\lib\site-packages\thinc\api.py", line 275, in begin_update
    return layer.ops.unflatten(X, lengths, pad=pad), finish_update
  File "ops.pyx", line 138, in thinc.neural.ops.Ops.unflatten
AssertionError

This error did NOT appear when I was training to add a new entity type. It only appears when I'm trying to update a model to get better accuracy with the NER.

I've got the same issue while processing the raw 20Newsgroups dataset.

I've got the same issue, with some zero-length sentences. I solved it by setting batch_size=1 in nlp.pipe. Any different value throws the error.

@Cortysus , I had similar case, batch_size=1 solved the problem for me.
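For reference, a minimal sketch of this workaround; a blank pipeline is used here only to show the call (the original reports used full models, and a blank pipeline has no parser or NER component that would hit the bug):

```python
import spacy

# Blank English pipeline: tokenizer only, no model download required.
nlp = spacy.blank("en")

texts = ["some text", "", "more text"]

# batch_size=1 is the workaround from this thread: each document is
# processed in its own batch, so a zero-length text is never flattened
# together with other docs inside Ops.unflatten.
docs = list(nlp.pipe(texts, batch_size=1))
print(len(docs))  # 3
```

Note that batch_size=1 trades throughput for safety; upgrading Thinc (see below) is the proper fix.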

This should be fixed in Thinc v7.0.5: https://github.com/explosion/thinc/releases/tag/v7.0.5 Thanks for your patience 🙂

The easiest way to reproduce the bug:

import spacy
NLP = spacy.load("en_core_web_sm")
for doc in NLP.pipe(['some text', '']):
    pass

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
