Hello,
I don't know what happened, but I randomly get this error:
Traceback (most recent call last):
  File "/home/damiano/lavoro/python/parser/parser/nlp/ner/trainer/learning.py", line 65, in <module>
    losses=losses,
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/spacy/language.py", line 452, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 418, in spacy.syntax.nn_parser.Parser.update
  File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
  File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/thinc/neural/_classes/feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "/home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/thinc/api.py", line 275, in begin_update
    return layer.ops.unflatten(X, lengths, pad=pad), finish_update
  File "ops.pyx", line 138, in thinc.neural.ops.Ops.unflatten
AssertionError
My code is very simple:
import random

import spacy
from spacy.util import minibatch, compounding

# N_ITER and TRAIN_DATA are defined elsewhere in the script
nlp = spacy.load('/home/damiano/model')

if "ner" not in nlp.pipe_names:
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner, last=True)
else:
    ner = nlp.get_pipe("ner")

# add labels
ner.add_label('NAME')
ner.add_label('SURNAME')
ner.add_label('ADDRESS')
ner.add_label('BIRTHDATE')
ner.add_label('CITIZENSHIP')
ner.add_label('CITY')
ner.add_label('DATE')
ner.add_label('EMAIL')
ner.add_label('PROFESSION')
ner.add_label('REGION')
ner.add_label('TELEPHONE')
ner.add_label('URL')
ner.add_label('ZIPCODE')

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    # reset and initialize the weights randomly – but only if we're
    # training a new model
    nlp.begin_training()
    for itn in range(N_ITER):
        random.shuffle(TRAIN_DATA)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(
                texts,        # batch of texts
                annotations,  # batch of annotations
                drop=0.25,    # dropout - make it harder to memorise data
                losses=losses,
            )
It is basically the same code I found in the docs.
spaCy version: 2.1.3
Location: /home/damiano/lavoro/python/virtualenvs/parser/lib/python3.6/site-packages/spacy
Platform: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.7
Models: it
Hello, I can confirm this happens when the document has 0 characters.
Maybe we should skip such documents instead of raising the AssertionError?
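In the meantime, here is a minimal sketch of that skip, assuming TRAIN_DATA is a list of (text, annotations) tuples as in the snippet above:

# Hypothetical workaround: drop zero-length (or whitespace-only) texts
# from the training data before calling nlp.update.
TRAIN_DATA = [
    (text, annotations)
    for text, annotations in TRAIN_DATA
    if text.strip()
]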
I've just run into this issue also. The strange thing is that it doesn't happen for every zero-length document. My pipeline gets through about 2k zero-length sentences found within the longer iterable of texts before throwing this error. It happens at the same location in the list of sentences, but if I shuffle the list of texts, I get the assertion error at different places.
I've experienced this error, too. I tend to agree with @damianoporta, since I solved it by replacing null/NaN strings with 'nan' strings.
Anyhow, it is interesting to note that although I had null strings all across the Series I streamed, the exception was thrown after several null documents had already been processed, while an arbitrary non-null document was being processed. Not sure what's going on under the hood!
I've been facing the same issue when trying to train a Portuguese model for NER. The interesting thing is that the error seems to appear in different iterations of the training script, as you can see below.
Loaded model 'pt_core_news_sm'
0
Losses {'ner': 6539.10400141068}
1
Losses {'ner': 6198.809629522756}
2
Losses {'ner': 6379.065996479177}
3
Traceback (most recent call last):
  File "trainingLoopV2.py", line 3084, in <module>
    plac.call(main)
  File "C:\Anaconda\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "C:\Anaconda\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "trainingLoopV2.py", line 3056, in main
    losses=losses,
  File "C:\Anaconda\lib\site-packages\spacy\language.py", line 452, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 418, in spacy.syntax.nn_parser.Parser.update
  File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
  File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
  File "C:\Anaconda\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "C:\Anaconda\lib\site-packages\thinc\api.py", line 275, in begin_update
    return layer.ops.unflatten(X, lengths, pad=pad), finish_update
  File "ops.pyx", line 138, in thinc.neural.ops.Ops.unflatten
AssertionError
This error did NOT appear when I was training to add a new entity type; it only appears when I'm trying to update a model to get better NER accuracy.
I've got the same issue while processing the raw 20Newsgroups dataset.
I've got the same issue, with some zero-length sentences. I solved it by setting batch_size=1 in nlp.pipe. Any other value throws the error.
@Cortysus, I had a similar case; batch_size=1 solved the problem for me.
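For reference, a minimal sketch of that workaround; texts here is a hypothetical iterable of input strings:

# With batch_size=1 each document is unflattened on its own, so a
# zero-length text no longer breaks the rest of its batch.
# texts: your iterable of input strings (hypothetical)
for doc in nlp.pipe(texts, batch_size=1):
    print([(ent.text, ent.label_) for ent in doc.ents])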
This should be fixed in Thinc v7.0.5: https://github.com/explosion/thinc/releases/tag/v7.0.5 Thanks for your patience 🙂
The easiest way to reproduce the bug:

import spacy

NLP = spacy.load("en_core_web_sm")
for doc in NLP.pipe(['some text', '']):
    pass
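Until you can upgrade Thinc, filtering empty strings out of the stream also avoids the crash; a sketch reusing the NLP object from the snippet above:

texts = ['some text', '']
# Skip zero-length strings before they reach the pipeline.
for doc in NLP.pipe(t for t in texts if t):
    print(doc.text)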