I'm trying to run one of the examples in 2.0.0 alpha for extending a pre-existing model with
custom NER tags, available here [1].
Here is the error I get:
$ python train_new_entity_type.py en othersame
Creating initial model en
Traceback (most recent call last):
File "train_new_entity_type.py", line 124, in <module>
plac.call(main)
File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "train_new_entity_type.py", line 106, in main
train_ner(nlp, train_data, output_directory)
File "train_new_entity_type.py", line 53, in train_ner
optimizer = nlp.begin_training(lambda: [])
File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy/language.py", line 410, in begin_training
for name, proc in self.pipeline:
TypeError: 'TokenVectorEncoder' object is not iterable
I expected this to work, since it's already documented here [2].
All the models and the spaCy install are recent, fresh installs (21st October).
Info about spaCy
Python version 2.7.13
Platform Linux-4.11.12-100.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
spaCy version 2.0.0a17
Location /home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy
Models en_core_web_sm, en_core_web_lg
[1] https://github.com/explosion/spaCy/blob/develop/examples/training/train_new_entity_type.py
[2] https://alpha.spacy.io/usage/training#example-new-entity-type
I think you might be using an outdated model that still has the tensorizer in the pipeline. The latest alpha version now has a handy command that lets you check that all models are compatible and up to date, and shows you which ones need to be upgraded:
spacy validate
So simply downloading the latest en_core_web_sm or en_core_web_lg model should hopefully fix this.
I hope this is the case, but here is the output of validate:
$ spacy validate
Installed models (spaCy v2.0.0a17)
/home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy
TYPE NAME MODEL VERSION
package en-core-web-sm en_core_web_sm 2.0.0a7 ✔
package en-core-web-lg en_core_web_lg 2.0.0a1 ✔
link en_core_web_lg en_core_web_lg 2.0.0a1 ✔
link en_core_web_sm en_core_web_sm 2.0.0a7 ✔
I have been really looking forward to adding custom NER tags, so I'm eager
to get it working.
Thanks for updating! I think I found the issue. Try removing this line:
I think we may have forgotten to push the updated version of the example for the latest alpha release and models, sorry about that.
Edit: Since the pipeline architecture has changed and nlp.pipeline entries are now (name, func) tuples, this line also has to be adjusted:
nlp.add_pipe(NeuralEntityRecognizer(nlp.vocab))
Will test this as soon as we have time and adjust it accordingly!
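To see why the traceback ends in a TypeError, here is a minimal plain-Python sketch that needs no spaCy install. It assumes only what is stated above: begin_training loops over the pipeline with `for name, proc in self.pipeline`, so every entry must be a (name, component) tuple. The TokenVectorEncoder class and toy_begin_training function are hypothetical stand-ins, not spaCy's actual code.

```python
# Toy illustration only: a plain list stands in for nlp.pipeline.
# Per the thread, each pipeline entry is now a (name, component) tuple.


class TokenVectorEncoder(object):
    """Hypothetical stand-in component; it defines no __iter__,
    so it cannot be unpacked into (name, proc)."""
    pass


def toy_begin_training(pipeline):
    # Mirrors the loop shown in the traceback (language.py, begin_training).
    names = []
    for name, proc in pipeline:
        names.append(name)
    return names


# A well-formed pipeline: every entry is a (name, component) tuple.
good_pipeline = [("tagger", object()), ("ner", object())]
print(toy_begin_training(good_pipeline))  # ['tagger', 'ner']

# Old-style code that appends a bare component instead of a tuple.
bad_pipeline = good_pipeline + [TokenVectorEncoder()]

caught = None
try:
    toy_begin_training(bad_pipeline)
except TypeError as err:
    # The exact message differs between Python 2 and Python 3.
    caught = err
    print("TypeError: %s" % err)
```

The failure only appears when the loop reaches the bare object, which is why an example written for the older list-of-components pipeline breaks as soon as begin_training runs.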
Thanks @ines
I also had to change the add_label line to:
nlp.pipeline[nlp.pipe_names.index('ner')][1].add_label('ANIMAL')
Not quite sure that's how it's supposed to be done, but it works for me.
Ah, thanks! You should be able to simply use nlp.get_pipe() to get a pipeline component, e.g.:
ner = nlp.get_pipe('ner')
ner.add_label('ANIMAL')
Or, probably cleaner:
ner = NeuralEntityRecognizer(nlp.vocab)
ner.add_label('ANIMAL')
nlp.add_pipe(ner)
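Both lookups discussed above (indexing nlp.pipeline via pipe_names, and nlp.get_pipe()) should hand back the same component object. The sketch below models that bookkeeping in plain Python. ToyLanguage, EntityRecognizer and the explicit name argument to add_pipe are hypothetical and for illustration only; this is not spaCy's API or implementation, just the (name, component) tuple layout described in this thread.

```python
# Hypothetical toy model of the pipeline bookkeeping discussed in this thread.
# NOT spaCy's implementation: it only shows that get_pipe(name) and
# pipeline[pipe_names.index(name)][1] return the same component.


class EntityRecognizer(object):
    """Stand-in for NeuralEntityRecognizer that only records labels."""

    def __init__(self):
        self.labels = []

    def add_label(self, label):
        if label not in self.labels:
            self.labels.append(label)


class ToyLanguage(object):
    def __init__(self):
        self.pipeline = []  # list of (name, component) tuples

    @property
    def pipe_names(self):
        return [name for name, _ in self.pipeline]

    def add_pipe(self, component, name):
        # The name is passed explicitly here; how spaCy derives it is not assumed.
        self.pipeline.append((name, component))

    def get_pipe(self, name):
        for pipe_name, component in self.pipeline:
            if pipe_name == name:
                return component
        raise KeyError(name)


nlp = ToyLanguage()
ner = EntityRecognizer()
ner.add_label("ANIMAL")  # configure first, as in the "cleaner" variant above
nlp.add_pipe(ner, name="ner")

by_index = nlp.pipeline[nlp.pipe_names.index("ner")][1]
by_name = nlp.get_pipe("ner")
print(by_index is by_name)  # True
print(by_name.labels)       # ['ANIMAL']
```

Looking a component up by name avoids depending on its position in the pipeline, which is the main reason get_pipe() reads more cleanly than the index-based workaround.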
That works too, thanks!
Yay! Thanks for your help and feedback. If it's all working for you now, feel free to submit a PR to develop btw (otherwise, we're happy to take care of this later as well).
I can confirm that the fix works.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.