spaCy: 'TokenVectorEncoder' object is not iterable when running example in 2.0 alpha

Created on 21 Oct 2017 · 9 Comments · Source: explosion/spaCy

I'm trying to run one of the examples in 2.0.0 alpha, for extending a pre-existing model with
custom NER tags, available here [1].
Here is the error I get:

$ python train_new_entity_type.py  en  othersame 
Creating initial model en
Traceback (most recent call last):
  File "train_new_entity_type.py", line 124, in <module>
    plac.call(main)
  File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "train_new_entity_type.py", line 106, in main
    train_ner(nlp, train_data, output_directory)
  File "train_new_entity_type.py", line 53, in train_ner
    optimizer = nlp.begin_training(lambda: [])
  File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy/language.py", line 410, in begin_training
    for name, proc in self.pipeline:
TypeError: 'TokenVectorEncoder' object is not iterable

I expected this to work, as it's already documented here [2].
All the models and the spaCy install are recent, fresh installs (21 October).

Your Environment

    Info about spaCy

    Python version     2.7.13         
    Platform           Linux-4.11.12-100.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
    spaCy version      2.0.0a17       
    Location           /home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy
    Models             en_core_web_sm, en_core_web_lg

Operating System: Fedora Linux
Python Version Used: Python 2.7.13 (also reproducible with 3.5.3)
spaCy Version Used: 2.0.0a17
Environment Information:

[1] https://github.com/explosion/spaCy/blob/develop/examples/training/train_new_entity_type.py
[2] https://alpha.spacy.io/usage/training#example-new-entity-type

examples 🌙 nightly

mikeatm

All 9 comments

I think you might be using an outdated model that still has the tensorizer in the pipeline. The latest alpha version now has a handy command that lets you check that all models are compatible and up to date, and shows you which ones need to be upgraded:

spacy validate

So simply downloading the latest en_core_web_sm or en_core_web_lg model should hopefully fix this.

ines on 21 Oct 2017

I hope this is the case, but here is the output of validate:

$ spacy validate  

    Installed models (spaCy v2.0.0a17)
    /home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy

    TYPE        NAME                  MODEL                 VERSION                                   
    package     en-core-web-sm        en_core_web_sm        2.0.0a7  ✔      
    package     en-core-web-lg        en_core_web_lg        2.0.0a1  ✔      
    link        en_core_web_lg        en_core_web_lg        2.0.0a1  ✔      
    link        en_core_web_sm        en_core_web_sm        2.0.0a7  ✔

I have been really looking forward to adding custom NER tags, so I'm eager
to get it working.

mikeatm on 21 Oct 2017

Thanks for updating! I think I found the issue – try removing this line:

https://github.com/explosion/spaCy/blob/490ad3eaf070f2e210869c37b70edf3fcd504da7/examples/training/train_new_entity_type.py#L103

I think we may have forgotten to push the updated version of the example for the latest alpha release and models, sorry about that.

Edit: Since the pipeline architecture has changed and nlp.pipeline entries are now (name, func) tuples, this line also has to be adjusted:

https://github.com/explosion/spaCy/blob/490ad3eaf070f2e210869c37b70edf3fcd504da7/examples/training/train_new_entity_type.py#L104

nlp.add_pipe(NeuralEntityRecognizer(nlp.vocab))

Will test this as soon as we have time and adjust it accordingly!
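
For readers who want to see why the change to (name, func) tuples produces this particular TypeError, here is a minimal sketch in plain Python. It does not use spaCy at all: FakeComponent and iterate_pipeline are hypothetical stand-ins, and the loop only mirrors the `for name, proc in self.pipeline` line from the traceback. The exact wording of the error message depends on the Python version.

```python
class FakeComponent(object):
    """Hypothetical stand-in for a pipeline component (not a spaCy class)."""

    def __call__(self, doc):
        return doc


def iterate_pipeline(pipeline):
    """Mimic the loop from the traceback: unpack each entry as (name, proc)."""
    names = []
    for name, proc in pipeline:
        names.append(name)
    return names


# Old-style pipeline: bare component objects with no names attached.
old_style = [FakeComponent()]

# New-style pipeline: (name, component) tuples.
new_style = [("ner", FakeComponent())]

try:
    iterate_pipeline(old_style)
    old_style_error = None
except TypeError as err:
    # Unpacking a bare object into (name, proc) fails here.
    old_style_error = err

new_style_names = iterate_pipeline(new_style)

print("old-style pipeline raised:", repr(old_style_error))
print("new-style pipeline names:", new_style_names)
```

The old-style list fails because each entry is a single object that cannot be unpacked into two names, which matches the error reported at the top of this thread.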

ines on 22 Oct 2017

👍1

Thanks @ines
I also had to change the add_label line to:

nlp.pipeline[nlp.pipe_names.index('ner')][1].add_label('ANIMAL')

Not quite sure that's how it's supposed to be done, but it works for me.

jerbob92 on 22 Oct 2017

👍1

Ah, thanks! 👍 You should be able to simply use nlp.get_pipe() to get a pipeline component, e.g.:

ner = nlp.get_pipe('ner')
ner.add_label('ANIMAL')

Or, probably cleaner:

ner = NeuralEntityRecognizer(nlp.vocab)
ner.add_label('ANIMAL')
nlp.add_pipe(ner)
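
As a rough illustration of why the index-based lookup from earlier in the thread and the name-based lookup give the same result, here is a small sketch in plain Python. It is not spaCy's implementation: FakeNER and the module-level get_pipe function are hypothetical helpers that assume only what the thread states, namely that pipeline entries are (name, func) tuples.

```python
class FakeNER(object):
    """Hypothetical stand-in for an entity recognizer (not a spaCy class)."""

    def __init__(self):
        self.labels = []

    def add_label(self, label):
        # Record each label once.
        if label not in self.labels:
            self.labels.append(label)


def get_pipe(pipeline, name):
    """Hypothetical helper: return the component registered under `name`."""
    for pipe_name, component in pipeline:
        if pipe_name == name:
            return component
    raise KeyError("no component named %r in pipeline" % name)


# Pipeline entries are (name, component) tuples, as described in the thread.
pipeline = [("tagger", object()), ("ner", FakeNER())]
pipe_names = [pipe_name for pipe_name, _ in pipeline]

# Index-based lookup, in the style of the workaround posted above.
by_index = pipeline[pipe_names.index("ner")][1]

# Name-based lookup, in the style of nlp.get_pipe('ner').
by_name = get_pipe(pipeline, "ner")

by_name.add_label("ANIMAL")
print(by_index is by_name, by_index.labels)
```

Both lookups return the same object, so a label added through one is visible through the other. The name-based form is simply easier to read.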

ines on 22 Oct 2017

👍2

That works too, thanks!

jerbob92 on 22 Oct 2017

Yay! Thanks for your help and feedback. If it's all working for you now, feel free to submit a PR to develop btw (otherwise, we're happy to take care of this later as well).

ines on 22 Oct 2017

👍1

I can confirm that the fix works.

mikeatm on 22 Oct 2017

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.