spaCy: TokenVectorEncoder object is not iterable when running example in 2.0 alpha

Created on 21 Oct 2017  ·  9 Comments  ·  Source: explosion/spaCy

I'm trying to run one of the examples in 2.0.0 alpha, for extending a pre-existing model with
custom NER tags, available here [1].
Here is the error I get:

$ python train_new_entity_type.py  en  othersame 
Creating initial model en
Traceback (most recent call last):
  File "train_new_entity_type.py", line 124, in <module>
    plac.call(main)
  File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "train_new_entity_type.py", line 106, in main
    train_ner(nlp, train_data, output_directory)
  File "train_new_entity_type.py", line 53, in train_ner
    optimizer = nlp.begin_training(lambda: [])
  File "/home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy/language.py", line 410, in begin_training
    for name, proc in self.pipeline:
TypeError: 'TokenVectorEncoder' object is not iterable

I expected this to work, as it's already documented here [2].
All the models and the spaCy install are recent, fresh installs (21 October).

Your Environment

    Info about spaCy

    Python version     2.7.13         
    Platform           Linux-4.11.12-100.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
    spaCy version      2.0.0a17       
    Location           /home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy
    Models             en_core_web_sm, en_core_web_lg
  • Operating System: Fedora Linux
  • Python Version Used: Python 2.7.13 reproducible with 3.5.3
  • spaCy Version Used: 2.0.0a17
  • Environment Information:

[1] https://github.com/explosion/spaCy/blob/develop/examples/training/train_new_entity_type.py
[2] https://alpha.spacy.io/usage/training#example-new-entity-type

examples 🌙 nightly

All 9 comments

I think you might be using an outdated model that still has the tensorizer in the pipeline. The latest alpha version now has a handy command that lets you check that all models are compatible and up to date, and shows you which ones need to be upgraded:

spacy validate

So simply downloading the latest en_core_web_sm or en_core_web_lg model should hopefully fix this.

I hope this is the case, but here is the output of validate:

$ spacy validate  

    Installed models (spaCy v2.0.0a17)
    /home/data/experim/spc/sp2env/lib/python2.7/site-packages/spacy

    TYPE        NAME                  MODEL                 VERSION                                   
    package     en-core-web-sm        en_core_web_sm        2.0.0a7  ✔
    package     en-core-web-lg        en_core_web_lg        2.0.0a1  ✔
    link        en_core_web_lg        en_core_web_lg        2.0.0a1  ✔
    link        en_core_web_sm        en_core_web_sm        2.0.0a7  ✔

I have been really looking forward to adding custom NER tags, so I'm eager
to get it working.

Thanks for updating! I think I found the issue; try removing this line:

https://github.com/explosion/spaCy/blob/490ad3eaf070f2e210869c37b70edf3fcd504da7/examples/training/train_new_entity_type.py#L103

I think we may have forgotten to push the updated version of the example for the latest alpha release and models, sorry about that.

Edit: Since the pipeline architecture has changed and nlp.pipeline entries are now (name, func) tuples, this line also has to be adjusted:

https://github.com/explosion/spaCy/blob/490ad3eaf070f2e210869c37b70edf3fcd504da7/examples/training/train_new_entity_type.py#L104

nlp.add_pipe(NeuralEntityRecognizer(nlp.vocab))

Will test this as soon as we have time and adjust it accordingly!
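
To illustrate why the traceback above ends in "object is not iterable", here is a minimal sketch in plain Python. It does not use spaCy: ToyComponent and collect_names are hypothetical stand-ins, and the only thing taken from the thread is the loop shape `for name, proc in self.pipeline`. A loop of that shape works on (name, func) tuples and raises TypeError as soon as it meets a bare component object.

```python
# Toy stand-ins only; these are NOT spaCy classes.
class ToyComponent(object):
    """Hypothetical pipeline component that defines no __iter__ method."""

    def __init__(self, label):
        self.label = label

    def __call__(self, doc):
        return doc


def collect_names(pipeline):
    """Mimic the loop from the traceback: unpack every entry as (name, proc)."""
    names = []
    for name, proc in pipeline:
        names.append(name)
    return names


# Old-style pipeline: bare components, as an outdated script or model might add them.
old_style = [ToyComponent("tensorizer"), ToyComponent("ner")]

# New-style pipeline: (name, component) tuples.
new_style = [("tagger", ToyComponent("tagger")), ("ner", ToyComponent("ner"))]

try:
    collect_names(old_style)
    old_style_failed = False
except TypeError:
    # A bare object cannot be unpacked into (name, proc).
    old_style_failed = True

new_style_names = collect_names(new_style)
print(old_style_failed)
print(new_style_names)
```

The exact wording of the TypeError message differs between Python versions, so the sketch only checks the exception type.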

Thanks @ines
I also had to change the add_label line to:

nlp.pipeline[nlp.pipe_names.index('ner')][1].add_label('ANIMAL')

Not quite sure that's how it's supposed to be done, but it works for me.

Ah, thanks! 👍 You should be able to simply use nlp.get_pipe() to get a pipeline component, e.g.:

ner = nlp.get_pipe('ner')
ner.add_label('ANIMAL')

Or, probably cleaner:

ner = NeuralEntityRecognizer(nlp.vocab)
ner.add_label('ANIMAL')
nlp.add_pipe(ner)
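
For comparison, the index-based workaround and a name-based lookup reach the same object when the pipeline is a list of (name, component) tuples. The following is a rough sketch in plain Python, not spaCy's implementation of nlp.get_pipe(): get_component and ToyRecognizer are hypothetical names used only for illustration.

```python
def get_component(pipeline, name):
    """Return the component stored under `name` in a list of (name, component) tuples."""
    for pipe_name, component in pipeline:
        if pipe_name == name:
            return component
    raise KeyError(name)


class ToyRecognizer(object):
    """Hypothetical stand-in for an entity recognizer that stores labels."""

    def __init__(self):
        self.labels = []

    def add_label(self, label):
        # Ignore labels that were already added.
        if label not in self.labels:
            self.labels.append(label)


pipeline = [("tagger", object()), ("ner", ToyRecognizer())]
pipe_names = [pipe_name for pipe_name, _ in pipeline]

# Index-based lookup, mirroring the workaround posted earlier in the thread.
by_index = pipeline[pipe_names.index("ner")][1]

# Name-based lookup, which reads closer to the suggested nlp.get_pipe('ner').
by_name = get_component(pipeline, "ner")
by_name.add_label("ANIMAL")

print(by_index is by_name)
print(by_name.labels)
```

Both lookups return the same object, so a label added through one is visible through the other; the name-based form simply hides the tuple indexing.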

That works too, thanks!

Yay! Thanks for your help and feedback. If it's all working for you now, feel free to submit a PR to develop btw (otherwise, we're happy to take care of this later as well).

I can confirm that the fix works.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
