Spacy: use of NER after update fails

Created on 13 Mar 2016 · 5 comments · Source: explosion/spaCy

I'm trying to train NER, and training succeeds with the following code.

The code for the NER update is taken from https://github.com/spacy-io/spaCy/issues/187

from spacy.en import English
from spacy.gold import GoldParse
import os

fname = 'mmmm'

nlp = English(parser=False)  # Avoid loading the parser, for quick load times

# Tokenize and POS-tag the training sentence
doc = nlp.tokenizer(u'Lions and tigers and grizzly bears!')
nlp.tagger(doc)

nlp.entity.add_label('ANIMAL')  # <-- New in v0.100

# Gold annotations, one entry per token (dependency fields are dummies)
indices = tuple(range(len(doc)))
words = [w.text for w in doc]
tags = [w.tag_ for w in doc]
heads = [0 for _ in doc]
deps = ['' for _ in doc]

# BILUO entity tags for: Lions / and / tigers / and / grizzly / bears / !
ner = ['U-ANIMAL', 'O', 'U-ANIMAL', 'O', 'B-ANIMAL', 'L-ANIMAL', 'O']

annot = GoldParse(doc, (indices, words, tags, heads, deps, ner))

# Train on the single example until the loss reaches 0 (at most 1000 iterations)
loss = nlp.entity.train(doc, annot)
i = 0
while loss != 0 and i < 1000:
    loss = nlp.entity.train(doc, annot)
    i += 1
print("Used %d iterations" % i)

# Check the predictions, then dump the trained entity model
nlp.entity(doc)
for ent in doc.ents:
    print(ent.text, ent.label_)
nlp.entity.model.dump(os.getcwd())

Then I load the saved model, which also succeeds. But when I apply the loaded model to the sentence, that part fails.

Here is the code:

from spacy.en import English
import os

# Path to the dumped entity model
path = os.path.join(os.getcwd(), 'dic', 'mmmm')
path = path.decode('utf-8')  # Python 2: convert the byte-string path to unicode

nlp = English(parser=False)
nlp.entity.model.load(path)

doc = nlp(u'Lions and tigers and grizzly bears!')
ents = list(doc.ents)

print(ents)

bug

michael135
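
The `ner` list in the training script uses the BILUO scheme that spaCy's entity recognizer trains on: B (beginning of a multi-token entity), I (inside), L (last token), U (single-token "unit" entity) and O (outside any entity). As a plain-Python sketch of what those tags encode, the hypothetical helper `biluo_to_spans` below decodes them into token spans. It is not part of spaCy, and the token list is assumed from the seven tags (one per token of "Lions and tigers and grizzly bears!").

```python
from typing import List, Optional, Tuple


def biluo_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BILUO tags into (start, end, label) token spans; end is exclusive.

    Hypothetical helper for illustration; it is not a spaCy function.
    Raises ValueError on malformed sequences (e.g. I-/L- without a B-).
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None  # token index of the currently open B- tag
    open_label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == 'O':
            if start is not None:
                raise ValueError('entity starting at token %d is never closed' % start)
            continue
        prefix, _, label = tag.partition('-')
        if prefix in ('U', 'B'):
            if start is not None:
                raise ValueError('entity starting at token %d is never closed' % start)
            if prefix == 'U':
                spans.append((i, i + 1, label))  # single-token entity
            else:
                start, open_label = i, label  # multi-token entity begins
        elif prefix in ('I', 'L'):
            if start is None or label != open_label:
                raise ValueError('%r at token %d does not continue a B- tag' % (tag, i))
            if prefix == 'L':
                spans.append((start, i + 1, label))  # entity ends on this token
                start, open_label = None, None
        else:
            raise ValueError('unknown tag %r at token %d' % (tag, i))
    if start is not None:
        raise ValueError('entity starting at token %d is never closed' % start)
    return spans


# Tokens assumed from the seven tags in the training script
words = ['Lions', 'and', 'tigers', 'and', 'grizzly', 'bears', '!']
ner = ['U-ANIMAL', 'O', 'U-ANIMAL', 'O', 'B-ANIMAL', 'L-ANIMAL', 'O']
for start, end, label in biluo_to_spans(ner):
    print(' '.join(words[start:end]), label)
# Lions ANIMAL
# tigers ANIMAL
# grizzly bears ANIMAL
```

"grizzly bears" is a single two-token entity because it is opened by B-ANIMAL and closed by L-ANIMAL, while "Lions" and "tigers" are one-token U-ANIMAL entities.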

michael135, did you manage to get this solved? I'm having the same difficulties and would love to know if it's a bug or if I'm missing something somewhere.

Just as a test, I dumped the entity model without making any changes to it, then trained my new entities and dumped the updated entity model. Using cmp, I verified that the two dumps were in fact different. So it appears that my newly trained entities are being dumped but are not available after loading.

ryangrimm on 21 Mar 2016
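
The `cmp` check above can also be done from Python with the standard-library `filecmp` module. In this sketch the two files are small placeholder byte strings written to a temporary directory, not real spaCy model dumps, and `dumps_differ` is a hypothetical helper; `shallow=False` makes `filecmp.cmp` compare the file contents rather than trusting matching `os.stat()` signatures.

```python
import filecmp
import os
import tempfile


def dumps_differ(path_a: str, path_b: str) -> bool:
    """Return True if the two files differ in content (like `cmp` reporting a difference)."""
    # shallow=False: compare the actual bytes, not just size/mtime signatures
    return not filecmp.cmp(path_a, path_b, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    before = os.path.join(tmp, 'ner_before.bin')  # placeholder for the untouched dump
    after = os.path.join(tmp, 'ner_after.bin')    # placeholder for the post-training dump
    with open(before, 'wb') as f:
        f.write(b'\x00\x01\x02')
    with open(after, 'wb') as f:
        f.write(b'\x00\x01\x03')
    print(dumps_differ(before, after))   # True
    print(dumps_differ(before, before))  # False
```

A difference between the two dumps only shows that the weights changed; as the maintainer's reply in this thread explains, the label names themselves were not being saved.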

michael135, I dug into the source code a bit and may have found a solution to our problem. It may not be the correct solution, as I'm far from experienced with spaCy, but it seems you have to tell spaCy about your custom entity labels in addition to loading the model you dumped. So before you call:

nlp.entity.model.load(path)

call:

nlp.entity.add_label('ANIMAL')

This does the trick for me, but again, I have no idea whether it's the correct solution; honnibal will probably have to weigh in on that one.

ryangrimm on 21 Mar 2016

Hey,

Sorry for the delay getting to this.

It seems that spaCy isn't saving the labelling properly. Re-adding the label before loading is the correct workaround for now, until we fix this.

syllog1sm on 21 Mar 2016
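
The failure mode described here can be illustrated with a toy model. `ToyEntityModel` below is hypothetical and is not spaCy's API or internals: it assumes that labels map to class indices in the order they are added, and that the dumped weights reveal only how many classes there were, not their names. Under those assumptions, loading into a fresh model fails until the label is re-added, and re-adding it before `load()` (the workaround from the previous comment) restores the mapping.

```python
import json
import os
import tempfile
from typing import Dict, List


class ToyEntityModel:
    """Hypothetical stand-in for an entity model; NOT spaCy's API or internals.

    Assumptions (for illustration only):
    - labels map to class indices in the order they are added;
    - dump() writes the weights but not the label names (the bug in this thread).
    """

    def __init__(self) -> None:
        self.labels: List[str] = ['O']  # class 0 = outside any entity
        self.weights: Dict[str, List[float]] = {}  # feature -> one score per class

    def add_label(self, label: str) -> None:
        if label not in self.labels:
            self.labels.append(label)

    def dump(self, path: str) -> None:
        with open(path, 'w') as f:
            json.dump(self.weights, f)  # label names are NOT saved

    def load(self, path: str) -> None:
        with open(path) as f:
            weights = json.load(f)
        # The weights only reveal how many classes there were, not their names.
        n_classes = max((len(v) for v in weights.values()), default=len(self.labels))
        if n_classes != len(self.labels):
            raise ValueError('weights expect %d classes, model knows %d'
                             % (n_classes, len(self.labels)))
        self.weights = weights

    def predict(self, feature: str) -> str:
        scores = self.weights.get(feature)
        if scores is None:
            return 'O'  # unseen feature: outside any entity
        best = max(range(len(scores)), key=lambda k: scores[k])
        return self.labels[best]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'ner.json')

    # "Training" script: add the label, learn weights, dump.
    trained = ToyEntityModel()
    trained.add_label('ANIMAL')
    trained.weights = {'lions': [0.0, 1.0]}  # pretend training produced this
    trained.dump(path)

    # Loading script WITHOUT re-adding the label: the class count does not match.
    fresh = ToyEntityModel()
    try:
        fresh.load(path)
    except ValueError as err:
        print('load failed:', err)

    # Loading script WITH the workaround: add the label first, then load.
    fixed = ToyEntityModel()
    fixed.add_label('ANIMAL')
    fixed.load(path)
    print(fixed.predict('lions'))  # ANIMAL
```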

Does that mean the label should be added twice: once before saving the pickle and again after loading it?
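
Going by the workaround described in this thread, the label is indeed added in both scripts: once in the training script (`nlp.entity.add_label('ANIMAL')` before training) and again in the loading script before `nlp.entity.model.load(path)`. To avoid hard-coding the label names in the loading script, one option is to store them in a sidecar file next to the dumped model. The sketch below assumes a hypothetical `labels.json` file and hypothetical helpers `save_labels` / `load_labels`; the spaCy calls appear only in comments, and a JSON list keeps the labels in the order they were added, in case that order matters.

```python
import json
import os
import tempfile
from typing import List

LABELS_FILE = 'labels.json'  # hypothetical sidecar file name


def save_labels(model_dir: str, labels: List[str]) -> str:
    """Write the custom entity labels next to the dumped model (hypothetical helper)."""
    path = os.path.join(model_dir, LABELS_FILE)
    with open(path, 'w') as f:
        json.dump(labels, f)  # JSON lists preserve insertion order
    return path


def load_labels(model_dir: str) -> List[str]:
    """Read back the label list written by save_labels() (hypothetical helper)."""
    with open(os.path.join(model_dir, LABELS_FILE)) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as model_dir:
    # Training script: after nlp.entity.add_label('ANIMAL') and training,
    # dump the model and also record the labels.
    save_labels(model_dir, ['ANIMAL'])

    # Loading script: re-add every saved label BEFORE loading the weights, e.g.
    #     for label in load_labels(model_dir):
    #         nlp.entity.add_label(label)
    #     nlp.entity.model.load(path)
    restored = load_labels(model_dir)
    print(restored)  # ['ANIMAL']
```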