The code for the NER update is taken from https://github.com/spacy-io/spaCy/issues/187:
import plac
from spacy.en import English
from spacy.gold import GoldParse
import os
fname = 'mmmm'
nlp = English(parser=False) # Avoid loading the parser, for quick load times
doc = nlp.tokenizer(u'Lions and tigers and grizzly bears!')
nlp.tagger(doc)
nlp.entity.add_label('ANIMAL') # <-- New in v0.100
indices = tuple(range(len(doc)))
words = [w.text for w in doc]
tags = [w.tag_ for w in doc]
heads = [0 for _ in doc]
deps = ['' for _ in doc]
ner = ['U-ANIMAL', 'O', 'U-ANIMAL', 'O', 'B-ANIMAL', 'L-ANIMAL', 'O']
annot = GoldParse(doc, (indices, words, tags, heads, deps, ner))
loss = nlp.entity.train(doc, annot)
i = 0
while loss != 0 and i < 1000:
    loss = nlp.entity.train(doc, annot)
    i += 1
print("Used %d iterations" % i)
nlp.entity(doc)
for ent in doc.ents:
    print(ent.text, ent.label_)
nlp.entity.model.dump(os.getcwd())
Here is the code:
from spacy.en import English
import os
path = os.getcwd() + '/dic/mmmm' # path to the model
path = path.decode('utf-8')
nlp = English(parser=False)
nlp.entity.model.load(path)
doc = nlp(u'Lions and tigers and grizzly bears!')
ents = list(doc.ents)
print ents
michael135, did you manage to get this solved? I'm having the same difficulties and would love to know if it's a bug or if I'm missing something somewhere.
Just as a test, I dumped out the entity model without making any changes to it and then trained my new entities and dumped out the new entity model. I then used cmp to verify that the two dumps were different and they in fact were. So it appears as though my newly trained entities are in fact getting dumped but don't seem to be available after loading.
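For anyone who wants to repeat that check from Python rather than with cmp, here is a minimal sketch using the standard library's filecmp module. It assumes each dump is a single file; the helper name `dumps_differ` and the file names are hypothetical and only there for illustration.

```python
import filecmp
import os
import tempfile


def dumps_differ(path_a, path_b):
    """Return True if the two files differ in content."""
    # shallow=False forces a content comparison instead of
    # relying only on the os.stat() signatures of the files.
    return not filecmp.cmp(path_a, path_b, shallow=False)


# Stand-in files playing the role of the two model dumps.
tmp_dir = tempfile.mkdtemp()
before = os.path.join(tmp_dir, "model_before")
after = os.path.join(tmp_dir, "model_after")
copy_of_before = os.path.join(tmp_dir, "model_before_copy")

with open(before, "wb") as f:
    f.write(b"original weights")
with open(after, "wb") as f:
    f.write(b"weights after training ANIMAL")
with open(copy_of_before, "wb") as f:
    f.write(b"original weights")

changed = dumps_differ(before, after)
unchanged = dumps_differ(before, copy_of_before)
print(changed, unchanged)
```

A differing dump only shows that training changed what was written to disk. It does not show that everything needed to use the new label was written.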
michael135, I dug into the source code a bit and may have figured out a solution to our problem. This may not be a correct solution as I'm far from experienced with spaCy, but it seems as though you have to tell spaCy about your custom entity labels along with loading the model you dumped. So before you call:
nlp.entity.model.load(path)
call:
nlp.entity.add_label('ANIMAL')
This does the trick for me, but again, I have no idea if it's the correct solution. hannibal will probably have to weigh in on that one.
Hey,
Sorry for the delay getting to this.
It seems that spaCy isn't saving the labelling properly. Re-adding the label before loading is the correct workaround for now, until we fix this.
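To make the failure mode concrete, here is a toy sketch of a model that saves its weights but not its label table. It is not spaCy's implementation: `ToyTagger` and all of its methods are hypothetical, and the only thing borrowed from the thread is the pattern of calling `add_label` before `load`.

```python
import json
import os
import tempfile


class ToyTagger(object):
    """Hypothetical stand-in for an entity recognizer (not spaCy code)."""

    def __init__(self):
        self.labels = []   # label table, kept in memory only
        self.weights = {}  # label -> list of words seen in training

    def add_label(self, label):
        if label not in self.labels:
            self.labels.append(label)

    def train(self, word, label):
        self.weights.setdefault(label, []).append(word)

    def dump(self, path):
        # Only the weights are written; the label table is not saved.
        with open(path, "w") as f:
            json.dump(self.weights, f)

    def load(self, path):
        with open(path) as f:
            self.weights = json.load(f)

    def predict(self, word):
        # Only labels present in the label table can be predicted.
        for label in self.labels:
            if word in self.weights.get(label, []):
                return label
        return None


path = os.path.join(tempfile.mkdtemp(), "toy_model.json")

trained = ToyTagger()
trained.add_label("ANIMAL")
trained.train("Lions", "ANIMAL")
trained.dump(path)

# Loading into a fresh object: the weights are there, the label is not.
fresh = ToyTagger()
fresh.load(path)
without_label = fresh.predict("Lions")

# Workaround pattern from the thread: add the label, then load.
fixed = ToyTagger()
fixed.add_label("ANIMAL")
fixed.load(path)
with_label = fixed.predict("Lions")

print(without_label, with_label)
```

In this toy version the dumped file really does change after training, which matches the cmp observation earlier in the thread, yet the new label stays unusable until it is registered again.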
Does that mean the label has to be added twice, once before saving the pickle and once after loading it?