Hi,
I want to find all existing NER labels in a spaCy model.
Can anyone tell me how to find them?
Thank you
You can find it in the docs.
from spacy.en import English
nlp = English()
tokens = nlp(u'Mr. Best flew to New York on Saturday morning.')
ents = list(tokens.ents)
Hi anasamoudi,
Thanks for the reply.
But what you suggested gives the list of entities that a document contains. What I wanted to ask about is all the existing (trained) NER labels.
For anyone still looking for this list, the English ones are listed here.
If you want to view it for your current model, they appear to be stored in the model's entity.cfg attribute. Namely:
>>> nlp = spacy.load('en')
>>> nlp.entity.cfg[u'actions']
{u'1': [u'CARDINAL', u'DATE', u'EVENT', u'FAC', u'GPE', u'LANGUAGE', u'LAW', u'LOC', u'MONEY', u'NORP', u'ORDINAL', u'ORG', u'PERCENT', u'PERSON', u'PRODUCT', u'QUANTITY', u'TIME', u'WORK_OF_ART'], u'0': [u''], u'3': [u'CARDINAL', u'DATE', u'EVENT', u'FAC', u'GPE', u'LANGUAGE', u'LAW', u'LOC', u'MONEY', u'NORP', u'ORDINAL', u'ORG', u'PERCENT', u'PERSON', u'PRODUCT', u'QUANTITY', u'TIME', u'WORK_OF_ART'], u'2': [u'CARDINAL', u'DATE', u'EVENT', u'FAC', u'GPE', u'LANGUAGE', u'LAW', u'LOC', u'MONEY', u'NORP', u'ORDINAL', u'ORG', u'PERCENT', u'PERSON', u'PRODUCT', u'QUANTITY', u'TIME', u'WORK_OF_ART'], u'5': [u''], u'4': [u'CARDINAL', u'DATE', u'EVENT', u'FAC', u'GPE', u'LANGUAGE', u'LAW', u'LOC', u'MONEY', u'NORP', u'ORDINAL', u'ORG', u'PERCENT', u'PERSON', u'PRODUCT', u'QUANTITY', u'TIME', u'WORK_OF_ART']}
If you add a new label, it is stored under entity.cfg['extra_labels']
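Since every key in the printed actions dict carries the same label list (apart from the empty-string entries), the unique labels can be collected with a few lines of plain Python. This is only a sketch: collect_labels is a hypothetical helper, not part of spaCy, and the dict below is an abbreviated stand-in for nlp.entity.cfg[u'actions'] rather than real model output.

```python
def collect_labels(actions):
    """Return the sorted, de-duplicated, non-empty labels across all action keys."""
    labels = set()
    for label_list in actions.values():
        # Skip the empty-string placeholders seen under keys such as u'0' and u'5'.
        labels.update(label for label in label_list if label)
    return sorted(labels)


# Abbreviated stand-in for nlp.entity.cfg[u'actions'] (assumption: same shape as above).
actions = {
    "0": [""],
    "1": ["CARDINAL", "DATE", "GPE", "ORG", "PERSON"],
    "2": ["CARDINAL", "DATE", "GPE", "ORG", "PERSON"],
    "5": [""],
}

print(collect_labels(actions))
```

With a real model of that spaCy version, one would presumably pass nlp.entity.cfg[u'actions'] instead of the stand-in dict.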
@lgenerknol thank you, I was digging through the cython source trying to find this!
I'm experimenting with how the data is stored in this attribute, because I want to write a training routine which checks whether entities already exist in the model, and adds them if they do not.
I've found
>>> nlp.entity.cfg[u'actions'][u'1'] == nlp.entity.cfg[u'actions'][u'2']
True
>>> nlp.entity.add_label('TEST')
>>> nlp.entity.cfg['extra_labels']
['TEST', 'TEST', 'TEST', 'TEST', 'TEST']
#add again to see what happens
>>> nlp.entity.add_label('TEST')
>>> nlp.entity.cfg['extra_labels']
['TEST', 'TEST', 'TEST', 'TEST', 'TEST']
#add an in-built type to see what happens
>>> nlp.entity.add_label('CARDINAL')
>>> nlp.entity.cfg['extra_labels']
['TEST', 'TEST', 'TEST', 'TEST', 'TEST', 'CARDINAL']
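The check-then-add routine described above could be sketched roughly as follows. add_missing_labels is a hypothetical helper, not a spaCy function; it takes the labels already known, the labels wanted, and a callable that registers one label (in this thread that would presumably be nlp.entity.add_label). The demo uses a plain list in place of a model, so it makes no claim about how spaCy itself reacts to duplicate labels.

```python
def add_missing_labels(known_labels, wanted_labels, add_label):
    """Call add_label once for each wanted label that is not already known.

    Returns the labels that were actually added, in order.
    """
    known = set(known_labels)
    added = []
    for label in wanted_labels:
        if label not in known:
            add_label(label)   # e.g. nlp.entity.add_label (assumption)
            known.add(label)   # guards against duplicates within wanted_labels
            added.append(label)
    return added


# Stand-in for a model: record the labels that would have been registered.
registered = []
added = add_missing_labels(
    known_labels=["CARDINAL", "DATE"],
    wanted_labels=["TEST", "CARDINAL", "TEST"],
    add_label=registered.append,
)
print(added)
```

Keeping the known labels in a set means an in-built type such as CARDINAL, or a label requested twice, never reaches add_label at all.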
A few questions
- What do the keys in nlp.entity.cfg[u'actions'] represent? They appear to hold identical in-built types, and my guess is that this is to make each label a valid 'state' to correspond to each action in the parser, so they will be identical while the parser is in its initial state?
- Can I assume that each key in nlp.entity.cfg[u'actions'] holds identical labels, and that I can lazily check whether my entity is in nlp.entity.cfg[u'actions'][u'1'] before adding it?
- Adding the same label a second time does not appear to change nlp.entity.cfg['extra_labels']. Is it safe then to add a label which I have already added?
- Adding an in-built type appends only a single entry to nlp.entity.cfg['extra_labels'], rather than creating one for each element in nlp.entity.cfg['actions']. Obviously, this is something you shouldn't do anyway, but I wonder what the consequence is?

I am trying the following code:
import spacy
nlp = spacy.load('en')
tokens = nlp(u'Mr. Best flew to New York on Saturday morning.')
ents = list(tokens.ents)
print 'ents:', ents
e = nlp.entity.cfg[u'actions']
print 'all entitiy cfg info:', e
This gave me an error:
ents: [Best, New York, Saturday, morning]
Traceback (most recent call last):
File "spacy-103.py", line 10, in <module>
e = nlp.entity.cfg[u'actions']
KeyError: u'actions'
Has something changed?
@honnibal @ines: can you help here?
Now what we see from nlp.entity.cfg is a dict without the actions key.
>>> nlp.entity.cfg
{u'beam_density': 0.0,
u'beam_width': 1,
u'cnn_maxout_pieces': 3,
u'hidden_depth': 1,
u'hidden_width': 200,
u'hist_size': 0,
u'hist_width': 0,
u'maxout_pieces': 2,
u'nr_class': 73,
u'pretrained_dims': 300,
u'token_vector_width': 128}
One way to get an idea of the labels (not the best way, though) is to look into the model's moves file:
spacy/data/en_core_web_md/en_core_web_md-2.0.0/ner$ vi moves
This shows labels such as:
"NORP", "DATE", "CARDINAL", "GPE", "PERCENT", "ORG", "EVENT", "MONEY" and so on...
Another way is to look at the data on which the model was trained:
https://spacy.io/models/en#en_core_web_md
and find the source it was trained on, e.g. OntoNotes 5:
https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf
2.6 Entity Names Annotation
Names (often referred to as “Named Entities”) are annotated according to the following
set of types:
PERSON People, including fictional
NORP Nationalities or religious or political groups
FACILITY Buildings, airports, highways, bridges, etc.
ORGANIZATION Companies, agencies, institutions, etc.
GPE Countries, cities, states
LOCATION Non-GPE locations, mountain ranges, bodies of water
PRODUCT Vehicles, weapons, foods, etc. (Not services)
EVENT Named hurricanes, battles, wars, sports events, etc.
WORK OF ART Titles of books, songs, etc.
LAW Named documents made into laws
LANGUAGE Any named language
The following values are also annotated in a style similar to names:
DATE Absolute or relative dates or periods
TIME Times smaller than a day
PERCENT Percentage (including “%”)
MONEY Monetary values, including unit
QUANTITY Measurements, as of weight or distance
ORDINAL “first”, “second”
CARDINAL Numerals that do not fall under another type
But there has to be an easier way of getting this.
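One candidate for that easier way, as far as I know, is the labels property that the ner pipeline component exposes in spaCy v2 and v3, reachable through nlp.get_pipe("ner"). The sketch below wraps it in a hypothetical helper, ner_labels; the model name in the usage comment is only an example, and whether the property is available in the exact 2.0.0 build discussed here is an assumption worth checking.

```python
def ner_labels(nlp):
    """Return the sorted NER labels of a loaded pipeline.

    Assumes nlp is a spaCy Language object whose "ner" component exposes
    a labels property. Returns [] if the pipeline has no "ner" component.
    """
    if "ner" not in nlp.pipe_names:
        return []
    return sorted(nlp.get_pipe("ner").labels)


# Usage, assuming spaCy and a model such as en_core_web_md are installed:
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   print(ner_labels(nlp))
```

For a human-readable description of a single label, spacy.explain("NORP") returns a short gloss along the lines of the OntoNotes descriptions quoted above.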