Rasa: multiple entities, for example with ner_crf

Created on 7 Jun 2017 · 8 Comments · Source: RasaHQ/rasa

Rasa NLU version: 0.8.6

"pipeline": ["nlp_spacy", "ner_crf", "ner_synonyms", "intent_featurizer_spacy", "intent_classifier_sklearn"],

Operating system: OS X

Issue:
In text that contains more than one entity, every entity except the last one is shown as "-" instead of its actual entity type. Any ideas?

{u'entities': [{u'end': 14,
                u'entity': '-',
                u'extractor': u'ner_crf',
                u'start': 11,
                u'value': u'747'},
               {u'end': 21,
                u'entity': '-',
                u'extractor': u'ner_crf',
                u'start': 15,
                u'value': u'kansas'},
               {u'end': 24,
                u'entity': '-',
                u'extractor': u'ner_crf',
                u'start': 22,
                u'value': u'st'},
               {u'end': 34,
                u'entity': 'time',
                u'extractor': u'ner_crf',
                u'start': 28,
                u'value': u'4:10pm'}],
Thank you


karnigili on 7 Jun 2017

All 8 comments

That seems odd. To reproduce this, it would be very helpful to have access to your training data. Please send me the configuration file and the data at [email protected] if possible.

tmbo on 7 Jun 2017


@karnigili can you please also tell me the sentence of the above example?

tmbo on 7 Jun 2017

Sure @tmbo, the sentence is:
"see you at 747 kansas st at 4:10pm".

Also, I sent the email with the files.

karnigili on 7 Jun 2017

OK, the issue is caused by whitespace in your entity annotations. Have a look at your training data: some of the value fields in your annotations contain trailing whitespace. You need to remove it, otherwise the annotations don't align with the tokenization.

We can't really fix this on our side, but I will make sure we show a warning about non-aligned tokens instead of the dashes.

tmbo on 8 Jun 2017
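The dashes can be illustrated with a rough sketch of how character-offset annotations become one BILOU tag per token for a CRF. spaCy's offset-to-BILOU conversion (e.g. `biluo_tags_from_offsets`) uses the tag "-" for tokens whose entity offsets do not line up with the tokenization; the sketch assumes `ner_crf` relies on that kind of conversion, and it swaps in a hypothetical whitespace tokenizer, hypothetical helper names (`tokenize`, `biluo_tags`) and an invented "address" label in place of spaCy's tokenizer and the reporter's real data. A trailing space moves the annotation's end from 24 to 25, which falls between "st" and "at", so every token in the span is tagged "-"; a CRF trained on such tags can then predict "-" as if it were an entity type.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[Tuple[int, int]]:
    """Hypothetical whitespace tokenizer: (start, end) offsets of each token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def biluo_tags(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Convert character-offset entities to one BILOU tag per token.

    Tokens overlapping an entity whose start/end do not fall on token
    boundaries are tagged "-" (misaligned), mirroring spaCy's convention.
    """
    tokens = tokenize(text)
    token_at_start = {s: i for i, (s, _) in enumerate(tokens)}
    token_at_end = {e: i for i, (_, e) in enumerate(tokens)}
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        if start in token_at_start and end in token_at_end:
            first, last = token_at_start[start], token_at_end[end]
            if first == last:
                tags[first] = f"U-{label}"  # single-token entity
            else:
                tags[first] = f"B-{label}"  # beginning
                for i in range(first + 1, last):
                    tags[i] = f"I-{label}"  # inside
                tags[last] = f"L-{label}"  # last
        else:
            # Misaligned boundary: every overlapping token becomes "-".
            for i, (tok_start, tok_end) in enumerate(tokens):
                if tok_start < end and tok_end > start:
                    tags[i] = "-"
    return tags


sentence = "see you at 747 kansas st at 4:10pm"
# Aligned annotation: "747 kansas st" spans characters 11-24.
print(biluo_tags(sentence, [(11, 24, "address"), (28, 34, "time")]))
# ['O', 'O', 'O', 'B-address', 'I-address', 'L-address', 'O', 'U-time']
# Annotation with a trailing space: "747 kansas st " spans 11-25.
print(biluo_tags(sentence, [(11, 25, "address"), (28, 34, "time")]))
# ['O', 'O', 'O', '-', '-', '-', 'O', 'U-time']
```

The pattern in the second output (three "-" tokens followed by a correctly tagged time) matches the symptom reported above.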

Hi @tmbo, thank you!

Is there anything else that could cause this?
I have applied strip() and adjusted the ranges for all entities, yet the issue is still not resolved.

Thank you

karnigili on 8 Jun 2017

If you send me the modified training data, I can take another look.

tmbo on 8 Jun 2017


@karnigili There are still entities in there that have a value ending with a space, e.g.:

    {
      "start": 13,
      "end": 23,
      "value": "documents ",
      "entity": "docs"
    }

I'd suggest installing the latest master; it will print a warning for every non-aligned entity value.

tmbo on 9 Jun 2017
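Stripping only the `value` field is not enough if `end` still points past the space, which may be why the earlier fix did not help. A minimal sketch of a cleanup script that trims the span itself is shown here. It assumes the Rasa NLU 0.8-era JSON layout (`{"rasa_nlu_data": {"common_examples": [...]}}`, each example with `text` and `entities` carrying `start`, `end`, `value`, `entity`); the function names are hypothetical, and the sentence "I need these documents now" is invented so that characters 13-23 read `"documents "` like the annotation quoted above. When `value` deliberately differs from the text span (as with synonyms for `ner_synonyms`), the script only strips the value and leaves the mapping alone.

```python
import json
from typing import Any, Dict, List


def fix_entity_whitespace(example: Dict[str, Any]) -> List[str]:
    """Trim leading/trailing whitespace from the entity spans of one example.

    Mutates the example in place and returns one message per adjusted span.
    """
    text = example["text"]
    messages = []
    for ent in example.get("entities", []):
        start, end = ent["start"], ent["end"]
        original = text[start:end]
        # Move the span boundaries inward past any whitespace.
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        trimmed = text[start:end]
        if ent["value"] == original:
            # Plain annotation: keep the value identical to the trimmed span.
            ent["value"] = trimmed
        else:
            # Value differs from the span (e.g. a synonym): only strip it.
            ent["value"] = ent["value"].strip()
        if (start, end) != (ent["start"], ent["end"]):
            messages.append(
                f"{ent['entity']}: span {ent['start']}-{ent['end']} -> {start}-{end}"
            )
            ent["start"], ent["end"] = start, end
    return messages


def fix_training_file(path: str) -> None:
    """Rewrite a Rasa NLU JSON training file (assumed layout) with trimmed spans."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    nlu = data["rasa_nlu_data"]
    # Assumed example lists; only those present in the file are processed.
    for key in ("common_examples", "entity_examples"):
        for example in nlu.get(key, []):
            for msg in fix_entity_whitespace(example):
                print(f"{example['text']!r}: {msg}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


example = {
    "text": "I need these documents now",  # hypothetical sentence
    "intent": "request_docs",  # hypothetical intent name
    "entities": [{"start": 13, "end": 23, "value": "documents ", "entity": "docs"}],
}
print(fix_entity_whitespace(example))  # ['docs: span 13-23 -> 13-22']
print(example["entities"][0])
# {'start': 13, 'end': 22, 'value': 'documents', 'entity': 'docs'}
```

Running `fix_training_file` on a copy of the training data first is safer, since it overwrites the file in place.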

Great! Thank you so much :):)

karnigili on 9 Jun 2017


Related issues

rasa_core.policies.ensemble.InvalidPolicyConfig: You didn't define any policies. Please define them under 'policies:' in your policy configuration file.

Arghya999 · 3 Comments

Rasa training is very slow due to excessive copy of the domain, fails on machine with low memory.

edouardlp · 3 Comments

Not able to replicate embedding_intent_classifier performance with DIET

lomarceau · 3 Comments

re-run rasa_nlu.server

karnigili · 3 Comments

DIET classifier _predict_entities function clean_up_entities for Chinese language issue

johnson7788 · 3 Comments