Rasa NLU version: 0.8.6
"pipeline": ["nlp_spacy", "ner_crf", "ner_synonyms", "intent_featurizer_spacy", "intent_classifier_sklearn"],
Operating system: OS X
Issue:
In a text that contains more than one entity, every entity except the last one is shown as "-" instead of its actual entity type. Any ideas?
{u'entities': [{u'end': 14,
                u'entity': '-',
                u'extractor': u'ner_crf',
                u'start': 11,
                u'value': u'747'},
               {u'end': 21,
                u'entity': '-',
                u'extractor': u'ner_crf',
                u'start': 15,
                u'value': u'kansas'},
               {u'end': 24,
                u'entity': '-',
                u'extractor': u'ner_crf',
                u'start': 22,
                u'value': u'st'},
               {u'end': 34,
                u'entity': 'time',
                u'extractor': u'ner_crf',
                u'start': 28,
                u'value': u'4:10pm'}],
Thank you
That seems odd. To reproduce this, it would be very helpful to have access to your training data. If possible, please send me the configuration file and the data at [email protected].
@karnigili can you please also tell me which sentence produced the above example?
Sure @tmbo:
"see you at 747 kansas st at 4:10pm"
I also sent the email with the files.
OK, the issue is caused by whitespace in your entity annotations. Have a look at your training data: some of the value fields in your annotations contain trailing whitespace. You need to remove it, otherwise the annotated spans don't align with the tokenization.
We can't really fix this on our side, but I will make sure we show a warning about non-aligned tokens instead of the dashes.
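To illustrate what "aligned with the tokenization" means, here is a minimal sketch. It assumes a plain whitespace tokenizer as a stand-in for the spaCy tokenizer that `nlp_spacy` actually uses, so it is a simplification rather than Rasa's own check; `token_spans` and `is_aligned` are hypothetical helper names. An annotation counts as aligned only if its `start` is the first character of some token and its `end` is the end of some token. A trailing space pushes `end` one character past the token end.

```python
import re


def token_spans(text):
    """Return (start, end) character offsets of whitespace-separated tokens.

    Simplified stand-in for a real tokenizer such as spaCy's.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def is_aligned(text, start, end):
    """True if the span [start, end) starts at a token start and ends at a token end."""
    spans = token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


text = "see you at 747 kansas st at 4:10pm"
print(is_aligned(text, 11, 14))  # "747": aligned
print(is_aligned(text, 11, 15))  # "747 " with a trailing space: not aligned
```

With the sentence from this thread, the span 11 to 14 covers exactly the token `747`, while 11 to 15 also swallows the following space and no longer ends on a token boundary.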
Hi @tmbo, thank you!
Is there anything else that could cause this?
I have applied strip() and adjusted the ranges for all entities, yet the problem is not resolved.
Thank you
If you send me the modified training data, I can take another look.
@karnigili There are still entities in there that have a value ending with a space, e.g.:
{
"start": 13,
"end": 23,
"value": "documents ",
"entity": "docs"
}
I'd suggest installing the latest master; it will print a warning for every non-aligned entity value.
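Calling `strip()` on `value` alone is not enough; the `start`/`end` offsets must shrink too. A minimal sketch that recomputes both from the example text follows. It assumes the JSON layout `data["rasa_nlu_data"][section]`, where each section is a list of examples with a `"text"` key and an optional `"entities"` list; the section names and the helpers `trim_entity` / `trim_training_data` are assumptions for illustration, not part of Rasa NLU. Because `ner_synonyms` is in the pipeline, an annotation's `value` may intentionally differ from the surface text, so the sketch only copies the trimmed span into `value` when the original value was exactly the annotated text.

```python
def trim_entity(text, entity):
    """Return a copy of `entity` whose [start, end) span has no leading or
    trailing whitespace in `text`."""
    start, end = entity["start"], entity["end"]
    original_span = text[start:end]
    # Move the start forward and the end backward past any whitespace.
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    fixed = dict(entity, start=start, end=end)
    # If the value was just the surface text, take it from the trimmed span;
    # otherwise (e.g. a synonym mapping) only strip the value's own whitespace.
    if entity["value"] == original_span:
        fixed["value"] = text[start:end]
    else:
        fixed["value"] = entity["value"].strip()
    return fixed


def trim_training_data(data):
    """Trim every annotated entity in a Rasa NLU JSON training-data dict.

    Assumed layout: data["rasa_nlu_data"][section] is a list of examples,
    each with a "text" key and an optional "entities" list.
    """
    nlu = data["rasa_nlu_data"]
    for section in ("common_examples", "entity_examples", "intent_examples"):
        for example in nlu.get(section, []):
            if "entities" in example:
                example["entities"] = [
                    trim_entity(example["text"], e) for e in example["entities"]
                ]
    return data


# Hypothetical sentence in which characters 13-23 are "documents ".
entity = {"start": 13, "end": 23, "value": "documents ", "entity": "docs"}
print(trim_entity("I need those documents now", entity))  # end becomes 22, value "documents"
```

To fix a whole file, load it with `json.load`, pass the result to `trim_training_data`, and write it back with `json.dump`.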
Great! Thank you so much :):)