Rasa: ner_crf delivers hyphen instead of entity type

Created on 19 Jul 2017 · 6 Comments · Source: RasaHQ/rasa

rasa NLU version: 0.10.0a0 and also 0.9

Used backend / pipeline: spacy_sklearn

Operating system : macOS Sierra

Issue:
Hey guys,
I am training a model with the pipeline "spacy_sklearn". The training set is about 1000 utterances with five different entity types.
I did the whole installation process you described in the documentation. However, when I'm testing the model, it detects the different entities without delivering the entity type (it delivers '-'). I tried a lot of different configurations, but none of them worked for me.
If I add the "entity_crf_BILOU_flag" and set it to true, I don't get any entities delivered at all. The same happens when I remove "entity_crf_features". Since the model detects the correct value, I suspect I have configured something wrongly.

Thanks guys!

Content of reply file:

{
    "text": "show information for cloud computing",
    "rasa_reply": {
        "entities": [
            {
                "start": 21,
                "entity": "-",
                "end": 26,
                "value": "cloud",
                "extractor": "ner_crf"
            }
        ]
    }
}

Content of configuration file:

{
  "pipeline": "spacy_sklearn",
  "path" : "../models/spacy_sklearn_1000_19-07-2017",
  "data" : "../set/trainandtest/rasa/train/1000/",
  "entity_crf_features": [
    ["low", "title", "upper", "pos", "pos2"],
    ["bias", "low", "word3", "word2", "upper", "title", "digit", "pos", "pos2", "pattern"],
    ["low", "title", "upper", "pos", "pos2"]]
}

All 6 comments

Can you provide a sample of your training data?

That's one of the utterances (training data):

{
    "text": "book conference room close to MTY20 next monday at 7 am all day",
    "intent": "BookRoom",
    "entities": [
        {
            "end": 34,
            "value": "MTY20",
            "entity": "building",
            "start": 30
        }
    ]
}

I don't know if this is the problem, but I suspect it is: the end position isn't correct. It should be 35.
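To illustrate the point, here is a minimal sketch that slices the sample utterance with the annotated offsets. It assumes `end` is meant to be exclusive, as in Python slicing, so that `text[start:end]` should reproduce `value`; the variable names are only for illustration.

```python
# Sample utterance and entity annotation copied from the training data above.
text = "book conference room close to MTY20 next monday at 7 am all day"
entity = {"start": 30, "end": 34, "value": "MTY20", "entity": "building"}

# What the annotated span actually covers (end treated as exclusive).
covered = text[entity["start"]:entity["end"]]

# Where the span would have to end to cover the whole value.
expected_end = entity["start"] + len(entity["value"])

print(repr(covered))   # 'MTY2'
print(expected_end)    # 35
```

With `end` set to 34 the span stops one character short and no longer matches the annotated value.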

You may have seen a warning of this in the console or log output. Not in a place to confirm that at the moment.

@foldingparrot Do you mind sending me the whole training data at [email protected]? I think I saw this before and looking at the dataset might quickly resolve it (I think it is an issue in the training data)

I think @wrathagom might be right here - I think the cause can be wrong entity offsets (so the start and end values of the entity). I am not sure we warn about that, but we definitely should.
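Until such a warning exists, a check along these lines could be run over the training examples before training. This is only a sketch: `find_offset_errors` is a hypothetical helper, not part of rasa NLU, and it assumes each example is a dict with a "text" key and an optional "entities" list shaped like the sample above.

```python
def find_offset_errors(examples):
    """Return one message per entity whose offsets do not match its value.

    Hypothetical helper: assumes text[start:end] should equal "value".
    """
    errors = []
    for index, example in enumerate(examples):
        text = example["text"]
        for ent in example.get("entities", []):
            span = text[ent["start"]:ent["end"]]
            if span != ent["value"]:
                errors.append(
                    "example %d: offsets %d-%d cover %r, expected %r"
                    % (index, ent["start"], ent["end"], span, ent["value"])
                )
    return errors


# Usage with the utterance from this thread (end is off by one).
examples = [
    {
        "text": "book conference room close to MTY20 next monday at 7 am all day",
        "intent": "BookRoom",
        "entities": [
            {"start": 30, "end": 34, "value": "MTY20", "entity": "building"}
        ],
    }
]

for message in find_offset_errors(examples):
    print(message)
```

An empty result means every annotated span reproduces its value exactly; anything else points at the examples to fix.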

You guys are right. I fixed that and it works as expected now!

Thanks a lot!
