Rasa: NLU model does not recognize anything after upgrade to 1.3.2 from 1.2.5 for intent with underscore in name

Created on 11 Sep 2019  ·  8 Comments  ·  Source: RasaHQ/rasa

Rasa version: 1.3.2

Python version: 3.6

Operating system (windows, osx, ...): Ubuntu

Issue:
After updating to Rasa 1.3.2 and retraining using rasa train, the NLU model does not recognize anything. Intents all have the same confidence of 0.024. Nothing has changed in the training data, domain, or config. I also tried to train only the NLU model and it worked fine.

During training, it finds 1647 intent examples (which is correct) and finishes with an accuracy of 0.994. Entities are correctly extracted and the core model seems to work just fine. I have added the training logs below.

Any idea what could be happening and how we can train our NLU model properly?

Command or request that led to error:
Typing who are you? in the rasa shell nlu gives:

{
  "intent": {
    "name": "ask_pictures",
    "confidence": 0.024387534707784653
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "ask_pictures",
      "confidence": 0.024387534707784653
    },
    {
      "name": "ask_talk_to",
      "confidence": 0.024387534707784653
    },
    ...
  ],
  "text": "who are you?"
}
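A quick way to spot this kind of degenerate output is to check how many entries in `intent_ranking` tie with the top confidence. The following is a minimal sketch, not part of Rasa: `tied_top_intents` is a hypothetical helper, and the JSON is the result printed above with the elided entries left out.

```python
import json
import math

# Parse result as printed above; the entries elided with "..." are omitted.
raw = """
{
  "intent": {"name": "ask_pictures", "confidence": 0.024387534707784653},
  "entities": [],
  "intent_ranking": [
    {"name": "ask_pictures", "confidence": 0.024387534707784653},
    {"name": "ask_talk_to", "confidence": 0.024387534707784653}
  ],
  "text": "who are you?"
}
"""


def tied_top_intents(result, rel_tol=1e-9):
    """Return the names of all intents whose confidence ties with the top one."""
    ranking = result.get("intent_ranking", [])
    if not ranking:
        return []
    top = max(entry["confidence"] for entry in ranking)
    return [
        entry["name"]
        for entry in ranking
        if math.isclose(entry["confidence"], top, rel_tol=rel_tol)
    ]


result = json.loads(raw)
tied = tied_top_intents(result)
if len(tied) > 1:
    print(f"{len(tied)} intents share the top confidence: {tied}")
```

A healthy model would normally produce a single clear winner, so more than one tied name is a sign that the classifier cannot tell those intents apart.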

Training output

Training NLU model...
2019-09-11 11:50:18 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en'
2019-09-11 11:50:54 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en'.
2019-09-11 11:50:55 INFO     rasa.nlu.training_data.training_data  - Training data stats:
        - intent examples: 1647 (52 distinct intents)
        - Found intents: 'ask_identity', 'affirm', 'dont_know', [...]
        - Number of response examples: 0 (0 distinct response)
        - entity examples: 435 (16 distinct entities)
        - found entities: 'location', ...

2019-09-11 11:50:55 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Starting to train component SpacyEntityExtractor
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:51:01 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2019-09-11 11:51:02 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:51:02 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2019-09-11 11:51:10 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:51:10 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2019-09-11 11:51:10 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:51:10 INFO     rasa.nlu.model  - Starting to train component EmbeddingIntentClassifier
Epochs: 100%|██████████| 300/300 [02:11<00:00,  2.28it/s, loss=0.543, acc=0.994]
2019-09-11 11:53:25 INFO     rasa.utils.train_utils  - Finished training embedding policy, train loss=0.543, train accuracy=0.994
2019-09-11 11:53:25 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:53:25 INFO     rasa.nlu.model  - Starting to train component DucklingHTTPExtractor
2019-09-11 11:53:25 INFO     rasa.nlu.model  - Finished training component.
2019-09-11 11:53:26 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmpdivjkb3c/nlu'
NLU model training completed.

Content of configuration file (config.yml) (if relevant):

language: "en"
pipeline:
  - name: "SpacyNLP"
  - name: "SpacyTokenizer"
  - name: "SpacyEntityExtractor"
    dimensions: ["PERSON", ...]
  - name: "RegexFeaturizer"
  - name: "SpacyFeaturizer"
  - name: "CRFEntityExtractor"
    features: [
               ...
              ]
  - name: "EntitySynonymMapper"
  - name: "EmbeddingIntentClassifier"
  - name: "DucklingHTTPExtractor"
    url: "http://duckling.alpaca.casa"
    dimensions: ["time", ...]
    timezone: "America/New_York"

All 8 comments

The confusion matrix looks strange as well. A few intents are actually correctly detected.

[Image: confusion matrix]

The intent_report.json gives:

{
  ...,
  "micro avg": {
    "precision": 0.24097859327217125,
    "recall": 0.2392228293867638,
    "f1-score": 0.2400975015234613,
    "support": 1647
  },
  "macro avg": {
    "precision": 0.2115333312135587,
    "recall": 0.22926712590891699,
    "f1-score": 0.21212947579930622,
    "support": 1647
  },
  "weighted avg": {
    "precision": 0.20125352874473615,
    "recall": 0.2392228293867638,
    "f1-score": 0.20300657350876083,
    "support": 1647
  }
}
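To see which intents drag the averages down, the per-intent rows of the report can be sorted by f1-score. This is a sketch under assumptions: `worst_intents` is a hypothetical helper, the report is assumed to map each intent name to a dict of scores in the same shape as the aggregate rows above, and the per-intent rows in `sample_report` are made up because the real ones are elided.

```python
# Aggregate rows, as they appear in the report above, are skipped.
AGGREGATE_KEYS = {"micro avg", "macro avg", "weighted avg", "accuracy"}


def worst_intents(report, limit=10):
    """Return (intent, f1-score) pairs sorted from lowest to highest f1-score."""
    rows = [
        (name, scores["f1-score"])
        for name, scores in report.items()
        if name not in AGGREGATE_KEYS and isinstance(scores, dict)
    ]
    return sorted(rows, key=lambda row: row[1])[:limit]


# Made-up per-intent rows for illustration only; the real ones are elided above.
sample_report = {
    "intent_a": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 30},
    "intent_b": {"precision": 0.0, "recall": 0.0, "f1-score": 0.0, "support": 25},
    "micro avg": {"precision": 0.24, "recall": 0.24, "f1-score": 0.24, "support": 55},
}

for name, f1 in worst_intents(sample_report):
    print(f"{name}: f1={f1:.2f}")
```

With a real report, `sample_report` would be replaced by the dict loaded from `intent_report.json` via `json.load`.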

Thanks for the issue, @federicotdn will get back to you about it soon!

You may find help in the docs and the forum, too.

@federicotdn @sara-tagger we found the pattern. All of the intents that have an underscore in their names cannot be distinguished.

The following intents were correctly classified:

  • affirm
  • deny
  • explain
  • goodbye
  • greeting
  • inform
  • interested
  • joke
  • reset
  • thanks
  • useful

The following were wrongly classified:

  • dont_know
  • i_love_you
  • i_hate_you
  • ask_pictures
  • ask_identity
  • etc.

In total, we have 41 intents with an underscore in the name. The confidence the NLU model reports for these intents is always approximately 1/41 (see the original post).
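The arithmetic can be checked directly. The sketch below uses only numbers from this thread (the reported confidence, the 41 underscore intents, and the 52 distinct intents from the training log). It assumes the confidences sum to 1 across all intents, which is not confirmed anywhere in the thread.

```python
observed = 0.024387534707784653  # confidence reported in the original post
n_underscore = 41                # intents with an underscore in the name
n_total = 52                     # distinct intents from the training log

uniform = 1 / n_underscore
print(f"1/{n_underscore} = {uniform:.6f}, observed = {observed:.6f}")

# Assumption: confidences sum to 1, so this is the probability mass left
# for the remaining intents if all 41 underscore intents tie at `observed`.
leftover = 1 - n_underscore * observed
print(f"mass left for the other {n_total - n_underscore} intents: {leftover:.6f}")
```

Under that assumption, almost all of the probability mass is spread evenly over the 41 underscore intents, and the 52 - 41 = 11 remaining intents match the eleven listed above as correctly classified.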

Could you please try excluding spaCy from the pipeline and retraining? I think there is a bug.

@Ghostvv I tried a few pipelines and with the following, it works for intents with underscores. It seems that indeed the SpacyFeaturizer is the culprit.

language: "en"
pipeline:
  - name: "SpacyNLP"
  - name: "SpacyTokenizer"
  - name: "SpacyEntityExtractor"
    dimensions: [...]
  - name: "RegexFeaturizer"
  - name: "CountVectorsFeaturizer"
  - name: "CRFEntityExtractor"
    features: [...]
  - name: "EntitySynonymMapper"
  - name: "EmbeddingIntentClassifier"
  - name: "DucklingHTTPExtractor"
    url: "http://duckling.alpaca.casa"
    dimensions: [...]
    timezone: "America/New_York"
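The only change from the original pipeline is that `SpacyFeaturizer` is replaced by `CountVectorsFeaturizer`. As an illustration, the same swap can be expressed on a pipeline held as a list of dicts. `swap_featurizer` is a hypothetical helper, not a Rasa function, and the pipeline here is abbreviated.

```python
def swap_featurizer(pipeline, old="SpacyFeaturizer", new="CountVectorsFeaturizer"):
    """Return a copy of the pipeline with component `old` replaced by `new`."""
    swapped = []
    for component in pipeline:
        if component.get("name") == old:
            swapped.append({"name": new})  # drop options that belonged to `old`
        else:
            swapped.append(dict(component))
    return swapped


# Abbreviated version of the original pipeline, for illustration only.
original = [
    {"name": "SpacyNLP"},
    {"name": "SpacyTokenizer"},
    {"name": "RegexFeaturizer"},
    {"name": "SpacyFeaturizer"},
    {"name": "EmbeddingIntentClassifier"},
]

workaround = swap_featurizer(original)
print([component["name"] for component in workaround])
```

The helper returns a new list and leaves the input untouched, so both configurations can be trained and compared side by side.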

Also, please note that I tried replacing the underscores with dots in the original pipeline, and it didn't work either.

It is a problem with the featurization of intents that we introduced in 1.3; we're working on fixing it.

This should be fixed by the PR above.

Yes, works fine now in 1.3.3. Thanks for the quick fix!
