Rasa version: 1.3.2
Python version: 3.6
Operating system (windows, osx, ...): Ubuntu
Issue:
After updating to Rasa 1.3.2 and retraining with rasa train, the NLU model no longer recognizes anything: all intents come back with the same confidence of 0.024. Nothing has changed in the training data, domain, or config. I also tried training only the NLU model; the training itself ran fine.
During training, it finds 1647 intent examples (which is correct) and finishes with an accuracy of 0.994. Entities are correctly extracted, and the Core model seems to work just fine. I have added the training logs below.
Any idea what could be happening and how we can properly train our NLU model?
Command or request that led to error:
Typing "who are you?" in rasa shell nlu gives:
{
  "intent": {
    "name": "ask_pictures",
    "confidence": 0.024387534707784653
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "ask_pictures",
      "confidence": 0.024387534707784653
    },
    {
      "name": "ask_talk_to",
      "confidence": 0.024387534707784653
    },
    ...
  ],
  "text": "who are you?"
}
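For what it's worth, the same flat ranking can be reproduced programmatically; a minimal sketch, assuming the NLU part of the trained model has been unpacked to models/nlu (the path is an assumption, adjust to your setup):

from rasa.nlu.model import Interpreter

# Hypothetical path to the unpacked NLU model directory.
interpreter = Interpreter.load("models/nlu")

result = interpreter.parse("who are you?")
# Every entry in intent_ranking comes back with the same ~0.024 confidence.
for intent in result["intent_ranking"]:
    print(intent["name"], intent["confidence"])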
Training output
Training NLU model...
2019-09-11 11:50:18 INFO rasa.nlu.utils.spacy_utils - Trying to load spacy model with name 'en'
2019-09-11 11:50:54 INFO rasa.nlu.components - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en'.
2019-09-11 11:50:55 INFO rasa.nlu.training_data.training_data - Training data stats:
- intent examples: 1647 (52 distinct intents)
- Found intents: 'ask_identity', 'affirm', 'dont_know', [...]
- Number of response examples: 0 (0 distinct response)
- entity examples: 435 (16 distinct entities)
- found entities: 'location', ...
2019-09-11 11:50:55 INFO rasa.nlu.model - Starting to train component SpacyNLP
2019-09-11 11:51:01 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:51:01 INFO rasa.nlu.model - Starting to train component SpacyTokenizer
2019-09-11 11:51:01 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:51:01 INFO rasa.nlu.model - Starting to train component SpacyEntityExtractor
2019-09-11 11:51:01 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:51:01 INFO rasa.nlu.model - Starting to train component RegexFeaturizer
2019-09-11 11:51:01 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:51:01 INFO rasa.nlu.model - Starting to train component SpacyFeaturizer
2019-09-11 11:51:02 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:51:02 INFO rasa.nlu.model - Starting to train component CRFEntityExtractor
2019-09-11 11:51:10 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:51:10 INFO rasa.nlu.model - Starting to train component EntitySynonymMapper
2019-09-11 11:51:10 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:51:10 INFO rasa.nlu.model - Starting to train component EmbeddingIntentClassifier
Epochs: 100%|██████████| 300/300 [02:11<00:00, 2.28it/s, loss=0.543, acc=0.994]
2019-09-11 11:53:25 INFO rasa.utils.train_utils - Finished training embedding policy, train loss=0.543, train accuracy=0.994
2019-09-11 11:53:25 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:53:25 INFO rasa.nlu.model - Starting to train component DucklingHTTPExtractor
2019-09-11 11:53:25 INFO rasa.nlu.model - Finished training component.
2019-09-11 11:53:26 INFO rasa.nlu.model - Successfully saved model into '/tmp/tmpdivjkb3c/nlu'
NLU model training completed.
Content of configuration file (config.yml) (if relevant):
language: "en"
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyEntityExtractor"
  dimensions: ["PERSON", ...]
- name: "RegexFeaturizer"
- name: "SpacyFeaturizer"
- name: "CRFEntityExtractor"
  features: [...]
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
- name: "DucklingHTTPExtractor"
  url: "http://duckling.alpaca.casa"
  dimensions: ["time", ...]
  timezone: "America/New_York"
The confusion matrix looks strange as well; only a few intents are actually detected correctly.

The intent_report.json gives:
{
  ...,
  "micro avg": {
    "precision": 0.24097859327217125,
    "recall": 0.2392228293867638,
    "f1-score": 0.2400975015234613,
    "support": 1647
  },
  "macro avg": {
    "precision": 0.2115333312135587,
    "recall": 0.22926712590891699,
    "f1-score": 0.21212947579930622,
    "support": 1647
  },
  "weighted avg": {
    "precision": 0.20125352874473615,
    "recall": 0.2392228293867638,
    "f1-score": 0.20300657350876083,
    "support": 1647
  }
}
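To put the micro average in perspective (plain Python, just arithmetic): a recall of ~0.239 over the 1647 examples means only about 394 of them were classified correctly.

# Micro-averaged recall over the training data is just the fraction of
# correctly classified examples, so roughly this many of the 1647 were right:
print(round(0.2392228293867638 * 1647))  # 394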
@federicotdn @sara-tagger we found the pattern: intents that have an underscore in their name cannot be distinguished from one another.
The following intents were correctly classified: [...]
The following were wrongly classified: [...]
In total, we have 41 intents with an underscore in the name, and the confidence the NLU model reports for these intents is always equal to 1/41 (see the output in the original post).
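A quick sanity check on that arithmetic (plain Python): 1/41 is almost exactly the confidence the model reports.

# 41 intents that cannot be told apart -> an (almost) uniform confidence
# of 1/41 each, matching the value reported by the model.
print(1 / 41)                    # 0.024390243902439025
print(0.024387534707784653)      # confidence from the model output above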
Could you please try excluding spaCy from the pipeline and retraining? I think there is a bug.
@Ghostvv I tried a few pipelines, and with the following one (SpacyFeaturizer swapped for CountVectorsFeaturizer), intents with underscores work. It seems the SpacyFeaturizer is indeed the culprit.
language: "en"
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyEntityExtractor"
  dimensions: [...]
- name: "RegexFeaturizer"
- name: "CountVectorsFeaturizer"
- name: "CRFEntityExtractor"
  features: [...]
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
- name: "DucklingHTTPExtractor"
  url: "http://duckling.alpaca.casa"
  dimensions: [...]
  timezone: "America/New_York"
Also, please note that I tried replacing the underscores with dots in the original pipeline, and that didn't work either.
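For what it's worth, a minimal sketch of why spaCy featurization could treat all these intent names identically (my assumption about the mechanism, not confirmed in this thread): a name like ask_pictures stays a single token that spaCy has no word vector for, so every underscored intent label would end up with the same all-zero feature vector.

import spacy

# Assumes a spaCy model with word vectors (e.g. en_core_web_md); the "en"
# shortcut from the logs may resolve to a different model on your machine.
nlp = spacy.load("en_core_web_md")

for label in ["ask_pictures", "ask_talk_to", "ask pictures"]:
    doc = nlp(label)
    # Underscored names stay a single out-of-vocabulary token with a zero
    # vector, while "ask pictures" splits into two in-vocabulary tokens.
    print(label, [token.is_oov for token in doc], doc.vector_norm)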
It is a problem with the featurization of intents that we introduced in 1.3; we're working on fixing it.
Should be fixed by the PR above.
Yes, it works fine now in 1.3.3. Thanks for the quick fix!