Rasa: entity_synonyms not recognised

Created on 17 Jan 2018  ·  9 Comments  ·  Source: RasaHQ/rasa

Working with the latest version of rasa_nlu, I'm having a problem where synonyms defined by "entity_synonyms" don't return a match. My training data looks as follows:

{
  "rasa_nlu_data": {
    "entity_synonyms": [
      {
        "value": "coffee",
        "synonyms": ["covfefe"]
      }
    ],
    "common_examples": [
      {
        "text": "would like coffee",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 17,
            "value": "coffee",
            "entity": "item"
          }
        ]
      },
      {
        "text": "could have coffee",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 17,
            "value": "coffee",
            "entity": "item"
          }
        ]
      },
      {
        "text": "please have coffee",
        "intent": "order",
        "entities": [
          {
            "start": 12,
            "end": 18,
            "value": "coffee",
            "entity": "item"
          }
        ]
      }
    ]
  }
}

When I send please have coffee, then an item of the value coffee is identified. But when I enter please have covfefe, I don't get a match, even though covfefe is set to be a synonym.

BUT if I add training data for "covfefe" like so:

{
        "text": "please have covfefe",
        "intent": "order",
        "entities": [
          {
            "start": 12,
            "end": 19,
            "value": "coffee",
            "entity": "item"
          }
        ]
}

I DO get a match - with processor ["ner_synonyms"].

So synonyms do seem to be working, but setting them via an entity_synonyms object doesn't work.


All 9 comments

I understand how this is confusing, but it's actually expected behaviour. The synonyms only map to a particular value once they have been recognised as entities. You will still have to add some examples with e.g. covfefe marked as an entity.

If you're up for creating a PR to make the docs clearer on this that would be 💯

Thank you for the quick reply. May I ask, then, what the point is of defining synonyms by entity_synonyms? Is it only to get the processor ["ner_synonyms"] prop in the reply, or are there any other benefits? As far as I can tell, additionally defining entity_synonyms doesn't change the result of the output when I add the synonyms to common_examples array anyway to get a match.

I'll gladly update the docs and contribute as soon as I'm clear on the benefits. Thank you!

I am the one that added the note to the docs under the entity synonyms section here.

But I still struggle to explain how this works. In the common_examples section of the training data, if you label a section of the text as an entity, then that is fed into training an entity recognition model. Only the examples in the common_examples section are fed into the model training. So, since you only provided examples with an entity value of _coffee_, the model has not generalized that the item entity can have more values than just coffee. When you add the _covfefe_ example to the common_examples section, it is successfully parsed as an entity by the model.

Once _coffee_ or _covfefe_ are recognized as entity values, THEN entity synonyms come into play. In this case they say _covfefe_ is a synonym of _coffee_, so the synonym _covfefe_ is replaced with its defined value _coffee_.
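To make the ordering concrete, here is a minimal sketch of a synonym-mapping step that runs after entity extraction. It is plain Python written for illustration, not Rasa's actual code; the function name `apply_synonyms` and the shape of the `synonyms` dictionary are assumptions.

```python
from typing import Any, Dict, List


def apply_synonyms(
    entities: List[Dict[str, Any]], synonyms: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Replace already-recognised entity values with their canonical value.

    Hypothetical helper: it only rewrites entities that an extractor has
    already found. It never looks at the raw text, so it cannot create
    an entity on its own.
    """
    mapped = []
    for entity in entities:
        entity = dict(entity)  # copy, so the caller's data is left untouched
        canonical = synonyms.get(str(entity["value"]).lower())
        if canonical is not None and canonical != entity["value"]:
            entity["value"] = canonical
            entity["processors"] = entity.get("processors", []) + ["ner_synonyms"]
        mapped.append(entity)
    return mapped


# Assumed lookup table built from entity_synonyms: synonym -> value.
synonyms = {"covfefe": "coffee"}

# Case 1: the extractor recognised "covfefe" as an item.
recognised = [
    {"start": 12, "end": 19, "value": "covfefe", "entity": "item", "extractor": "ner_crf"}
]
result_recognised = apply_synonyms(recognised, synonyms)

# Case 2: the extractor found nothing, so there is nothing to map.
result_missed = apply_synonyms([], synonyms)
```

In the second case the synonym table is never consulted, which matches the behaviour reported at the top of this issue: without training examples, covfefe is not extracted, so the mapping has nothing to act on.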

Said another way, the expected output for the request Please have covfefe is:

With entity_synonyms:

{
    "entities": [
        {
            "extractor": "ner_crf",
            "end": 19,
            "processors": [
                "ner_synonyms"
            ],
            "value": "coffee",
            "entity": "item",
            "start": 12
        }
    ],
    "intent": null,
    "text": "Please have covfefe",
    "intent_ranking": []
}

Notice how the user asked for _covfefe_, but the entity value returned was _coffee_. This is because it was processed by ner_synonyms.

Without entity_synonyms:

{
    "entities": [
        {
            "extractor": "ner_crf",
            "end": 19,
            "value": "covfefe",
            "entity": "item",
            "start": 12
        }
    ],
    "intent": null,
    "text": "Please have covfefe",
    "intent_ranking": []
}

Notice that without synonyms the actual parsed entity value of _covfefe_ is returned.

Also @jonasblumer check out https://github.com/RasaHQ/rasa_nlu/issues/773

Thank you for the detailed answers! It does seem to me that the docs could be more specific.
So the following two examples will return the same result:

{
  "rasa_nlu_data": {
    "entity_synonyms": [
      {
        "value": "coffee",
        "synonyms": ["covfefe"]
      }
    ],
    "common_examples": [
      {
        "text": "would like covfefe",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 18,
            "value": "covfefe",
            "entity": "item"
          }
        ]
      }
    ]
  }
}

This will return a match with a value of coffee because of the entity_synonyms mapping. Notice that in the common examples, the value is covfefe.

AND

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "would like covfefe",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 18,
            "value": "coffee",
            "entity": "item"
          }
        ]
      },
      {
        "text": "would like coffee",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 17,
            "value": "coffee",
            "entity": "item"
          }
        ]
      }
    ]
  }
}

This will return the same thing, as the value of both entities is coffee. There is no need to use entity_synonyms here.

In my current understanding, these two examples are absolutely equal.

Is that correct? If yes, I will gladly try to make this clearer in a PR to update the docs.

Yes, entity_synonyms just provides a place where more synonyms can be defined in a smaller space, provided there are still enough examples in the common_examples section for the model to generalize and recognize them.
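The equivalence of the two training files can be sketched as follows. This is an illustration of the idea only, not Rasa's implementation; `collect_synonyms` is a hypothetical helper that gathers synonym pairs both from entity_synonyms and from examples whose annotated text differs from the entity value.

```python
from typing import Any, Dict


def collect_synonyms(training_data: Dict[str, Any]) -> Dict[str, str]:
    """Build a synonym -> value table from rasa_nlu_data (hypothetical helper)."""
    data = training_data["rasa_nlu_data"]
    table: Dict[str, str] = {}

    # Explicit pairs listed under entity_synonyms.
    for entry in data.get("entity_synonyms", []):
        for synonym in entry["synonyms"]:
            table[synonym.lower()] = entry["value"]

    # Implicit pairs: the annotated span differs from the entity value.
    for example in data.get("common_examples", []):
        for entity in example.get("entities", []):
            surface = example["text"][entity["start"]:entity["end"]]
            if surface != entity["value"]:
                table[surface.lower()] = entity["value"]
    return table


# Variant 1: the synonym is declared in entity_synonyms.
explicit = {
    "rasa_nlu_data": {
        "entity_synonyms": [{"value": "coffee", "synonyms": ["covfefe"]}],
        "common_examples": [
            {
                "text": "would like covfefe",
                "intent": "order",
                "entities": [
                    {"start": 11, "end": 18, "value": "covfefe", "entity": "item"}
                ],
            }
        ],
    }
}

# Variant 2: the synonym is implied by the example's value field.
implicit = {
    "rasa_nlu_data": {
        "common_examples": [
            {
                "text": "would like covfefe",
                "intent": "order",
                "entities": [
                    {"start": 11, "end": 18, "value": "coffee", "entity": "item"}
                ],
            }
        ]
    }
}

explicit_table = collect_synonyms(explicit)
implicit_table = collect_synonyms(implicit)
```

Under these assumptions both variants end up with the same single mapping from covfefe to coffee, which is why adding entity_synonyms on top of an annotated example does not change the parse result.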

@jonasblumer I am going to close this one, but please do submit a PR. Also, let me know if your issue isn't resolved.

The ultimate power of entity synonyms comes together with the phrase matcher! I just played with the phrase matcher and put it before NER in the pipeline, so that untrained entities like item are recognized first, and afterwards covfefe is replaced with coffee via entity_synonyms! And you don't need to train covfefe!
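A rough sketch of that idea, assuming a simple dictionary-based matcher: the comment does not say which phrase matcher component was used, so the code below is plain Python with the standard `re` module, and the names `match_phrases`, `map_synonyms`, `ITEM_PHRASES` and `SYNONYMS` are all hypothetical.

```python
import re
from typing import Any, Dict, List


def match_phrases(text: str, phrases: List[str], entity: str) -> List[Dict[str, Any]]:
    """Find known phrases in the text without any trained model."""
    # Longest phrases first, so longer alternatives win over their prefixes.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b", re.IGNORECASE
    )
    return [
        {"start": m.start(), "end": m.end(), "value": m.group(0), "entity": entity}
        for m in pattern.finditer(text)
    ]


def map_synonyms(
    entities: List[Dict[str, Any]], synonyms: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Swap each matched value for its canonical value, if one is defined."""
    return [
        {**e, "value": synonyms.get(e["value"].lower(), e["value"])} for e in entities
    ]


# Assumed phrase list for the item entity and assumed synonym table.
ITEM_PHRASES = ["coffee", "covfefe"]
SYNONYMS = {"covfefe": "coffee"}

found = match_phrases("please have covfefe", ITEM_PHRASES, "item")
mapped = map_synonyms(found, SYNONYMS)
```

Because the matcher works from a fixed phrase list, covfefe is found even though no training example contains it, and the synonym step then normalises it to coffee. The trade-off is that only phrases on the list are ever matched.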

Facing an issue with the pipeline while training bots

Rasa version: Rasa 1.6.0

Rasa SDK version (if used & relevant):

Rasa X version (if used & relevant):

Python version:python3.6.9

Operating system (windows, osx, ...):ubuntu 18.04 LTS

Issue: Failed load nlu model while starting rasa shell to test my bot:

nlu and stories are correct and tested with embedded supervised 

Error (including full traceback):

2020-02-06 21:39:29 INFO     root  - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2020-02-06 21:39:29 INFO     root  - Starting Rasa server on http://localhost:5005
2020-02-06 21:39:32 INFO     absl  - Entry Point [tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv] registered with id [T2TEnv-TicTacToeEnv-v0]
/home/ai/ai/rasa/o/lib/python3.6/site-packages/rasa/nlu/classifiers/embedding_intent_classifier.py:962: UserWarning: Failed to load nlu model. Maybe path '/tmp/tmpwistue_9/nlu' doesn't exist.
  f"Failed to load nlu model. "
2020-02-06 21:39:33 INFO     rasa.nlu.selectors.embedding_response_selector  - Retrieval intent parameter was left to its default value. This response selector will be trainedon training examples combining all retrieval intents.
Bot loaded. Type a message and press enter (use '/stop' to exit): 
Your input ->  tell me location                                                 
2020-02-06 21:39:57 ERROR    rasa.nlu.classifiers.embedding_intent_classifier  - **There is no trained tf.session: component is either not trained or didn't receive enough training data.**
Your input ->  /stop                                                            
2020-02-06 21:41:47 INFO     root  - Killing Sanic server now.

Command or request that led to error:

$ rasa shell 

Content of configuration file (config.yml) (if relevant):


# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "EmbeddingIntentClassifier"
  - name: "ResponseSelector"


# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy

Content of domain file (domain.yml) (if relevant):

intents:
  - greet
  - goodbye
  - query_knowledge_base
  - bot_challenge
  - location_ask
  - time_t
  - who_ask

entities:
  - location  
  - address 
  - berlin 
  - date
  - time
  - services

actions:
- utter_iamabot
- utter_greet
- utter_goodbye
- utter_ask_rephrase
- action_location
- action_time

templates:
  utter_greet:
  - text: "Hey!"
  - text: "Hello! How can I help you?"

  utter_goodbye:
  - text: "Bye"
  - text: "Goodbye. See you soon."

  utter_ask_rephrase:
  - text: "Sorry, I'm not sure I understand. Can you rephrase?"
  - text: "Can you please rephrase? I did not got that."

  utter_iamabot:
  - text: "I am a bot, powered by Rasa."