Working with the latest version of rasa_nlu, I'm having a problem where synonyms defined by "entity_synonyms" don't return a match. My training data looks as follows:
{
"rasa_nlu_data": {
"entity_synonyms": [
{
"value": "coffee",
"synonyms": ["covfefe"]
}
],
"common_examples": [
{
"text": "would like coffee",
"intent": "order",
"entities": [
{
"start": 11,
"end": 17,
"value": "coffee",
"entity": "item"
}
]
},
{
"text": "could have coffee",
"intent": "order",
"entities": [
{
"start": 11,
"end": 17,
"value": "coffee",
"entity": "item"
}
]
},
{
"text": "please have coffee",
"intent": "order",
"entities": [
{
"start": 12,
"end": 18,
"value": "coffee",
"entity": "item"
}
]
}
]
}
}
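The `start` and `end` values have to index the entity text exactly, with `end` exclusive (Python slice semantics). Hand-counted offsets are easy to get wrong, so here is a minimal sketch for checking and computing them. It assumes the training data is loaded as a plain dict in the `rasa_nlu_data` format above; `check_offsets` and `find_offsets` are hypothetical helpers, not part of rasa_nlu.

```python
import json

# Training data in the rasa_nlu_data format shown above (trimmed to one example).
TRAINING_DATA = json.loads("""
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "please have coffee",
        "intent": "order",
        "entities": [
          {"start": 12, "end": 18, "value": "coffee", "entity": "item"}
        ]
      }
    ]
  }
}
""")


def check_offsets(data):
    """Return (text, start, end, span) for every annotated entity; end is exclusive."""
    rows = []
    for example in data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for ent in example.get("entities", []):
            span = text[ent["start"]:ent["end"]]
            rows.append((text, ent["start"], ent["end"], span))
    return rows


def find_offsets(text, phrase):
    """Compute (start, end) for the first occurrence of phrase in text."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return start, start + len(phrase)


# Print each annotated span so mismatches are easy to spot by eye.
for text, start, end, span in check_offsets(TRAINING_DATA):
    print(f"{text!r}[{start}:{end}] -> {span!r}")

print(find_offsets("would like covfefe", "covfefe"))
```

If a printed span is not exactly the word you meant to label, the offsets are off.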
When I send "please have coffee", an item with the value coffee is identified. But when I enter "please have covfefe", I don't get a match, even though covfefe is defined as a synonym.
BUT if I add training data for "covfefe" like so:
{
"text": "please have covfefe",
"intent": "order",
"entities": [
{
"start": 12,
"end": 19,
"value": "coffee",
"entity": "item"
}
]
}
I DO get a match, with processors ["ner_synonyms"].
So synonyms do seem to be working, but setting them via an entity_synonyms object doesn't work.
I understand why this is confusing, but it's actually expected behaviour. The synonyms only map to a particular value once they have been recognised as entities. You will still have to add some examples with, e.g., covfefe marked as an entity.
If you're up for creating a PR to make the docs clearer on this, that would be 💯
Thank you for the quick reply. May I ask, then, what the point of defining synonyms in entity_synonyms is? Is it only to get the processors ["ner_synonyms"] property in the reply, or are there any other benefits? As far as I can tell, additionally defining entity_synonyms doesn't change the output when I add the synonyms to the common_examples array anyway to get a match.
I'll gladly update the docs and contribute as soon as I'm clear on the benefits. Thank you!
I am the one that added the note to the docs under the entity synonyms section here.
But I still struggle to explain how this works. In the common_examples section of the training data, if you label a section of the text as an entity, that is fed into training an entity recognition model. Only the examples in the common_examples section are fed into the model training. So since you only provided examples with an entity value of _coffee_, the model has not generalized that the item entity can have values other than coffee. When you add the _covfefe_ example to the common_examples section, it is successfully parsed as an entity by the model.
Once _coffee_ or _covfefe_ is recognized as an entity value, THEN entity synonyms come into play. In this case they say _covfefe_ is a synonym of _coffee_, so the synonym _covfefe_ gets replaced with its defined value _coffee_.
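The order of operations above (extract first, map synonyms second) can be sketched in a few lines of Python. This is a simplified illustration loosely modeled on what the ner_synonyms component does, not its actual source; `map_synonyms` and `SYNONYMS` are hypothetical names, and the case-insensitive lookup is an assumption.

```python
def map_synonyms(entities, synonyms):
    """Replace recognised entity values with their canonical value.

    `entities` is the extractor's output: only values the extractor already
    found can be mapped. Lookups are case-insensitive (an assumption).
    """
    mapped = []
    for ent in entities:
        ent = dict(ent)  # copy so the caller's dicts are not mutated
        canonical = synonyms.get(str(ent["value"]).lower())
        if canonical is not None:
            ent["value"] = canonical
            ent["processors"] = ent.get("processors", []) + ["ner_synonyms"]
        mapped.append(ent)
    return mapped


SYNONYMS = {"covfefe": "coffee"}  # what entity_synonyms would define

recognised = [{"entity": "item", "value": "covfefe", "start": 12, "end": 19,
               "extractor": "ner_crf"}]

# Case 1: the extractor recognised "covfefe", so its value becomes "coffee".
print(map_synonyms(recognised, SYNONYMS))

# Case 2: the extractor found nothing, so there is nothing to map.
print(map_synonyms([], SYNONYMS))

# Case 3: no synonyms defined, so the raw value "covfefe" is returned.
print(map_synonyms(recognised, {}))
```

Case 2 is the situation from the original question: if the model never extracts covfefe, the synonym table is never consulted.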
Said another way, here is the expected output for the request "Please have covfefe":
With entity_synonyms:
{
"entities": [
{
"extractor": "ner_crf",
"end": 19,
"processors": [
"ner_synonyms"
],
"value": "coffee",
"entity": "item",
"start": 12
}
],
"intent": null,
"text": "Please have covfefe",
"intent_ranking": []
}
Notice how the user asked for _covfefe_, but the entity value returned was _coffee_; this is because it was processed by ner_synonyms.
Without entity_synonyms:
{
"entities": [
{
"extractor": "ner_crf",
"end": 19,
"value": "covfefe",
"entity": "item",
"start": 12
}
],
"intent": null,
"text": "Please have covfefe",
"intent_ranking": []
}
Notice that without synonyms, the actual parsed entity value _covfefe_ is returned.
Also @jonasblumer check out https://github.com/RasaHQ/rasa_nlu/issues/773
Thank you for the detailed answers! It does seem to me that the docs could be more specific.
So the following two examples will return the same result:
{
"rasa_nlu_data": {
"entity_synonyms": [
{
"value": "coffee",
"synonyms": ["covfefe"]
}
],
"common_examples": [
{
"text": "would like covfefe",
"intent": "order",
"entities": [
{
"start": 11,
"end": 18,
"value": "covfefe",
"entity": "item"
}
]
}
]
}
}
This will return a match with the value coffee because of the entity_synonyms mapping. Notice that in common_examples, the value is covfefe.
AND
{
"rasa_nlu_data": {
"common_examples": [
{
"text": "would like covfefe",
"intent": "order",
"entities": [
{
"start": 11,
"end": 18,
"value": "coffee",
"entity": "item"
}
]
},
{
"text": "would like coffee",
"intent": "order",
"entities": [
{
"start": 11,
"end": 17,
"value": "coffee",
"entity": "item"
}
]
}
]
}
}
This will return the same thing, as the value of both entities is coffee. There is no need to use entity_synonyms here.
In my current understanding, these two examples are equivalent.
Is that correct? If yes, I will gladly try to make this clearer in a PR to update the docs.
Yes, entity_synonyms just provides a place where more synonyms can be defined more compactly, granted that there still have to be enough examples in the common_examples section for the model to generalize and recognize them.
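Why the two training files end up equivalent can be sketched by building the synonym table from each. As far as I can tell, rasa_nlu's synonym mapper collects synonyms from two places: the explicit entity_synonyms list, and any annotated entity whose text span differs from its `value`. The sketch below assumes exactly that behaviour; `collect_synonyms` is a hypothetical helper, not the library's code.

```python
def collect_synonyms(data):
    """Build a {lowercased synonym: canonical value} table from training data.

    Two sources are merged (assumed to mirror rasa_nlu's behaviour):
      1. the explicit "entity_synonyms" list;
      2. any annotated entity whose text span differs from its "value".
    """
    nlu = data["rasa_nlu_data"]
    table = {}
    for entry in nlu.get("entity_synonyms", []):
        for synonym in entry["synonyms"]:
            table[synonym.lower()] = entry["value"]
    for example in nlu.get("common_examples", []):
        for ent in example.get("entities", []):
            span = example["text"][ent["start"]:ent["end"]]
            if span != ent["value"]:
                table[span.lower()] = ent["value"]
    return table


# First file: synonym declared in entity_synonyms, example value is "covfefe".
explicit = {"rasa_nlu_data": {
    "entity_synonyms": [{"value": "coffee", "synonyms": ["covfefe"]}],
    "common_examples": [
        {"text": "would like covfefe", "intent": "order",
         "entities": [{"start": 11, "end": 18, "value": "covfefe", "entity": "item"}]},
    ],
}}

# Second file: no entity_synonyms, but the covfefe span is annotated as "coffee".
inline = {"rasa_nlu_data": {
    "common_examples": [
        {"text": "would like covfefe", "intent": "order",
         "entities": [{"start": 11, "end": 18, "value": "coffee", "entity": "item"}]},
        {"text": "would like coffee", "intent": "order",
         "entities": [{"start": 11, "end": 17, "value": "coffee", "entity": "item"}]},
    ],
}}

print(collect_synonyms(explicit))  # {'covfefe': 'coffee'}
print(collect_synonyms(inline))    # {'covfefe': 'coffee'}
```

Either way the mapper ends up with the same table; entity_synonyms is simply the more compact place to list many surface forms.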
@jonasblumer I am going to close this one, but please do submit a PR. Also, let me know if your issue isn't resolved.
The ultimate power of entity synonyms comes together with the phrase matcher! I just played with the phrase matcher and placed it before NER in the pipeline, so that untrained entities like item are recognized first, and afterwards covfefe is replaced with coffee by entity_synonyms. And you don't need to train covfefe!
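The ordering described in that comment (a lookup-based matcher first, synonym mapping second) can be sketched in plain Python. This does not use Rasa's actual phrase matcher component. The regex-based `phrase_match`, the `GAZETTEER` lookup list and `apply_synonyms` are hypothetical stand-ins, and the sketch assumes the lookup list contains every surface form (including covfefe) you want matched.

```python
import re


def phrase_match(text, gazetteer):
    """Tag every gazetteer phrase found in text as an entity (whole words only)."""
    entities = []
    for entity_type, phrases in gazetteer.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                entities.append({"entity": entity_type, "value": m.group(0),
                                 "start": m.start(), "end": m.end(),
                                 "extractor": "phrase_matcher"})
    return sorted(entities, key=lambda e: e["start"])


def apply_synonyms(entities, synonyms):
    """Replace matched values with their canonical value (case-insensitive)."""
    for ent in entities:
        canonical = synonyms.get(ent["value"].lower())
        if canonical is not None:
            ent["value"] = canonical
    return entities


# Hypothetical lookup list: every known surface form of an "item".
GAZETTEER = {"item": ["coffee", "covfefe"]}
SYNONYMS = {"covfefe": "coffee"}

result = apply_synonyms(phrase_match("please have covfefe", GAZETTEER), SYNONYMS)
print(result)
```

Because the match comes from a lookup rather than a trained model, no common_examples containing covfefe are needed; the synonym step then normalizes the value to coffee.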
Rasa version: Rasa 1.6.0
Rasa SDK version (if used & relevant):
Rasa X version (if used & relevant):
Python version: 3.6.9
Operating system (windows, osx, ...): Ubuntu 18.04 LTS
Issue: Failed to load the NLU model when starting rasa shell to test my bot:
The NLU data and stories are correct and were tested with the supervised embeddings pipeline.
Error (including full traceback):
2020-02-06 21:39:29 INFO root - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2020-02-06 21:39:29 INFO root - Starting Rasa server on http://localhost:5005
2020-02-06 21:39:32 INFO absl - Entry Point [tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv] registered with id [T2TEnv-TicTacToeEnv-v0]
/home/ai/ai/rasa/o/lib/python3.6/site-packages/rasa/nlu/classifiers/embedding_intent_classifier.py:962: UserWarning: Failed to load nlu model. Maybe path '/tmp/tmpwistue_9/nlu' doesn't exist.
f"Failed to load nlu model. "
2020-02-06 21:39:33 INFO rasa.nlu.selectors.embedding_response_selector - Retrieval intent parameter was left to its default value. This response selector will be trainedon training examples combining all retrieval intents.
Bot loaded. Type a message and press enter (use '/stop' to exit):
Your input -> tell me location
2020-02-06 21:39:57 ERROR rasa.nlu.classifiers.embedding_intent_classifier - **There is no trained tf.session: component is either not trained or didn't receive enough training data.**
Your input -> /stop
2020-02-06 21:41:47 INFO root - Killing Sanic server now.
Command or request that led to error:
$ rasa shell
Content of configuration file (config.yml) (if relevant):
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
- name: "WhitespaceTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: "EmbeddingIntentClassifier"
- name: "ResponseSelector"
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
- name: MemoizationPolicy
- name: KerasPolicy
- name: MappingPolicy
Content of domain file (domain.yml) (if relevant):
intents:
- greet
- goodbye
- query_knowledge_base
- bot_challenge
- location_ask
- time_t
- who_ask
entities:
- location
- address
- berlin
- date
- time
- services
actions:
- utter_iamabot
- utter_greet
- utter_goodbye
- utter_ask_rephrase
- action_location
- action_time
templates:
  utter_greet:
  - text: "Hey!"
  - text: "Hello! How can I help you?"
  utter_goodbye:
  - text: "Bye"
  - text: "Goodbye. See you soon."
  utter_ask_rephrase:
  - text: "Sorry, I'm not sure I understand. Can you rephrase?"
  - text: "Can you please rephrase? I did not get that."
  utter_iamabot:
  - text: "I am a bot, powered by Rasa."