Rasa: interpreting random sequences of characters

Created on 22 May 2018 · 22 comments · Source: RasaHQ/rasa

Rasa NLU version:
commit 3d736227e3439ab745126263e8431e875b1882cd

Operating system (windows, osx, ...):
linux

Issue:
Hello,

I have trained a model (intent_featurizer_count_vectors+intent_classifier_tensorflow_embedding) on some German sentences.

While testing the reaction to invalid input, I realized that the interpreter apparently "understands" even random sequences of characters. "fgergq", for example, is assigned an intent with fairly high confidence (65%); "RFSSSw" reaches 96%.

"osifhaskdnkauer" is recognised as greet with 96% confidence, and this is particularly interesting since my "greet" training sentences are the ones you would expect (along the lines of "Hallo", "Good *"), and also mostly very short sentences.

The only way I found to actually get back an empty intent with zero confidence is to parse an empty string. Even single characters never occurring in the training data are interpreted: "j" gets intent "bye" with 80% confidence, although to be fair "j" is a lot closer to "ja", one of my "affirm" examples.

I have two questions:

1) For my education, what is actually happening here? How does rasa_nlu+tensorflow_embedding evaluate the intent of a single word it has never seen before?

2) How do I protect a chatbot from such things? Downstream, the Fallback policy in rasa_core uses the intent confidence to decide whether the user input is valid or not, but there is not much I can do against a 95% confidence from rasa_nlu, so I am guessing there is something I need to tune on the NLU side. Any suggestions?
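
To make question 2 concrete, this is roughly the kind of downstream guard I have in mind (a minimal sketch; gated_parse and the 0.7 threshold are just illustrative, not anything from rasa_core), which obviously cannot help when the NLU confidence itself is 95%:

# Minimal sketch of a confidence gate on top of the rasa_nlu output.
# `interpreter` is assumed to be an already-loaded rasa_nlu Interpreter
# (as in the parse() calls further down in this thread); the 0.7
# threshold is an arbitrary illustrative value, not a recommendation.

FALLBACK_INTENT = {"name": None, "confidence": 0.0}

def gated_parse(interpreter, text, threshold=0.7):
    result = interpreter.parse(text)
    intent = result.get("intent") or {}
    if intent.get("confidence", 0.0) < threshold:
        # Treat low-confidence predictions as "not understood" so that
        # downstream logic (e.g. a fallback response) can take over.
        result["intent"] = dict(FALLBACK_INTENT)
    return result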

Thanks for your help,

Andrea.


All 22 comments

I should add that I am using the tensorflow_embedding with the default parameters provided in the rasa_nlu docs:

- name: "intent_classifier_tensorflow_embedding"
  # nn architecture
  "num_hidden_layers_a": 2
  "hidden_layer_size_a": [256, 128]
  "num_hidden_layers_b": 0
  "hidden_layer_size_b": []
  "batch_size": 32
  "epochs": 300
  # embedding parameters
  "embed_dim": 10
  "mu_pos": 0.8  # should be 0.0 < ... < 1.0 for 'cosine'
  "mu_neg": -0.4  # should be -1.0 < ... < 1.0 for 'cosine'
  "similarity_type": "cosine"  # string 'cosine' or 'inner'
  "num_neg": 10
  "use_max_sim_neg": true  # flag which loss function to use
  # regularization
  "C2": 0.002
  "C_emb": 0.8
  "droprate": 0.2
  # flag if to tokenize intents
  "intent_tokenization_flag": false
  "intent_split_symbol": "_"

Thanks for noticing this behavior.

The current tensorflow pipeline ignores words not seen during training. But the embedding layers use biases, so an empty input should basically output the learned biases. I'm afraid not much can be done right now by tuning NLU.
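
To illustrate the bias point (a minimal numpy sketch, not the actual rasa_nlu code):

# With a bag-of-words featurizer, a sentence made only of unseen words
# becomes an all-zero vector, and a dense layer y = Wx + b then returns
# exactly its learned bias.
import numpy as np

vocab_size, embed_dim = 5, 3
W = np.random.randn(embed_dim, vocab_size)  # learned weights (stand-in values)
b = np.random.randn(embed_dim)              # learned bias (stand-in values)

x_unseen = np.zeros(vocab_size)             # "fgergq" -> no known words -> all zeros
y = W @ x_unseen + b

print(np.allclose(y, b))                    # True: the output is just the bias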

We'll take a look into this problem in more detail.

Hi @Ghostvv ,

thanks a lot for your answer. I guess that if the new words are ignored, one would expect the result to be the same as for the empty string, which is not the case. Unless maybe you are intercepting the empty string before it is run through the embedding? Maybe an early return?

In any case, thanks for looking into this.

Cheers,

Andrea.

@disimone, yes, the empty string gets intercepted by the interpreter.

Could you please add a bit more details:

  • which featurizer do you use?
  • do you get the same intent for any random input for the same training run?

Hi @Ghostvv ,

I am using "intent_featurizer_count_vectors" and "intent_entity_featurizer_regex". The regex is only catching the "Good *" greetings.

I only noticed now that "intent_featurizer_spacy" is also still active in my config.yml, a leftover from the default spacy pipeline, but I am not sure whether this is causing the problem.

The random strings are assigned different intents. For example "j" is "bye", "s" is "negate", "w" is "provide_info".

I can't really share the trained model or the training data, due to data protection issues, but I am happy to run more tests if you need further info.

Thanks for your help,

Andrea.

@disimone thank you for the details. Could you please run only "intent_featurizer_count_vectors" without spacy? If you have spacy in the pipeline, "intent_featurizer_count_vectors" uses lemma_ from spacy, so it might apply some lemmatization to single letters, but I'm not sure in this case.
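
Roughly what I mean (a sketch of the two token streams, not the featurizer's actual code; it assumes the German spacy model is installed):

# With spacy in the pipeline, the count vectorizer is fed token.lemma_
# instead of (roughly) the lower-cased raw tokens, so single letters may
# be mapped to unexpected lemmas.
import spacy

nlp = spacy.load("de")

def tokens_for_count_vectorizer(text, use_spacy_lemmas):
    if use_spacy_lemmas:
        return [t.lemma_ for t in nlp(text)]
    return text.lower().split()

print(tokens_for_count_vectorizer("Guten Morgen", use_spacy_lemmas=True))
print(tokens_for_count_vectorizer("Guten Morgen", use_spacy_lemmas=False))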

It would be helpful if you could paste your nlu pipeline config file config.yml.

@Ghostvv thanks for the suggestion. I turned off both the spacy and regex featurizers, just to be sure, so now I only have the count_vectors featurizer in.

Unfortunately I still observe the same behavior as before, but with an interesting variation: now all random strings are assigned the same intent ("greet") with exactly the same confidence:

In [9]: interpreter.parse("dtgsdhfjhfdhj")
Out[9]:
{'entities': [],
 'intent': {'confidence': 0.6623044013977051, 'name': 'greet'},
 'intent_ranking': [{'confidence': 0.6623044013977051, 'name': 'greet'},
  {'confidence': 0.5363500118255615, 'name': 'info_provide'},
  {'confidence': 0.15859365463256836, 'name': 'bye'},
  {'confidence': 0.14967912435531616, 'name': 'thankyou'},
  {'confidence': -0.09815622866153717, 'name': 'confirm'},
  {'confidence': -0.1715802550315857, 'name': 'negate'},

Before, different strings got different intents with different confidences. Now different random strings get the same main intent, confidence, and ranking.

I admit I am not sure what to make of the negative confidences, though.

HTH,

Andrea.

@disimone thanks for checking, that's exactly what I wanted to check.

A negative confidence means that the cosine similarity is negative: this algorithm is not a classifier that assigns probabilities, but a ranker that measures the similarity between the input and label embeddings.
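
As a small illustration of why the score can be negative (a numpy sketch, not the actual implementation):

# Cosine similarity between an input embedding and an intent-label
# embedding lies in [-1, 1], so it is a ranking score, not a probability.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

input_emb = np.array([1.0, 0.5])
greet_emb = np.array([0.9, 0.6])    # points roughly the same way
negate_emb = np.array([-1.0, 0.2])  # points roughly the opposite way

print(cosine(input_emb, greet_emb))   # close to +1
print(cosine(input_emb, negate_emb))  # negative, like the ranking above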

Could you please provide your previous (with spacy) full config.yml?

@disimone does the intent greet have the largest number of training examples?

Hi @Ghostvv ,

here is my configuration

language: "de"

pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_spacy"
- name: "ner_synonyms"
- name: "ner_duckling"
- name: "intent_featurizer_count_vectors"
#- name: "intent_classifier_sklearn"
- name: "intent_classifier_tensorflow_embedding"
  # nn architecture
  "num_hidden_layers_a": 2
  "hidden_layer_size_a": [256, 128]
  "num_hidden_layers_b": 0
  "hidden_layer_size_b": []
  "batch_size": 32
  "epochs": 300
  # embedding parameters
  "embed_dim": 10
  "mu_pos": 0.8  # should be 0.0 < ... < 1.0 for 'cosine'
  "mu_neg": -0.4  # should be -1.0 < ... < 1.0 for 'cosine'
  "similarity_type": "cosine"  # string 'cosine' or 'inner'
  "num_neg": 10
  "use_max_sim_neg": true  # flag which loss function to use
  # regularization
  "C2": 0.002
  "C_emb": 0.8
  "droprate": 0.2
  # flag if to tokenize intents
  "intent_tokenization_flag": false
  "intent_split_symbol": "_"

@Ghostvv no, greet is neither the most represented intent nor the least represented one.

@disimone did you check this behavior with "spacy-sklearn" pipeline?

@Ghostvv I just did it. I observe the same behavior as with tensorflow alone, i.e. all unknown strings are assigned the same intent/ranking with the same confidences:

In [5]: interpreter.parse("t")
/home/andrea.disimone/Envs/DSpy3/lib/python3.5/site-packages/sklearn/preprocessing/label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
  if diff:
Out[5]:
{'entities': [],
 'intent': {'confidence': 0.2484770272894782, 'name': 'greet'},
 'intent_ranking': [{'confidence': 0.2484770272894782, 'name': 'greet'},
  {'confidence': 0.17728574593475305, 'name': 'info_provide'},
  {'confidence': 0.15371417977775265, 'name': 'info_req_general'},

@disimone unfortunately, I cannot reproduce that different random inputs produce different results when spacy is present in the pipeline.

@disimone could you please try nlu_embed_update branch?
PR: https://github.com/RasaHQ/rasa_nlu/pull/1092

I added a check to output an empty intent with 0 confidence if the words have never been seen during training.
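
Conceptually the check behaves like this (a sketch of the idea only, not the code from the PR; rank_intents is a placeholder for the actual embedding ranking):

# If no token of the incoming message is in the featurizer's vocabulary,
# return an empty intent with zero confidence instead of ranking.
def classify(message_tokens, vocabulary, rank_intents):
    if not any(token in vocabulary for token in message_tokens):
        return {"intent": {"name": None, "confidence": 0.0}, "intent_ranking": []}
    return rank_intents(message_tokens)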

@disimone also, if you want to use the standard parameters, you do not need to put all of them into your config file.

@Ghostvv thanks for the fix. I am a bit time-constrained right now, but I'll do more tests tomorrow morning and let you know.

Just for my understanding: does your check kick in only if none of the words have been seen, or is one unknown word enough? What happens with "these are all words we know but RTSDFGDTDASDA isn't"?

thanks for your help,

Andrea.

@disimone only if _none_ of the words have been seen.

In your example "RTSDFGDTDASDA" will be ignored, e.g. "Hallo" = "wqjbfjhwe Hallo ekjrfnj" = intent_greet.
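
For illustration, with sklearn's CountVectorizer standing in for the featurizer (a sketch, not the actual component):

# After fitting, tokens outside the training vocabulary are simply
# ignored, so "Hallo" and "wqjbfjhwe Hallo ekjrfnj" map to identical
# feature vectors.
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer().fit(["Hallo", "Guten Morgen", "Tschuess"])

a = vec.transform(["Hallo"]).toarray()
b = vec.transform(["wqjbfjhwe Hallo ekjrfnj"]).toarray()
print((a == b).all())  # True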

@Ghostvv, thanks again for the fix. I confirm that your branch works as expected for me. Is it going to be merged to master anytime soon?

Thanks for your help,

Andrea.

@disimone thank you for testing. Does it also work when spacy is in the pipeline? We're going to merge it soon, though I cannot give you a precise timeline.

Yes, it works as expected also when both the spacy and count-vector featurizers are enabled.

@disimone The PR is merged. I'll close the issue for now. Please reopen it if you still experience problems.
