Rasa: interpreting random sequences of characters

Created on 22 May 2018 · 22 comments · Source: RasaHQ/rasa

Rasa NLU version:
commit 3d736227e3439ab745126263e8431e875b1882cd

Operating system (windows, osx, ...):
linux

Issue:
Hello,

I have trained a model (intent_featurizer_count_vectors+intent_classifier_tensorflow_embedding) on some German sentences.

While testing the reaction to invalid input, I realized that the interpreter apparently "understands" even random sequences of characters. "fgergq", for example, is assigned an intent with fairly high confidence (65%); "RFSSSw" reaches 96%.

"osifhaskdnkauer" is recognised as greet with 96% confidence, and this is particularly interesting since my "greet" training sentences are the ones you would expect (along the lines of "Hallo", "Good *"), and also mostly very short sentences.

The only way I found to actually get back an empty intent with zero confidence is to parse an empty string. Even single characters never occurring in the training data are interpreted: "j" gets intent "bye" with 80% confidence, although to be fair "j" is a lot closer to "ja", one of my "affirm" examples.

I have two questions:

1) For my education, what is actually happening here? How does rasa_nlu+tensorflow_embedding evaluate the intent of a single word it has never seen before?

2) How do I protect a chatbot from such things? Downstream, the Fallback policy in rasa_core uses the intent confidence to decide whether the user input is valid or not, but there is not much I can do against a 95% confidence from rasa_nlu, so I am guessing there is something I need to tune on the NLU side. Any suggestions?
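
To make question 2 concrete, this is roughly the kind of downstream guard I have in mind (a minimal sketch; gated_parse and the 0.7 threshold are just illustrative, not anything from rasa_core), which obviously cannot help when the NLU confidence itself is 95%:

# Minimal sketch of a confidence gate on top of the rasa_nlu output.
# `interpreter` is assumed to be an already-loaded rasa_nlu Interpreter
# (as in the parse() calls further down in this thread); the 0.7
# threshold is an arbitrary illustrative value, not a recommendation.

FALLBACK_INTENT = {"name": None, "confidence": 0.0}

def gated_parse(interpreter, text, threshold=0.7):
    result = interpreter.parse(text)
    intent = result.get("intent") or {}
    if intent.get("confidence", 0.0) < threshold:
        # Treat low-confidence predictions as "not understood" so that
        # downstream logic (e.g. a fallback response) can take over.
        result["intent"] = dict(FALLBACK_INTENT)
    return result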

Thanks for your help,

Andrea.


All 22 comments

I should add that I am using the tensorflow_embedding with the default parameters provided in the rasa_nlu docs:

- name: "intent_classifier_tensorflow_embedding"
  # nn architecture
  "num_hidden_layers_a": 2
  "hidden_layer_size_a": [256, 128]
  "num_hidden_layers_b": 0
  "hidden_layer_size_b": []
  "batch_size": 32
  "epochs": 300
  # embedding parameters
  "embed_dim": 10
  "mu_pos": 0.8  # should be 0.0 < ... < 1.0 for 'cosine'
  "mu_neg": -0.4  # should be -1.0 < ... < 1.0 for 'cosine'
  "similarity_type": "cosine"  # string 'cosine' or 'inner'
  "num_neg": 10
  "use_max_sim_neg": true  # flag which loss function to use
  # regularization
  "C2": 0.002
  "C_emb": 0.8
  "droprate": 0.2
  # flag if to tokenize intents
  "intent_tokenization_flag": false
  "intent_split_symbol": "_"

Thanks for noticing this behavior.

The current tensorflow pipeline ignores words not seen during training. But the embedding layers use biases, so an empty input should basically output the learned biases. I'm afraid not much can be done right now by tuning NLU.
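
To illustrate the bias point (a minimal numpy sketch, not the actual rasa_nlu code):

# With a bag-of-words featurizer, a sentence made only of unseen words
# becomes an all-zero vector, and a dense layer y = Wx + b then returns
# exactly its learned bias.
import numpy as np

vocab_size, embed_dim = 5, 3
W = np.random.randn(embed_dim, vocab_size)  # learned weights (stand-in values)
b = np.random.randn(embed_dim)              # learned bias (stand-in values)

x_unseen = np.zeros(vocab_size)             # "fgergq" -> no known words -> all zeros
y = W @ x_unseen + b

print(np.allclose(y, b))                    # True: the output is just the bias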

We'll take a look into this problem in more detail.

Hi @Ghostvv ,

thanks a lot for your answer. I guess that if the new words are ignored, one would expect the result to be the same as for the empty string, which is not the case. Unless maybe you are intercepting the empty string before it is run through the embedding? Maybe an early return?

In any case, thanks for looking into this.

Cheers,

Andrea.

@disimone, yes, the empty string gets intercepted by the interpreter.

Could you please add a bit more details:

  • which featurizer do you use?
  • do you get the same intent for any random input for the same training run?

Hi @Ghostvv ,

I am using "intent_featurizer_count_vectors" and "intent_entity_featurizer_regex". The regex is only catching the "Good *" greetings.

I only noticed now that "intent_featurizer_spacy" is also still active in my config.yml, a leftover from the default spacy pipeline, but I am not sure whether this is causing the problem.

The random strings are assigned different intents. For example "j" is "bye", "s" is "negate", "w" is "provide_info".

I can't really share the trained model or the training data, due to data protection issues, but I am happy to run more tests if you need further info.

Thanks for your help,

Andrea.

@disimone thank you for the details. Could you please run only "intent_featurizer_count_vectors" without spacy? If you have spacy in the pipeline, "intent_featurizer_count_vectors" uses lemma_ from spacy, so it might apply some lemmatization to single letters, but I'm not sure in this case.
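
Roughly what I mean (a sketch of the two token streams, not the featurizer's actual code; it assumes the German spacy model is installed):

# With spacy in the pipeline, the count vectorizer is fed token.lemma_
# instead of (roughly) the lower-cased raw tokens, so single letters may
# be mapped to unexpected lemmas.
import spacy

nlp = spacy.load("de")

def tokens_for_count_vectorizer(text, use_spacy_lemmas):
    if use_spacy_lemmas:
        return [t.lemma_ for t in nlp(text)]
    return text.lower().split()

print(tokens_for_count_vectorizer("Guten Morgen", use_spacy_lemmas=True))
print(tokens_for_count_vectorizer("Guten Morgen", use_spacy_lemmas=False))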

It would be helpful if you could paste your nlu pipeline config file config.yml.

@Ghostvv thanks for the suggestion. I turned off both the spacy and regex featurizers, just to be sure, so now I only have the count_vectors featurizer in.

Unfortunately I still observe the same behavior as before, but with an interesting variation: now all random strings are assigned the same intent ("greet") with exactly the same confidence:

In [9]: interpreter.parse("dtgsdhfjhfdhj")
Out[9]:
{'entities': [],
 'intent': {'confidence': 0.6623044013977051, 'name': 'greet'},
 'intent_ranking': [{'confidence': 0.6623044013977051, 'name': 'greet'},
  {'confidence': 0.5363500118255615, 'name': 'info_provide'},
  {'confidence': 0.15859365463256836, 'name': 'bye'},
  {'confidence': 0.14967912435531616, 'name': 'thankyou'},
  {'confidence': -0.09815622866153717, 'name': 'confirm'},
  {'confidence': -0.1715802550315857, 'name': 'negate'},

Before, different strings got different intents with different confidences. Now different random strings get the same main intent, confidence, and ranking.

I admit I am not sure what to make of the negative confidences, though.

HTH,

Andrea.

@disimone thanks for checking, that's exactly what I wanted to check.

A negative confidence means that the cosine similarity is negative: this algorithm is not a classifier that assigns probabilities, but a ranker that measures the similarity between the input and label embeddings.
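
As a small illustration of why the score can be negative (a numpy sketch, not the actual implementation):

# Cosine similarity between an input embedding and an intent-label
# embedding lies in [-1, 1], so it is a ranking score, not a probability.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

input_emb = np.array([1.0, 0.5])
greet_emb = np.array([0.9, 0.6])    # points roughly the same way
negate_emb = np.array([-1.0, 0.2])  # points roughly the opposite way

print(cosine(input_emb, greet_emb))   # close to +1
print(cosine(input_emb, negate_emb))  # negative, like the ranking above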

Could you please provide your previous (with spacy) full config.yml?

@disimone does the intent greet have the largest number of training examples?

Hi @Ghostvv ,

here is my configuration

language: "de"

pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_spacy"
- name: "ner_synonyms"
- name: "ner_duckling"
- name: "intent_featurizer_count_vectors"
#- name: "intent_classifier_sklearn"
- name: "intent_classifier_tensorflow_embedding"
  # nn architecture
  "num_hidden_layers_a": 2
  "hidden_layer_size_a": [256, 128]
  "num_hidden_layers_b": 0
  "hidden_layer_size_b": []
  "batch_size": 32
  "epochs": 300
  # embedding parameters
  "embed_dim": 10
  "mu_pos": 0.8  # should be 0.0 < ... < 1.0 for 'cosine'
  "mu_neg": -0.4  # should be -1.0 < ... < 1.0 for 'cosine'
  "similarity_type": "cosine"  # string 'cosine' or 'inner'
  "num_neg": 10
  "use_max_sim_neg": true  # flag which loss function to use
  # regularization
  "C2": 0.002
  "C_emb": 0.8
  "droprate": 0.2
  # flag if to tokenize intents
  "intent_tokenization_flag": false
  "intent_split_symbol": "_"

@Ghostvv no, greet is neither the most represented intent nor the least represented one.

@disimone did you check this behavior with "spacy-sklearn" pipeline?

@Ghostvv I just did it. I observe the same behavior as with tensorflow alone, i.e. all unknown strings are assigned the same intent/ranking with the same confidences:

In [5]: interpreter.parse("t")
/home/andrea.disimone/Envs/DSpy3/lib/python3.5/site-packages/sklearn/preprocessing/label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
  if diff:
Out[5]:
{'entities': [],
 'intent': {'confidence': 0.2484770272894782, 'name': 'greet'},
 'intent_ranking': [{'confidence': 0.2484770272894782, 'name': 'greet'},
  {'confidence': 0.17728574593475305, 'name': 'info_provide'},
  {'confidence': 0.15371417977775265, 'name': 'info_req_general'},

@disimone unfortunately, I cannot reproduce that different random inputs produce different results when spacy is present in the pipeline.

@disimone could you please try nlu_embed_update branch?
PR: https://github.com/RasaHQ/rasa_nlu/pull/1092

I added a check to output an empty intent with 0 confidence if the words have never been seen during training.
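
Conceptually the check behaves like this (a sketch of the idea only, not the code from the PR; rank_intents is a placeholder for the actual embedding ranking):

# If no token of the incoming message is in the featurizer's vocabulary,
# return an empty intent with zero confidence instead of ranking.
def classify(message_tokens, vocabulary, rank_intents):
    if not any(token in vocabulary for token in message_tokens):
        return {"intent": {"name": None, "confidence": 0.0}, "intent_ranking": []}
    return rank_intents(message_tokens)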

@disimone also, if you want to use the standard parameters, you do not need to put all of them into your config file.

@Ghostvv thanks for the fix. I am a bit time-constrained right now, but I'll do more tests tomorrow morning and let you know.

Just for my understanding: does your check kick in only if none of the words have been seen, or is one unknown word enough? What happens with "these are all words we know but RTSDFGDTDASDA isn't"?

thanks for your help,

Andrea.

@disimone only if _none_ of the words have been seen.

In your example "RTSDFGDTDASDA" will be ignored, e.g. "Hallo" = "wqjbfjhwe Hallo ekjrfnj" = intent_greet.
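
For illustration, with sklearn's CountVectorizer standing in for the featurizer (a sketch, not the actual component):

# After fitting, tokens outside the training vocabulary are simply
# ignored, so "Hallo" and "wqjbfjhwe Hallo ekjrfnj" map to identical
# feature vectors.
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer().fit(["Hallo", "Guten Morgen", "Tschuess"])

a = vec.transform(["Hallo"]).toarray()
b = vec.transform(["wqjbfjhwe Hallo ekjrfnj"]).toarray()
print((a == b).all())  # True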

@Ghostvv, thanks again for the fix. I confirm that your branch works as expected for me. Is it going to be merged to master anytime soon?

Thanks for your help,

Andrea.

@disimone thank you for testing. Does it also work when spacy is in the pipeline? We're going to merge it soon, though I cannot give you a precise timeline.

Yes, it works as expected also when both the spacy and count-vector featurizers are enabled.

@disimone The PR is merged. I'll close the issue for now. Please reopen it if you still experience problems.
