Currently, if you want to handle out-of-scope messages, you need to define a special intent for them and add training samples to distinguish messages that belong to one of the intents you are looking for from messages that are bogus or irrelevant to your domain.
We should think about methods and ideas for returning something along the lines of
"intent": null, meaning that the current message doesn't belong to any of the trained intents.
When can we expect this enhancement to be added as part of Rasa NLU?
We haven't started working on it, and I don't think anyone else has started on it yet, so there is no estimate.
We are highly dependent on Rasa NLU, so can I get an estimate of when this enhancement will be added, so that we can plan our development process?
I don't really get why this is that much of an issue. I may be wrong, but the way I see it, defining a confidence threshold below which you trigger some fallback behavior should do it.
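For illustration, a minimal caller-side sketch of that idea (the parse-result shape follows the /parse response shown later in this thread; the 0.3 cutoff is just an arbitrary example value to tune on your own data):

```python
# Minimal sketch: treat low-confidence classifications as "no intent".
# CONFIDENCE_THRESHOLD is an arbitrary example value, not a recommendation.
CONFIDENCE_THRESHOLD = 0.3

def resolve_intent(parse_result):
    """Return the intent name, or None if the classifier isn't confident enough."""
    intent = parse_result.get("intent") or {}
    if intent.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return None  # caller triggers its fallback behavior
    return intent.get("name")
```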
This becomes a huge issue when we want to use Rasa with the rule "classify only if you're certain of what you think". Yet spam messages sometimes get better confidence from the MITIE classifier than relevant messages do. On the other hand, it's very difficult to list all the potential spam messages...
I am working on it and would be pleased to help you on this subject if I find any good solution.
That's not really a Rasa issue, as it's a difficulty inherent to all machine learning classifiers: the confidence value is directly related to the quality of your model, so either bring more data or try to find an ML classification algorithm that gives better results for your use case.
Defining a confidence threshold is exactly what LUIS, Wit or API.ai are doing: https://docs.api.ai/docs/machine-learning-settings#ml-classification-threshold
There may be a better solution, but it would likely prove very complex; otherwise companies backed by Google, Facebook, or Microsoft would probably have provided it a long time ago.
I am OK with Rasa giving me a low confidence value for out-of-scope messages, but it is actually giving very high confidence for out-of-scope queries (as high as 0.8 at times). That is why we need this functionality to classify the intent as null.
We are not talking about a feature here but about classification quality. So, as I said, to improve that you either need to increase the size of your training dataset and check that your intent domains are not overlapping (two overlapping/similar intents would be, for instance, "I am looking for a restaurant" -> restaurant_search and "I am looking for a museum" -> museum_search: try to avoid that as much as possible), or you can try to build a new component for intent classification that relies on ML classification algorithms that perform better for your use case.
Thanks for the suggestion to build my own component. I would love to contribute back to Rasa instead, as it has been great so far!
With due respect, the whole point of using Rasa is to cut down my development time and use a readily available framework so that I can focus on my business requirements.
Can you think as a consumer, with an open mind, and let me know what you would expect from a framework that claims to provide this functionality? I have already compared this with OpenNLP and the results are not the same as Rasa's, so I'm not asking for something unreasonable in my opinion; moreover, this has been acknowledged as a future requirement.
The ask is very simple:
We are working on it. Currently we are in the phase where we compare different approaches and their performance with respect to out of scope classification. No timeline yet, though. Depends on the results.
@tmbo out of curiosity, how do you define something to be "out of scope"?
@vibhutibindal another thing you can do is ask the user for confirmation in your dialog flow, for instance:
- I'm looking for a restaurant.
- You are looking for a restaurant, is that correct?
If not, you can propose the next best intent, and so on two or three times, then fall back if there is still no match (a rough code sketch follows at the end of this comment).
If you ever use this approach, just pay attention to the fact that it may add some friction for your end user, and not all intents work well with it:
- Hi!
- Did you just try to say hello?
By the way, please excuse me for not being able to think as a consumer with an open mind. You know, I'm just a poor software engineer talking to computers all day long (they're my only friends) 😉
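For anyone wondering what this could look like in code, here is a rough sketch; ask_user_confirmation is a hypothetical yes/no callback standing in for however your bot asks the question, and intent_ranking comes straight from the NLU parse result:

```python
# Rough sketch of "confirm the intent, then try the next best one".
# parse_result is the dict returned by Rasa NLU's /parse endpoint;
# ask_user_confirmation is a hypothetical callback asking the user yes/no.
MAX_ATTEMPTS = 3  # two or three confirmation rounds, then give up

def confirm_intent(parse_result, ask_user_confirmation):
    ranking = parse_result.get("intent_ranking", [])
    for candidate in ranking[:MAX_ATTEMPTS]:
        if ask_user_confirmation(candidate["name"]):
            return candidate["name"]
    return None  # still no match: trigger your fallback behavior
```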
@PHLF - Currently Rasa picks some intent even though the message is "out of scope".
Example: a conversation built to provide information on Higher Education.
REQUEST Payload
"text": "What do we have for breakfast today?"
RESPONSE
{
"entities": [],
"intent": {
"confidence": 0.7674252665064972,
"name": "higher_education"
},
"intent_ranking": [
{
"confidence": 0.7674252665064972,
"name": "higher_education"
},
{
"confidence": 0.15787478616804707,
"name": "ticket_status"
},
{
"confidence": 0.027504302721879363,
"name": "greet"
},
...
{
"confidence": 0.007247469584416219,
"name": "create_new_ticket"
}
],
"text": "What do we have for breakfast today?"
}
Now how will you be able to use the information to identify what the user asked?
User: What do we have for breakfast today?
Bot: !!!
It would be unreasonable to go back and ask whether the user was asking about Higher Education!
As I said:
this may add some friction for your end user
There is no perfect or magic solution for this. Your best bet is to make use of the confidence levels for each intent, but as you stated, the confidence level depends a lot on the underlying data and ML models, which give you no guarantee about how to handle this situation.
In your conversation example where you provide information for higher education, it seems clear that the "What do ... ?" pattern weighs the most in classifying your sentence as a "higher_education" intent, as the dataset for this specific intent must be (disclaimer: this is a wild guess) made up of many interrogative utterances. To mitigate this, you may try to extend your dataset with utterances in less interrogative forms ("I'd like to... / I want to...") and use more vocabulary related to the knowledge domain of "higher education".
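For instance, such additions could look something like this (a hypothetical fragment in Rasa NLU's markdown training-data format; the utterances are made up):

```md
## intent:higher_education
- I want to apply for a master's degree
- I'd like some information about university admissions
- tell me about scholarship options for undergraduates
- help me compare engineering colleges
```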
I'm eager to know what @tmbo will come up with.
@vibhutibindal I'm always asking to see people's training data, and I'd be interested to see yours in this case as well, if you're willing to share it in a gist or via e-mail.
In the case of higher_education, do you have any training samples in there like "What do we have for class today?" That's the kind of overlap @PHLF is talking about.
Also, did you mention which pipeline you are using?
Adding my 2 cents to the original discussion: we're handling the fallback intent outside of Rasa, in the wrapper that we have written around it, and that's where I prefer it to stay... until Rasa has some dialogue management functionality. Then there could be two different fallback levels (stealing from how we implement it): one below, say, 50% that asks for a confirmation of the intent, and one below 10% that asks for the question to be restated or admits defeat.
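A minimal sketch of that two-level scheme, using the illustrative 50%/10% values from above (parse_result is again the NLU response dict):

```python
# Two-level fallback: confirm on middling confidence, give up on very low.
CONFIRM_THRESHOLD = 0.5  # below this, ask the user to confirm the intent
GIVE_UP_THRESHOLD = 0.1  # below this, ask them to restate or admit defeat

def fallback_level(parse_result):
    confidence = (parse_result.get("intent") or {}).get("confidence", 0.0)
    if confidence < GIVE_UP_THRESHOLD:
        return "restate"  # "Sorry, I didn't get that. Could you rephrase?"
    if confidence < CONFIRM_THRESHOLD:
        return "confirm"  # "Did you mean X?"
    return "proceed"      # confident enough: handle the intent normally
```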
I'm happy to find this issue and discussion, and to see that I'm not the only one facing this problem :)
I suspect many other Rasa users are, like me, people just beginning in ML and NLP, for whom some things are not so obvious.
@vibhutibindal, as I understand it, your model has to pick one intent from all the ones you have in your training file, so unless you have an out_of_scope intent in your training file with a lot of out-of-scope samples, the model will pick the wrong intent.
But even with an out_of_scope set full of unrelated stuff, I'm not able to get consistent confidence results yet...
Maybe there are other variables involved, such as the quantity (balance) of samples per intent, that could affect the overall model quality...
I would love to see a "full size" training file in the documentation that generates a "good quality" model, which we could use to test whether the confidence is what we expect it to be.
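To make the idea concrete, an out_of_scope intent could look something like this (hypothetical samples in the markdown training-data format, reusing the breakfast utterance from earlier in the thread):

```md
## intent:out_of_scope
- what do we have for breakfast today?
- how tall is the Eiffel Tower?
- play some music for me
- what's the weather like on Mars?
```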
Any progress?
Did the RASA folks ever come to an agreement as to how to handle this issue?
The scikit-learn folks are also trying to tackle a related issue, i.e. how to set a threshold automatically. In the end it depends on a cost function that will be different for each user. So I think @PHLF's proposal makes the most sense, i.e. define an (arbitrary) confidence threshold. This is also the way it is handled in Rasa Core now.
@tmbo @amn41 will this ever be handled in NLU, or should the Core solution / the "failing gracefully" blog post close this issue?
Let's keep this open. The Rasa Core solution is highly reliant on the probabilities the intent classifier outputs. Therefore, I think it would still be good to think about models that can better discriminate between in-scope and out-of-scope messages. I don't think we are going to add a component that just does the threshold cut-off, as that can easily be done by the caller (in the same way we do it in Rasa Core).
@PHLF mentioned '_asking for user confirmation in your dialog flow_' and @wrathagom mentioned doing this outside Rasa, in a wrapper. I am also thinking about this but have no clue how to do it; could you hint at how I can achieve it?
As the confirmation fallback questions should be deterministic and should not be predicted, we won't want them in our stories. But how do we handle the answers to the fallback questions, as they "probably" should not be in the stories either, since the question is not there?
I don't see how I can produce a deterministic response based on the reply to the confirmation question, nor how to construct the stories for this. I would appreciate any advice.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.