I have to use multiple logic adapters because, in my case, I have tons of queries based on different requests.
But I am not sure how logic adapters route requests or determine which adapter is most appropriate.
I will be using Elasticsearch for data queries, and there are billions of records. I cannot allow NLP to search all indexes; that would be a waste of resources.
So my question is how to make it more efficient and accurate.
I think this is what you are looking for: http://chatterbot.readthedocs.io/en/stable/logic/response_selection.html
Thanks vkosuri.
This is a great question. I'm not actually sure if there is a good answer
for you right now but I'm hoping that we can figure something out. It might
require making some improvements to ChatterBot itself to make this happen.
Let's see if we can figure something out because I realize this is a real
problem that will only get worse as chat bots built with ChatterBot are
used with larger data sets.
The main problem is that NLP is significantly slower than the set logic
used for querying a database. There are a few approaches that can be used
to reduce the amount of data that NLP has to process.
Loosening constraints: One option that could help make this process
faster would be to make the assumption that we do not need to search all of
the data to find the best result. It might be possible that an optimal
response could be selected from a subset of the known data.
Queries: it might be possible to generate a query that eliminates
statements that can't possibly be a response to the input the bot received.
Doing this requires those attributes to be identified but this could help
prevent extraneous statements from being returned in the response from the
database.
An example of this could be that the user asks a question. Let's assume
that all statements in the database now have an is_question attribute. We
could eliminate any statement that is not a response to a question.
Categorization of statements by some kind of "type" could help
reduce the data set that has to be processed for large databases, if it is
possible to determine the "type" of response that is expected.
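A rough sketch of the narrowing ideas above, in plain Python with no ChatterBot dependency. The `Statement` record, its `category` field, the reading of `is_question` as "stored as a response to a question", and the `narrow_candidates` helper are all assumptions made for illustration. With a real database the same conditions would be pushed into the query rather than applied in memory.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    # Hypothetical record, not ChatterBot's own Statement model.
    text: str
    is_question: bool  # assumed meaning: stored as a response to a question
    category: str      # hypothetical "type" tag


def narrow_candidates(
    statements: List[Statement],
    input_is_question: bool,
    category: Optional[str] = None,
    max_candidates: Optional[int] = None,
    seed: int = 0,
) -> List[Statement]:
    """Drop statements that cannot answer the input, then optionally cap the set."""
    kept = [
        s for s in statements
        if s.is_question == input_is_question
        and (category is None or s.category == category)
    ]
    if max_candidates is not None and len(kept) > max_candidates:
        # Loosened constraint: accept the best match from a random subset.
        kept = random.Random(seed).sample(kept, max_candidates)
    return kept


statements = [
    Statement("It ships tomorrow.", True, "orders"),
    Statement("Nice to meet you.", False, "greetings"),
    Statement("Pizza is 10 dollars.", True, "food"),
    Statement("We have salad too.", True, "food"),
]
food_answers = narrow_candidates(statements, True, category="food")
```

Only the statements left in `food_answers` would then be handed to the slower NLP comparison.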
However, this does not solve a larger portion of the problem which exists
because of the need to make NLP comparisons between statements. For
example, if we want to find the closest known match to a statement, then we
have to compare that statement to every other statement. (This is where
I'm currently stuck: how to make this process more efficient.)
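To make the cost concrete, here is a minimal sketch of the exhaustive comparison described above. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever NLP comparison is actually configured; that substitution, and the `closest_match` helper name, are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def closest_match(input_text: str, known_texts: Iterable[str]) -> Tuple[Optional[str], float]:
    """Score the input against every known statement and keep the best one.

    One comparison per known statement, so the work grows linearly
    with the size of the data set.
    """
    best_text, best_score = None, -1.0
    for text in known_texts:
        score = SequenceMatcher(None, input_text.lower(), text.lower()).ratio()
        if score > best_score:
            best_text, best_score = text, score
    return best_text, best_score


known = ["How are you?", "What is your name?", "I like pizza"]
match, score = closest_match("what's your name", known)
```

Any filtering done beforehand only shrinks `known`; the loop itself still touches every remaining statement.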
gunthercox@ - Is it possible to use an intent/entity format like this one ( https://github.com/phpmind/open-intent ), similar to what API.AI uses? With that, we could narrow down the request and route it to the correct query type. Different logic adapters could then be connected to
Elasticsearch, MongoDB, DynamoDB, or wherever the data is stored.
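A minimal sketch of that routing idea, assuming a simple keyword-based intent detector. The intent names, keyword lists, and backend functions are all hypothetical; this is not open-intent's or ChatterBot's API, and a real system would likely use a trained intent classifier and real datastore clients.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical intents and trigger words.
INTENT_KEYWORDS = {
    "order_status": ["order", "shipment", "tracking"],
    "product_search": ["price", "buy", "available"],
}


def detect_intent(text: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the text, else None."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & set(keywords):
            return intent
    return None


def query_orders(text: str) -> str:
    # Placeholder for a query against one datastore (e.g. one Elasticsearch index).
    return f"orders backend: {text}"


def query_products(text: str) -> str:
    # Placeholder for a query against a different datastore.
    return f"products backend: {text}"


ROUTES: Dict[str, Callable[[str], str]] = {
    "order_status": query_orders,
    "product_search": query_products,
}


def route(text: str) -> str:
    """Send the request only to the backend that matches its intent."""
    handler = ROUTES.get(detect_intent(text))
    if handler is None:
        return "no matching intent"
    return handler(text)
```

For example, `route("Where is my order?")` reaches only `query_orders`, so the other backends are never searched.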
Nothing has been written (that I'm aware of) for using the models designed for open-intent.io with ChatterBot. I think the concept of the dictionary file that they have you define might be a good way to handle the "tagging" of statements that I mentioned before.
In their example dictionary file they provide the following example. It looks like they are grouping synonyms for a particular word and then placing each group inside its own category.
"greetings": {
"hello": ["hi", "yep", "yo", "hey"]
},
"food_type": {
"pizza": [],
"hamburger": ["big mac", "cheeseburger", "burger"],
"salad": []
}
This is something that could definitely be added to ChatterBot; however, if I'm going to be adding it, then I will need some time to plan what kinds of changes would be required.
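As a sketch of how such a dictionary could drive statement tagging: the code below inverts the example dictionary into a term lookup and tags a statement with every (category, canonical word) pair it mentions. The `build_lookup` and `tag_statement` helpers are hypothetical and are not part of ChatterBot or open-intent.

```python
import re
from typing import Dict, List, Set, Tuple

DICTIONARY: Dict[str, Dict[str, List[str]]] = {
    "greetings": {"hello": ["hi", "yep", "yo", "hey"]},
    "food_type": {
        "pizza": [],
        "hamburger": ["big mac", "cheeseburger", "burger"],
        "salad": [],
    },
}


def build_lookup(dictionary: Dict[str, Dict[str, List[str]]]) -> Dict[str, Tuple[str, str]]:
    """Map every canonical word and synonym to its (category, canonical word)."""
    lookup = {}
    for category, entries in dictionary.items():
        for canonical, synonyms in entries.items():
            for term in [canonical, *synonyms]:
                lookup[term] = (category, canonical)
    return lookup


def tag_statement(text: str, lookup: Dict[str, Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the tags for every dictionary term found as a whole word or phrase."""
    lowered = text.lower()
    return {
        tag for term, tag in lookup.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


lookup = build_lookup(DICTIONARY)
tags = tag_statement("Hey, I want a big mac", lookup)
```

Tags like these could be stored with each statement and used later to filter candidates before any NLP comparison runs.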
Thanks Gunther.
Gunther @ - Apart from open-intent.io, I found this project, which is written in Python:
https://github.com/MycroftAI/adapt
Not sure how useful this would be, but I'm posting it here so it can be evaluated.
Most helpful comment
makes it possible to do something to each statement a chat bot receives
as input. Incoming statements could be tagged with NLP scores and values so
that these could be searched on later. This wouldn't solve the problem for
existing statements but a line could be added somewhere to save the NLP
evaluations if they have not been generated for a statement yet. This makes
it so that NLP evaluations only ever have to be run once for a statement.
This may work well when logic adapters depend on things such as the tagged
parts of speech of a statement.
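The save-once idea in the paragraph above can be sketched as a small cache keyed by statement text. The `evaluate` function is a cheap stand-in for a real NLP pass, and `EvaluationStore` is a hypothetical helper; in practice the results would be persisted alongside the statement in the storage backend rather than kept in memory.

```python
import re
from typing import Callable, Dict


def evaluate(text: str) -> dict:
    """Stand-in for an expensive NLP evaluation (tokens plus a question flag)."""
    return {
        "tokens": re.findall(r"[a-z']+", text.lower()),
        "is_question": text.strip().endswith("?"),
    }


class EvaluationStore:
    """Runs the evaluation only the first time a statement is seen."""

    def __init__(self, evaluate_fn: Callable[[str], dict]):
        self._evaluate = evaluate_fn
        self._cache: Dict[str, dict] = {}
        self.evaluations_run = 0

    def get(self, text: str) -> dict:
        if text not in self._cache:
            # No saved evaluation yet: compute it once and keep it.
            self._cache[text] = self._evaluate(text)
            self.evaluations_run += 1
        return self._cache[text]


store = EvaluationStore(evaluate)
first = store.get("How are you?")
second = store.get("How are you?")  # served from the cache
```

After both calls `store.evaluations_run` is still 1, since the second lookup reuses the saved result.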