I trained the bot on ~17k conversations and now it takes a long time to respond. Is there a way to avoid this?
Training data: https://gist.github.com/sntp/221f53c48bec929ac36d0951b496fcbd
Commit https://github.com/gunthercox/ChatterBot/commit/e5a986968a0844549283d14ae77d60a98d2987a1 makes one small change to start to address this by reducing the number of read and write transactions that are made to the database. I will continue to post updates on this ticket to track performance improvement changes.
Pull request https://github.com/gunthercox/ChatterBot/pull/173 allows the storage adapter to override an expensive method to provide a more efficient implementation. The get_response_statements method has been overridden on the MongoDB storage adapter to provide a much more efficient version that should yield a significant improvement in performance.
What about using SQLite? Will that speed up the process? Is there an SQLite adapter?
I tried to do the same: I fed it a 3.5 MB training file with conversations from a social network, curious what kind of answers I'd get from that :D
First it took about 40 minutes to train, and now it is just stuck trying to answer.
I tried using MongoDB but got an error: pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [WinError 10061] No connection could be made because the target machine actively refused it
I guess that's because I need to download it and run the server, huh?
Btw, why not use an SQL database?
Oh, okay, mongodb works fine now. Much faster. But that Bulk error is annoying.
And I still think having a standard option of sqlite would be nice, it's much faster than json, but it is also just a single file and it does not require you to install anything but python. Just a thought. No rush though.
@Nixellion I'm glad you are getting better results with the MongoDB adapter. The JSON file adapter is really just meant for testing and development, because it has to write to the hard disk every time it saves.
Still looking into the bulk insert error, and I've opened a ticket for tracking the addition of a new SQLite storage adapter #241.
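For anyone curious why a single-file SQLite store can beat rewriting a JSON file on every save, here is a minimal sketch using only Python's standard library. It is not ChatterBot's adapter and not the implementation tracked in #241; the table layout and the names `open_store`, `add_pairs` and `responses_for` are assumptions made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    # The composite primary key doubles as an index on `statement`,
    # so lookups do not have to scan the whole table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        " statement TEXT NOT NULL,"
        " response TEXT NOT NULL,"
        " occurrence INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (statement, response))"
    )
    return conn


def add_pairs(conn, pairs):
    """Store many (statement, response) pairs in a single transaction."""
    pairs = list(pairs)
    with conn:  # commits once at the end instead of once per pair
        conn.executemany(
            "INSERT OR IGNORE INTO responses (statement, response) VALUES (?, ?)",
            pairs,
        )
        conn.executemany(
            "UPDATE responses SET occurrence = occurrence + 1"
            " WHERE statement = ? AND response = ?",
            pairs,
        )


def responses_for(conn, statement):
    """Return known responses to a statement, most frequent first."""
    rows = conn.execute(
        "SELECT response FROM responses WHERE statement = ?"
        " ORDER BY occurrence DESC, response",
        (statement,),
    )
    return [row[0] for row in rows]
```

The point of the sketch is the batching: all pairs go to disk in one transaction, and reads use an index rather than loading the whole data set.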
Cool, thanks!
I also found some old discussions from 2014-2015 about making this bot smart enough to pass at least some Turing tests/questions, building sentences from words, etc. I hope you're still onto it :)
Does it support parallel training?
Parallel training is only supported if the database being used supports concurrent writes. The default file database that ChatterBot uses does not support concurrent writes, but MongoDB does.
I use MongoDB, but I don't know how to set the training parameters. Or is parallel training the default when I use MongoDB? Thanks a lot.
My data size is about 2 GB.
You will probably need to do a bit of work to get the import process ready to bring in 2 GB of data in parallel. I would recommend breaking it up, if possible, into a few files of manageable size. You will then have to use Python's multiprocessing capabilities to start a training process on each subset of the data. This functionality isn't built into ChatterBot at the moment. If you are unsure how to accomplish this, feel free to ask any questions. Otherwise, I have opened a ticket to get support for this functionality added to ChatterBot (https://github.com/gunthercox/ChatterBot/issues/354).
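A rough sketch of the split-and-train approach described above, using only the standard library. The function names `split_into_chunks`, `train_on_chunk` and `train_in_parallel` are hypothetical, and the worker body is a placeholder: the actual ChatterBot training call is deliberately left out, since it depends on the version and storage adapter you use.

```python
from multiprocessing import Pool


def split_into_chunks(conversations, n_chunks):
    """Split a list of conversations into n_chunks roughly equal slices."""
    n_chunks = max(1, min(n_chunks, len(conversations) or 1))
    size, extra = divmod(len(conversations), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(conversations[start:end])
        start = end
    return chunks


def train_on_chunk(chunk):
    """Hypothetical worker: train on one subset and report how many it saw.

    In real use, each process would create its own bot connected to a
    database that supports concurrent writes and train on every
    conversation in the chunk. That call is omitted here on purpose.
    """
    trained = 0
    for conversation in chunk:
        # Placeholder for the real training call on `conversation`.
        trained += 1
    return trained


def train_in_parallel(conversations, workers=4):
    """Run one training process per chunk and return the total trained."""
    chunks = split_into_chunks(conversations, workers)
    with Pool(processes=len(chunks)) as pool:
        return sum(pool.map(train_on_chunk, chunks))
```

Call `train_in_parallel(conversations, workers=4)` from inside an `if __name__ == "__main__":` block, which multiprocessing needs on platforms that spawn new interpreters. For a 2 GB data set you would likely pass file paths to the workers instead of in-memory lists, so each process loads only its own file.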
I've noticed that #597 using ujson has sped up processing a lot, though my training data is only ~300MB in size. I recommend trying it out to see how much faster it will go.
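If you want to try a faster JSON parser in your own loading code, a common pattern is to use ujson when it is installed and fall back to the standard library otherwise. This is a generic sketch, not the code from #597; `parse_training_data` and `load_training_file` are made-up helper names.

```python
try:
    import ujson as json_impl  # third-party, optional
except ImportError:
    import json as json_impl  # standard library fallback


def parse_training_data(text):
    """Parse a JSON document with whichever parser is available."""
    return json_impl.loads(text)


def load_training_file(path):
    """Read a JSON training file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_training_data(handle.read())
```

Any speedup depends on your data, so it is worth timing both parsers on your own file.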
@martmists Hi, have you solved the efficiency problem with the bot's training and testing? Can you share some thoughts on improving efficiency? Thanks.
One thing to note is to NOT use the default JSON storage. It's slow due to constant I/O, it's relatively unoptimized and uses the stdlib JSON module. I recommend writing your own or trying to find one online.
@martmists I have used MongoDB as the storage adapter. However, responses are still very slow: with about 70,000 (7w) records, a response takes 41 seconds. I am looking for other ways to improve efficiency. How about you?
I'm going to close this issue, as I don't believe there are any remaining actionable items here. Tickets have been created to implement changes that will help improve response times. See #925 and its related tickets for further details.