Chatterbot: Bad performance on big amount of training data

Created on 9 May 2016 · 17 Comments · Source: gunthercox/ChatterBot

I trained the bot with ~17k conversations and now it takes a long time to respond. Are there ways to avoid this?
Training data: https://gist.github.com/sntp/221f53c48bec929ac36d0951b496fcbd

bug

sntp

Most helpful comment

Oh, okay, mongodb works fine now. Much faster. But that Bulk error is annoying.

And I still think having a standard option of sqlite would be nice, it's much faster than json, but it is also just a single file and it does not require you to install anything but python. Just a thought. No rush though.

Nixellion on 20 Aug 2016

🎉1 👍1

All 17 comments

Commit https://github.com/gunthercox/ChatterBot/commit/e5a986968a0844549283d14ae77d60a98d2987a1 makes one small change to start to address this by reducing the number of read and write transactions that are made to the database. I will continue to post updates on this ticket to track performance improvement changes.

gunthercox on 9 May 2016

Pull request https://github.com/gunthercox/ChatterBot/pull/173 allows the storage adapter to override an expensive method to provide a more efficient implementation. The get_response_statements method has been overridden on the MongoDB storage adapter to provide a much more efficient version that should yield a significant improvement in performance.

gunthercox on 5 Jun 2016

What about using SQLite? Will that speed up the process? Is there an SQLite adapter?

I tried to do the same: I fed it a 3.5 MB training file with conversations from a social network, as I was curious what kind of answers I'd get from that :D

At first it took about 40 minutes to train, and now it is just stuck trying to answer.
I tried using MongoDB but got an error: pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [WinError 10061] No connection could be made because the target machine actively refused it

I guess that's because I need to download it and run the server, huh?

Nixellion on 19 Aug 2016

👍1

Btw, why not use an SQL database?

sntp on 20 Aug 2016

Oh, okay, mongodb works fine now. Much faster. But that Bulk error is annoying.

Nixellion on 20 Aug 2016

🎉1 👍1

@Nixellion I'm glad you are getting better results with the MongoDB adapter. The JSON file adapter is really just meant for testing and development because it is limited by the fact that it has to write to the hard disk each time it needs to save.

Still looking into the bulk insert error, and I've opened a ticket for tracking the addition of a new SQLite storage adapter #241.

gunthercox on 20 Aug 2016
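To illustrate the single-file SQLite idea discussed above, here is a minimal sketch using only Python's standard sqlite3 module. The class name SQLiteStatementStore, its methods, and the table layout are hypothetical and made up for this example; this is not ChatterBot's storage adapter API and not the adapter tracked in #241. It only shows why an indexed table avoids rewriting a whole JSON file on every save.

```python
import sqlite3


class SQLiteStatementStore:
    """Hypothetical single-file store; NOT ChatterBot's storage adapter API."""

    def __init__(self, path=":memory:"):
        # Pass a file name instead of ":memory:" to keep the data on disk.
        self.conn = sqlite3.connect(path)
        # The composite primary key also serves as the index for lookups
        # by in_response_to, so queries do not scan the whole table.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS response ("
            " in_response_to TEXT NOT NULL,"
            " text TEXT NOT NULL,"
            " occurrence INTEGER NOT NULL DEFAULT 1,"
            " PRIMARY KEY (in_response_to, text))"
        )

    def add_conversation(self, statements):
        """Store each statement as a response to the one before it."""
        pairs = list(zip(statements, statements[1:]))
        # One transaction per conversation instead of one write per statement.
        with self.conn:
            for prompt, reply in pairs:
                updated = self.conn.execute(
                    "UPDATE response SET occurrence = occurrence + 1 "
                    "WHERE in_response_to = ? AND text = ?",
                    (prompt, reply),
                )
                if updated.rowcount == 0:
                    self.conn.execute(
                        "INSERT INTO response (in_response_to, text) "
                        "VALUES (?, ?)",
                        (prompt, reply),
                    )

    def responses_to(self, prompt):
        """Return (text, occurrence) pairs, most frequent first."""
        rows = self.conn.execute(
            "SELECT text, occurrence FROM response "
            "WHERE in_response_to = ? ORDER BY occurrence DESC, text",
            (prompt,),
        )
        return rows.fetchall()
```

After store.add_conversation(["Hi", "Hello", "How are you?"]), calling store.responses_to("Hi") returns the stored replies to "Hi" together with how often each was seen.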

Cool, thanks!

Also found some old discussions from back in 2014-2015 about making this bot smart enough to pass at least some Turing tests/questions, building sentences from words, etc. I hope you're still on it :)

Nixellion on 20 Aug 2016

Does it support parallel training?

chenjun0210 on 20 Oct 2016

Parallel training is only supported if the database being used supports concurrent writes. The default file database that ChatterBot uses does not support concurrent writes, but MongoDB does.

gunthercox on 20 Oct 2016

my data size about
I use MongoDB, but I don't know how to set the training parameters. Or is parallel training the default when I use MongoDB? Thanks a lot.

chenjun0210 on 20 Oct 2016

My data size is about 2 GB.

chenjun0210 on 20 Oct 2016

You will probably need to do a bit of work to get the import process ready to bring in 2GB of data in parallel. I would recommend breaking it up, if possible, into a few files of manageable size. You will then have to use Python's multiprocessing capabilities to start training processes on each subset of the data file. This functionality isn't built into ChatterBot at the moment. If you are unsure how to accomplish this, feel free to ask any questions. Otherwise, I have opened a ticket to get support for this functionality added to ChatterBot (https://github.com/gunthercox/ChatterBot/issues/354).

gunthercox on 21 Oct 2016
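The split-and-train approach gunthercox describes above could be sketched roughly as follows with Python's standard multiprocessing module. The function names split_into_chunks, train_chunk and train_in_parallel are made up for this example, and train_chunk is only a placeholder: the real training call depends on your ChatterBot version and is deliberately left out. As noted earlier in the thread, this only makes sense with a storage backend that supports concurrent writes.

```python
from multiprocessing import Pool


def split_into_chunks(conversations, n_chunks):
    """Split a list of conversations into at most n_chunks contiguous slices."""
    size, extra = divmod(len(conversations), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional conversation each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there is little data
            chunks.append(conversations[start:end])
        start = end
    return chunks


def train_chunk(chunk):
    """Placeholder worker (hypothetical).

    In real use, each worker process would create its own bot connected to
    a database that allows concurrent writes and train it on every
    conversation in `chunk`. Here it only counts statements.
    """
    return sum(len(conversation) for conversation in chunk)


def train_in_parallel(conversations, processes=4):
    """Run train_chunk on each slice of the data in a separate process."""
    chunks = split_into_chunks(conversations, processes)
    with Pool(processes=processes) as pool:
        return pool.map(train_chunk, chunks)
```

When running this as a script, call train_in_parallel from under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn new interpreter processes. For a 2GB data set you would likely pass file names to the workers rather than in-memory lists, so each process reads only its own file.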

I've noticed that #597 using ujson has sped up processing a lot, though my training data is only ~300MB in size. I recommend trying it out to see how much faster it will go.

MaryWeeb on 17 Jan 2017
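For anyone who wants to try ujson in their own loading code, a common pattern is an optional import that falls back to the standard library. This is only a sketch of that pattern and makes no claim about what #597 actually changes; load_database and save_database are hypothetical helper names, and ujson is a third-party package that has to be installed separately.

```python
try:
    import ujson as json  # optional third-party package, usually faster
except ImportError:
    import json  # standard library fallback, same load/dump interface


def load_database(path):
    """Read the whole JSON database file into memory."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def save_database(path, data):
    """Write the whole database back to disk in one call."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
```

Note that a faster parser only reduces the cost of each read and write; the file is still rewritten in full on every save.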

@martmists hi, have you solved the efficiency problem with the bot's training and testing? Can you share some thoughts about improving efficiency? Thanks.

jxfruit on 9 Oct 2017

One thing to note is to NOT use the default JSON storage. It's slow due to constant I/O, it's relatively unoptimized and uses the stdlib JSON module. I recommend writing your own or trying to find one online.

MaryWeeb on 10 Oct 2017

👍1

@martmists I have used MongoDB as the storage adapter. However, responses are still very slow: with about 70,000 (7w) records, a response takes 41 seconds. I am working on finding other ways to improve efficiency. How about you?

jxfruit on 10 Oct 2017

I'm going to close this issue off; I don't believe there are any remaining actionable items here. Tickets have been created to implement changes that will help to improve response times. See #925 and its related tickets for further details.

gunthercox on 2 Dec 2017
