Chatterbot: Training chatterbot with corpus

Created on 30 Apr 2019 · 4 Comments · Source: gunthercox/ChatterBot

Hello.
I would like to train a chatbot using my custom data. With over 1 million lines of conversation, I do not want to use the list trainer and "hardcode" all those conversations into the bot. But it seems that the list trainer is the only one that allows conversations longer than 2 lines.

From the documentation example:

trainer.train([
    "How are you?",
    "I am good.",
    "That is good to hear.",
    "Thank you",
    "You are welcome.",
])

Here each item in the list is a possible response to its predecessor in the list. In the corpus data, I do not know how to make longer conversations like this.
How would this be possible without hard-coding a million lines of conversation into the bot?

All 4 comments

You could load the file into an array, then pass that like:
trainer.train(data)
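
A minimal sketch of what "load the file into an array" could look like. The file layout is an assumption, not something stated in this thread: plain text, one statement per line, with a blank line between conversations. The helper names `read_conversations` and `train_from_file` are hypothetical; the only trainer call used is `train()`, with a list of statements as in the documentation example above.

```python
from pathlib import Path
from typing import Iterator, List


def read_conversations(path: str) -> Iterator[List[str]]:
    """Yield one conversation (a list of lines) per blank-line-separated block."""
    conversation: List[str] = []
    with Path(path).open(encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line:
                conversation.append(line)
            elif conversation:
                # A blank line closes the current conversation.
                yield conversation
                conversation = []
    if conversation:
        # The last conversation may not be followed by a blank line.
        yield conversation


def train_from_file(trainer, path: str) -> int:
    """Call trainer.train() once per conversation; return how many were trained."""
    count = 0
    for conversation in read_conversations(path):
        trainer.train(conversation)
        count += 1
    return count
```

With a `ListTrainer` instance this would be called as `train_from_file(trainer, "conversations.txt")`, where the file name is a placeholder. Reading the file lazily keeps only one conversation in memory at a time, which may matter with a million lines.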

Then how about when I have a statement and 30 different replies to that statement? I know how to do that with a corpus file:

- - You are arrogant
  - Arrogance is not one of my emotions.
  - I have no real emotions, so how can I be arrogant?
  - I am terse.  There is a difference.
  - I am not human, so how can I partake of a human emotion such as arrogance?
- - You are bragging
  - I don't have a big ego.
  - I'm not bragging, I'm only answering your questions.
  - I am not human, so how can I express a human emotion such as braggadaccio?
  - I'm not bragging, I'm just that awesome.
  - I'm sorry, I can't hear you over the sound of how awesome I am.

I don't know how to do this with the array. Basically I will have a mixed structure like this:

 - - You are arrogant
  - Arrogance is not one of my emotions.
  - I have no real emotions, so how can I be arrogant?
  - I am terse.  There is a difference.
  - I am not human, so how can I partake of a human emotion such as arrogance?
*
  - - How are you?
  - I am good.
  - - That is good to hear.
  - Thank you
  - - You are welcome.
 *
  - - How are you?
  - I am pretty good.
  - - Awesome.
  - How about you?
  - - Could be better...
  - - Ah, could be better...

As you can see here, there can be multiple conversations that start the same way, or multiple different replies to one statement.

Hello @dtoxodm,

I see. My suggestion applies only when you have "real" conversation data.
I think for this it would be better to sort the training data into two groups and place them in separate folders.
The continuous conversation files can be trained using the ListTrainer.

from chatterbot.trainers import ListTrainer
trainer = ListTrainer(chatbot)  # chatbot: your existing ChatBot instance
# ...load the file as an array...
trainer.train(dataArray)

And the ones where you have multiple possible replies can be consumed by the standard corpus trainer.

from chatbot import chatbot
from chatterbot.trainers import ChatterBotCorpusTrainer
trainer = ChatterBotCorpusTrainer(chatbot)
trainer.train(
    "./data/corpusfolder"
)

I'm not sure if this is the proper way of doing it, but this would be my approach.

Cheers,
/Gabor
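
For the "one statement, many replies" group, a possible alternative that stays within the list trainer is to expand each statement into two-item lists, one per reply, and train each list as its own short conversation. This is only a sketch: it assumes, based on the documentation example quoted in the question, that each item in a list is learned as a response to the item before it. The helper `expand_replies` and the `groups` dictionary are hypothetical names used for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def expand_replies(groups: Dict[str, Iterable[str]]) -> Iterator[List[str]]:
    """Yield [statement, reply] pairs, one per alternative reply."""
    for statement, replies in groups.items():
        for reply in replies:
            yield [statement, reply]


# Sample data taken from the corpus snippet earlier in the thread.
groups = {
    "You are arrogant": [
        "Arrogance is not one of my emotions.",
        "I am terse.  There is a difference.",
    ],
}

pairs = list(expand_replies(groups))
# Each pair could then be trained on its own, e.g. trainer.train(pair)
```

Because every pair starts with the same statement, each reply is recorded as a response to that statement rather than to the reply listed before it.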

@dtoxodm Would you consider publishing your data? I'm looking for exactly what you described, but there is barely any ChatterBot corpus data!
