I'm using Windows 10 and Python 2.7.
This is my file, test.py:
```python
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer
chinesebot = ChatBot("Training Example")
chinesebot.set_trainer(ChatterBotCorpusTrainer)
chinesebot.train("chatterbot.corpus.chinese")
chinesebot.get_response("早上好,你好吗?")
```
When I run `python test.py`, I get this error:
```
F:\AnacondaWork\lib\site-packages\chatterbot\storage\jsonfile.py:30: UnsuitableForProductionWarning: The J
not recommended for production environments.
  self.UnsuitableForProductionWarning
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\80920\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\80920\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\80920\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\80920\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    chinesebot.train("chatterbot.corpus.chinese")
  File "F:\AnacondaWork\lib\site-packages\chatterbot\trainers.py", line 117, in train
    trainer.train(pair)
  File "F:\AnacondaWork\lib\site-packages\chatterbot\trainers.py", line 82, in train
    statement = self.get_or_create(text)
  File "F:\AnacondaWork\lib\site-packages\chatterbot\trainers.py", line 25, in get_or_create
    statement = self.storage.find(statement_text)
  File "F:\AnacondaWork\lib\site-packages\chatterbot\storage\jsonfile.py", line 46, in find
    values = self.database.data(key=statement_text)
  File "F:\AnacondaWork\lib\site-packages\jsondb\db.py", line 98, in data
    return self._get_content(key)
  File "F:\AnacondaWork\lib\site-packages\jsondb\db.py", line 52, in _get_content
    obj = self.read_data(self.path)
  File "F:\AnacondaWork\lib\site-packages\jsondb\file_writer.py", line 15, in read_data
    obj = decode(content)
  File "F:\AnacondaWork\lib\site-packages\jsondb\compat.py", line 28, in decode
    return json_decode(value, encoding='utf-8')
  File "F:\AnacondaWork\lib\json\__init__.py", line 352, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "F:\AnacondaWork\lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "F:\AnacondaWork\lib\json\decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position 0: invalid continuation byte
```
How can I get rid of console output such as the warning and the `[nltk_data] Downloading ...` lines?
And how can I fix the error?
I couldn't find this error reported in any existing issue.
Thanks for your support.
@sunchenguang NLTK checks whether several data packages (zip files) are available on your machine; if any of them is not found, it downloads it from the server.
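A minimal sketch of that check-then-download behaviour, written against `nltk.data.find` (which raises `LookupError` when data is missing) and `nltk.download(..., quiet=True)`. It is not ChatterBot's own code: the resource paths in `DEFAULT_RESOURCES` are assumptions based on NLTK's usual data layout, `ensure_nltk_data` is a hypothetical helper, and ChatterBot may still make its own download calls. The `find`/`download` parameters exist only so the logic can be exercised without NLTK installed.

```python
# Assumed mapping from NLTK package ids to the resource paths NLTK uses
# when looking them up; adjust it if your NLTK data layout differs.
DEFAULT_RESOURCES = {
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
    "punkt": "tokenizers/punkt",
    "vader_lexicon": "sentiment/vader_lexicon.zip",
}


def ensure_nltk_data(resources=None, find=None, download=None):
    """Download only the NLTK packages that cannot be found locally.

    Returns the sorted list of package ids that had to be downloaded.
    """
    if resources is None:
        resources = DEFAULT_RESOURCES
    if find is None or download is None:
        import nltk  # imported lazily so fakes can be passed in instead

        find = find or nltk.data.find
        download = download or nltk.download
    downloaded = []
    for package_id, resource_path in sorted(resources.items()):
        try:
            find(resource_path)  # raises LookupError when the data is missing
        except LookupError:
            # quiet=True suppresses the "[nltk_data] ..." progress lines.
            download(package_id, quiet=True)
            downloaded.append(package_id)
    return downloaded
```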
I think Windows often fails to convert Unicode characters properly. Have you seen the same issue on a Linux machine?
Reference: https://wiki.python.org/moin/PrintFails
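The traceback fails on byte `0xd4` at position 0, which means the stored data is not valid UTF-8. One plausible cause on a Chinese-locale Windows machine (an assumption, not confirmed in this thread) is text written in the system code page, GBK/cp936, instead of UTF-8. The sketch below checks raw bytes against a list of candidate encodings; `decode_with_fallback` is a hypothetical helper, not part of ChatterBot or jsondb.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Return (text, encoding) for the first encoding that decodes `raw`.

    Raises ValueError listing every failure if none of the encodings work.
    """
    errors = []
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError as exc:
            errors.append("%s: %s" % (encoding, exc))
    raise ValueError("could not decode bytes; tried " + "; ".join(errors))
```

For instance, `b"\xd4\xe7"` is rejected by UTF-8 at position 0 (`0xe7` is not a continuation byte) and is then decoded as GBK. Guessing an encoding can silently produce wrong text, so re-creating the data as UTF-8 (or moving to Python 3, as later comments suggest) is the more reliable fix.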
@sunchenguang Try removing database.db and rerunning your script with the modification below; it works fine on my machine.
```diff
--- a/chatterbot/input/input_adapter.py
+++ b/chatterbot/input/input_adapter.py
@@ -19,14 +19,14 @@ class InputAdapter(Adapter):
         Return an existing statement object (if one exists).
         """
         input_statement = self.process_input(*args, **kwargs)
-        self.logger.info('Recieved input statement: {}'.format(input_statement.text))
+        self.logger.info('Recieved input statement: {!r}'.format(input_statement.text))
         existing_statement = self.chatbot.storage.find(input_statement.text)
```
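The diff changes the log call to log the repr of the text, so that non-ASCII input does not have to be encoded by the console. A minimal sketch of the same idea with the standard `logging` module, assuming Python 3: there `%r` keeps printable non-ASCII characters as-is, so `%a` (which applies `ascii()`) is the closer equivalent. The logger name and `log_input_statement` are hypothetical.

```python
import logging

logger = logging.getLogger("input_adapter_sketch")  # hypothetical logger name


def log_input_statement(text):
    # Passing `text` as an argument lets logging build the message only
    # when the record is actually emitted. %a escapes non-ASCII characters
    # (e.g. Chinese text becomes \uXXXX sequences), so the log line stays
    # plain ASCII even on a console that cannot encode it.
    logger.info("Received input statement: %a", text)
```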
I found the file here:
F:\AnacondaWork\Lib\site-packages\chatterbot\input\input_adapter.py
I modified the file as you did, but it doesn't help.
With the same test.py, the console shows the same error.
Did you remove the previous database.db file?
Yes, I did. I'd like to try it on Python 3.
@vkosuri This issue also happens when training the bot in other Unicode languages such as Persian. More details can be found in this closed but unsolved bug.
The same code with the Chinese corpus works in Python 3.
Maybe Python 2 has an encode/decode bug. Just bypass the problem by switching to Python 3.
@vkosuri
@onlydarkknight
@sunchenguang
Yeah, Python 2 has encoding/decoding issues with ChatterBot. Python 3 is a good choice.
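Since the conclusion here is to run under Python 3, a script could fail fast with a clear message instead of reaching the traceback above. A minimal sketch; `require_python3` is a hypothetical helper, and the `version_info` parameter exists only so the check can be tested.

```python
import sys


def require_python3(version_info=None):
    """Raise RuntimeError when running on a Python older than 3."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3; Python %d.%d has Unicode issues "
            "with ChatterBot's corpus training." % (version_info[0], version_info[1])
        )
```

Call `require_python3()` at the top of test.py, before importing `chatterbot`.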
@sunchenguang I also observed that this issue can appear when training with an existing database.db. Try removing database.db and retraining your bot; that may fix it. Similar issue: https://github.com/gunthercox/ChatterBot/issues/567
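A minimal sketch of that workaround. The file name `database.db` comes from this thread; that it lives in the current working directory is an assumption, and `reset_database` is a hypothetical helper.

```python
import os


def reset_database(path="database.db"):
    """Delete a stale ChatterBot JSON database so training starts fresh.

    Returns True if a file was removed, False if there was nothing to delete.
    """
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

For example, call `reset_database()` at the top of test.py, before the `ChatBot(...)` line, so the storage adapter never opens the old file.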