Hello, I used the Italian data to train the system. My platform is Windows 10 with Python 2.7.
With the following example, I have no problems:
from chatterbot import ChatBot
chatbot = ChatBot("BotItaliano")
chatbot.train("chatterbot.corpus.italian")
chatbot.get_response("Ciao, come va?")
But if I call get_response with these words, in particular with the "è" character:
chatbot.get_response("la vita è vita")
I get this error:
C:\Python27\lib\site-packages\chatterbot\adapters\logic\closest_match.py:28: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if input_statement.text in text_of_all_statements:
C:\Python27\lib\site-packages\jsondb\db.py:29: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if key in obj.keys():
Traceback (most recent call last):
File "
File "C:\Python27\lib\site-packages\chatterbot\chatterbot.py", line 80, in get_response
self.storage.update(input_statement)
File "C:\Python27\lib\site-packages\chatterbot\adapters\storage\jsondatabase.py", line 111, in update
self.database.data(key=statement.text, value=data)
File "C:\Python27\lib\site-packages\jsondb\db.py", line 76, in data
self._set_content(key, value)
File "C:\Python27\lib\site-packages\jsondb\db.py", line 40, in _set_content
data = write_data(self.path, obj)
File "C:\Python27\lib\site-packages\jsondb\file_writer.py", line 22, in write_data
db.write(encode(obj))
File "C:\Python27\lib\site-packages\jsondb\compat.py", line 19, in encode
return json_encode(value)
File "C:\Python27\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8a in position 8: invalid start byte
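The byte 0x8a at position 8 is consistent with the "è" having been typed into a console that uses a legacy code page. A minimal sketch of that reading, assuming the console encoded input as cp850 (an assumption; the thread does not say which code page was active):

```python
# -*- coding: utf-8 -*-
# Assumption: the Windows console encoded the typed text as cp850.
typed = u"la vita \u00e8 vita"  # "la vita è vita"
raw = typed.encode("cp850")

# The character at index 8 is the "è"; inspect the byte it became.
byte_at_8 = bytearray(raw)[8]
print(hex(byte_at_8))

# The resulting byte string is not valid UTF-8, so decoding it fails
# at the same position reported in the traceback.
try:
    raw.decode("utf-8")
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start
print(error_position)
```

Under this assumption, a Python 2 byte string holding those console bytes would fail as soon as something (here the JSON encoder) tries to treat it as UTF-8.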
If I use Python 3.5 I do not get this error, but when I start getting responses, I see strange characters:
"Cos'Ã¨ che vuoi sapere?"
It should be:
"Cos'è che vuoi sapere?"
Hi, could you try running the following and let me know if it fixes the issue for you?
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
chatbot = ChatBot("BotItaliano")
chatbot.train("chatterbot.corpus.italian")
chatbot.get_response(u"la vita è vita")
In this case I've made two changes: I've added an encoding declaration (the "coding: utf-8" header) to the main program, and I've added a unicode prefix (u) to the input string containing the non-ASCII character.
+ #!/usr/bin/env python
+ # -*- coding: utf-8 -*-
from chatterbot import ChatBot
chatbot = ChatBot("BotItaliano")
chatbot.train("chatterbot.corpus.italian")
- chatbot.get_response("la vita è vita")
+ chatbot.get_response(u"la vita è vita")
I put the code in a file "test.py" and it still does not work:
C:\Python27>python test.py
File "test.py", line 7
chatbot.get_response(u"la vita Þ vita")
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe8 in position 0: unexpected end of data
It works! After adding the header, I also saved the file in UTF-8 format. It is not enough to simply add the header. Thanks very much!
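For anyone who cannot change the editor's "save as" setting, the re-save step can also be done from Python. This is a minimal sketch, assuming the file was originally saved as cp1252 (suggested by the 0xe8 byte in the SyntaxError, but still an assumption); `resave_as_utf8` is a hypothetical helper name:

```python
# -*- coding: utf-8 -*-
import io


def resave_as_utf8(path, source_encoding="cp1252"):
    """Hypothetical helper: rewrite a text file in place as UTF-8.

    source_encoding is an assumption; pass the encoding the editor
    actually used when the file was saved.
    """
    with io.open(path, "r", encoding=source_encoding) as handle:
        text = handle.read()
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

Calling resave_as_utf8("test.py") once should make the file's bytes match the "coding: utf-8" declaration, provided the guessed source encoding is right. Keep a backup first, since the file is overwritten.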