Chatterbot: UnicodeDecodeError: 'utf8' codec can't decode byte 0x8a in position 8: invalid start byte

Created on 12 Mar 2016 · 5 comments · Source: gunthercox/ChatterBot

Hello, I used the Italian corpus to train the bot. My platform is Windows 10 with Python 2.7.
With the following example, I have no problems:

from chatterbot import ChatBot
chatbot = ChatBot("BotItaliano")
chatbot.train("chatterbot.corpus.italian")
chatbot.get_response("Ciao, come va?")

But if I call get_response with this input, which contains the "è" character:

chatbot.get_response("la vita è vita")

I get this error:

C:\Python27\lib\site-packages\chatterbot\adapters\logic\closest_match.py:28: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if input_statement.text in text_of_all_statements:
C:\Python27\lib\site-packages\jsondb\db.py:29: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if key in obj.keys():
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\chatterbot\chatterbot.py", line 80, in get_response
    self.storage.update(input_statement)
  File "C:\Python27\lib\site-packages\chatterbot\adapters\storage\jsondatabase.py", line 111, in update
    self.database.data(key=statement.text, value=data)
  File "C:\Python27\lib\site-packages\jsondb\db.py", line 76, in data
    self._set_content(key, value)
  File "C:\Python27\lib\site-packages\jsondb\db.py", line 40, in _set_content
    data = write_data(self.path, obj)
  File "C:\Python27\lib\site-packages\jsondb\file_writer.py", line 22, in write_data
    db.write(encode(obj))
  File "C:\Python27\lib\site-packages\jsondb\compat.py", line 19, in encode
    return json_encode(value)
  File "C:\Python27\lib\json\__init__.py", line 244, in dumps
    return _default_encoder.encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8a in position 8: invalid start byte

orfeomorello
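One plausible reading of this traceback, assuming the Windows console was using code page 850 (an assumption; the report does not say which code page was active): typing "è" at the Python 2 prompt puts the single byte 0x8A into a plain byte string, and "la vita " is exactly 8 characters long, so that byte sits at position 8. Python 2's json.dumps decodes byte strings as UTF-8 by default, and 0x8A cannot start a UTF-8 sequence. The earlier UnicodeWarnings fit the same picture: a byte string and a Unicode string containing "è" cannot be compared as equal. This Python 3 sketch reproduces the decode failure:

```python
# Assumption: the console used code page 850, where 'è' is the single byte 0x8A.
raw = "la vita è vita".encode("cp850")
print(raw)  # b'la vita \x8a vita'

# Decoding those bytes as UTF-8 (what Python 2's json.dumps does with a byte str) fails.
message = ""
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'utf-8' codec can't decode byte 0x8a in position 8: invalid start byte

# Bytes never compare equal to text in Python 3, much like Python 2's
# "interpreting them as being unequal" warnings above.
print(raw == "la vita è vita")  # False

# Decoding with the encoding that actually produced the bytes recovers the text.
print(raw.decode("cp850"))  # la vita è vita
```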

If I use Python 3.5 I don't get this error, but when the bot starts responding I see strange characters:

"Cos'Ã¨ che vuoi sapere?"

It should be:

"Cos'è che vuoi sapere?"

orfeomorello on 12 Mar 2016
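The Python 3.5 symptom is classic mojibake: "è" encoded as UTF-8 is the two bytes C3 A8, and a decoder using Windows-1252 (cp1252) reads those bytes as "Ã" and "¨". A likely cause, though the thread does not confirm it, is that UTF-8 data was read with the Windows default encoding instead of UTF-8. This sketch writes a hypothetical corpus line to a temporary UTF-8 file and reads it back both ways; the file and its contents are illustrative, not ChatterBot's actual corpus loader.

```python
import os
import tempfile

# Hypothetical corpus line, stored on disk as UTF-8.
expected = "Cos'è che vuoi sapere?"

with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as f:
    f.write(expected)
    path = f.name

try:
    # Reading UTF-8 bytes as cp1252: C3 A8 ('è') decodes to 'Ã' + '¨'.
    with open(path, encoding="cp1252") as f:
        garbled = f.read()
    # Reading with the encoding the file was written in round-trips cleanly.
    with open(path, encoding="utf-8") as f:
        correct = f.read()
finally:
    os.remove(path)

# Here the damage is reversible: re-encode with the wrong codec, decode with the right one.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)   # Cos'Ã¨ che vuoi sapere?
print(correct)   # Cos'è che vuoi sapere?
print(repaired)  # Cos'è che vuoi sapere?
```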

Hi, could you try running the following and let me know if it fixes the issue for you?

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from chatterbot import ChatBot
chatbot = ChatBot("BotItaliano")
chatbot.train("chatterbot.corpus.italian")
chatbot.get_response(u"la vita è vita")

I've made two changes here: I added an encoding declaration (the "# -*- coding: utf-8 -*-" header) to the top of the program, and I added the u prefix to the input string that contains non-ASCII characters.
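In Python 2, a literal without the u prefix is a byte string whose bytes depend on how the source file (or console) was encoded, while u"..." is a Unicode string decoded using the encoding the header declares. Python 3 treats every plain string literal as Unicode text, so this Python 3 sketch shows what the fixed input looks like by the time it reaches a JSON-backed store such as the jsondb module in the traceback: the encoder works with characters and has no bytes left to guess about.

```python
import json

# In Python 3 a plain literal is Unicode text, like u"..." in Python 2.
text = "la vita è vita"

# 14 characters of text, but 15 bytes once encoded as UTF-8 ('è' takes two bytes).
print(len(text), len(text.encode("utf-8")))  # 14 15

# With real text, json.dumps never has to decode anything.
print(json.dumps(text))                      # "la vita \u00e8 vita"
print(json.dumps(text, ensure_ascii=False))  # "la vita è vita"
```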

+ #!/usr/bin/env python
+ # -*- coding: utf-8 -*-

  from chatterbot import ChatBot
  chatbot = ChatBot("BotItaliano")
  chatbot.train("chatterbot.corpus.italian")
- chatbot.get_response("la vita è vita")
+ chatbot.get_response(u"la vita è vita")

gunthercox on 12 Mar 2016

I put the code in a file named "test.py", but it doesn't work:

C:\Python27>python test.py
  File "test.py", line 7
    chatbot.get_response(u"la vita Þ vita")
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe8 in position 0: unexpected end of data

orfeomorello on 12 Mar 2016

It works! I added the header and saved the file in UTF-8 format. Simply adding the header is not enough. Thank you very much!

orfeomorello on 12 Mar 2016
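The fix confirms that the "# -*- coding: utf-8 -*-" header only declares how the interpreter should decode the file; it does not change the bytes the editor saved. The earlier SyntaxError reports byte 0xe8, which is how Windows-1252 stores "è", so the file was most likely saved as cp1252. This Python 3 sketch compiles the same hypothetical two-line source from raw bytes, once saved as UTF-8 and once as cp1252; run_source is an illustrative helper, not part of ChatterBot.

```python
# Hypothetical test.py contents: an encoding declaration plus a u-prefixed literal.
SOURCE = '# -*- coding: utf-8 -*-\ns = u"la vita è vita"\n'


def run_source(raw: bytes) -> str:
    """Compile raw source bytes (honoring the coding header) and return the value of s."""
    namespace = {}
    try:
        exec(compile(raw, "test.py", "exec"), namespace)
    except SyntaxError as exc:
        return "SyntaxError: " + str(exc.msg)
    return namespace["s"]


# Saved as UTF-8: the bytes match the declaration, so the literal decodes correctly.
print(run_source(SOURCE.encode("utf-8")))  # la vita è vita

# Saved as cp1252: 'è' becomes the single byte 0xE8, which is not valid UTF-8,
# so compiling fails. The exact message may differ between Python versions.
print(run_source(SOURCE.encode("cp1252")))
```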

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.