Hi, I'm trying to query Treasure Data with a non-ASCII column alias, but the query fails to execute, while the same query works fine on other data sources (e.g. MySQL).

This query fails:
```
select
  time AS "日付" -- non-ASCII character
from
  raw
limit 1
```

The same query with an ASCII alias works:
```
select
  time AS "datetime"
from
  raw
limit 1
```

Error message for the failing query:
```
Error running query: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
```
Thank you for the report!
I confirmed that td-client-python 0.8.0 with Python 3.6.0 correctly pulls data with a non-ASCII character in the column name.
Could you share the full error log from Redash?
```
Python 3.6.0 (default, Sep 1 2017, 17:40:45)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import tdclient
>>> query_ng = 'select 1 AS "日付"'
>>> connection = tdclient.connect(endpoint='https://api.treasuredata.com',apikey='xxxxxxx',type='presto',db='sample_datasets')
>>> cursor = connection.cursor()
>>> cursor.execute(query_ng)
'170260382'
>>> columns_data = [(row[0], cursor.show_job()['hive_result_schema'][i][1]) for i,row in enumerate(cursor.description)]
>>> print(columns_data)
[('日付', 'integer')]
```
The `columns_data` line is taken from Redash's Treasure Data query runner: https://github.com/getredash/redash/blob/master/redash/query_runner/treasuredata.py#L97
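For readers unfamiliar with the Redash query runner linked above, the comprehension in the session pairs each column name from `cursor.description` with the type at the same index in the job's `hive_result_schema`. A minimal sketch with hypothetical stand-in data (the tuple shape and the already-parsed schema list are assumptions, not output captured from Treasure Data):

```python
# Hypothetical stand-ins: in the real runner these come from
# cursor.description and cursor.show_job()["hive_result_schema"].
description = [("日付", None, None, None, None, None, None)]
hive_result_schema = [["日付", "integer"]]

# Pair each column name with the type reported at the same position.
columns_data = [
    (row[0], hive_result_schema[i][1]) for i, row in enumerate(description)
]
print(columns_data)  # [('日付', 'integer')]
```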
I noticed that the Ubuntu bootstrap script installs Python with `apt-get install python-dev`:
https://github.com/getredash/redash/blob/master/setup/ubuntu/bootstrap.sh#L42
and that the default `python-dev` package installs Python 2.7:
https://packages.ubuntu.com/xenial/python/python-dev
With Python 2.7, I could reproduce this issue.
```
Python 2.7.11 (default, Sep 1 2017, 18:29:47)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import tdclient
>>> query_ng = 'select 1 AS "日付"'
>>> connection = tdclient.connect(endpoint='https://api.treasuredata.com',apikey='xxxx',type='presto',db='sample_datasets')
>>> cursor = connection.cursor()
>>> cursor.execute(query_ng)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/cursor.py", line 50, in execute
    self._do_execute()
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/cursor.py", line 77, in _do_execute
    return self._do_execute()
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/cursor.py", line 68, in _do_execute
    job = self._api.show_job(self._executed)
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/job_api.py", line 106, in show_job
    if js.get("hive_result_schema") is not None and 0 < len(str(js["hive_result_schema"])):
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
```
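A minimal Python 3 sketch of what appears to happen at line 106 of `job_api.py`. Under Python 2, calling `str()` on a `unicode` value implicitly encodes it with the default `ascii` codec; the sketch mimics that with an explicit `.encode("ascii")`. The schema string is a hypothetical example, assuming `hive_result_schema` arrives as a JSON string, which is consistent with the reported "position 3-4".

```python
import json

# Hypothetical raw value of js["hive_result_schema"] (a JSON string);
# '[["' occupies indexes 0-2, so the non-ASCII name sits at indexes 3-4.
raw_schema = '[["日付", "integer"]]'


def py2_str(value):
    """Mimic Python 2's str(u"..."), which encodes with the 'ascii' codec."""
    return value.encode("ascii")


try:
    py2_str(raw_schema)
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)
    # 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

# On Python 3, str() on text is a no-op, so the length check and parsing succeed.
assert 0 < len(str(raw_schema))
print(json.loads(raw_schema))  # [['日付', 'integer']]
```

This would also explain why the identical call succeeds in the Python 3.6 session.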
I'll dig into the cause later.
@toru-takahashi Thank you for reproducing the issue. I will check my environment soon.
I'm not familiar with Python encoding issues, but one possible workaround is to create `python2.7/site-packages/sitecustomize.py` with the following content:
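```
# Python 2 only: site.py deletes sys.setdefaultencoding() right after
# running sitecustomize, so the call has to live in this file.
import sys
sys.setdefaultencoding('UTF8')
```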
I found a blog post (in Japanese) describing a similar issue: https://www.yamamanx.com/redash-encode-error/
With this workaround I got the correct result, but I'm not sure whether it works well for Redash.
```
>>> columns_data = [(row[0], cursor.show_job()['hive_result_schema'][i][1]) for i,row in enumerate(cursor.description)]
>>> print(columns_data)
[(u'\u65e5\u4ed8', u'integer')]
```
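Python 2 prints the `repr()` of each `unicode` object inside a list, so the column name in that output appears as escape sequences. A quick Python 3 check that those escapes are the intended alias:

```python
# The escaped column name from the Python 2 output.
column_name = u"\u65e5\u4ed8"

print(column_name)  # 日付
print(column_name == "日付")  # True
# Its UTF-8 bytes, i.e. what str() yields under Python 2 once the
# default encoding is UTF-8 instead of ascii.
print(column_name.encode("utf-8"))  # b'\xe6\x97\xa5\xe4\xbb\x98'
```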
@toru-takahashi Modifying `python2.7/site-packages/sitecustomize.py` fixed the problem. The cause was Python 2.7's default encoding (`ascii`).
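To check which interpreter and default encoding a Redash process is actually using (for example, after adding `sitecustomize.py`), a small diagnostic like the following may help. It uses only `sys.version_info` and `sys.getdefaultencoding()`, so it should run unchanged on both Python 2.7 and Python 3:

```python
import sys

# Major version: 2 under the python-dev (Python 2.7) install, 3 under Python 3.6.
print("python %d" % sys.version_info[0])
# 'ascii' on a stock Python 2.7 interpreter; 'utf-8' on Python 3.
print("default encoding: %s" % sys.getdefaultencoding())
```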
Thank you for your great support.
Good to hear!