Redash: TreasureData connector cannot run queries containing non-ASCII characters in some cases

Created on 1 Sep 2017 · 6 Comments · Source: getredash/redash

Issue Summary

Hi, I'm trying to run a query against Treasure Data that includes non-ASCII characters, but it fails to execute. Other data sources (e.g. MySQL) work fine.

Steps to Reproduce

  1. Using the TreasureData data source, execute the queries below.
  • NG case
select
    time AS "日付" -- non-ASCII characters
from
    raw
limit 1
  • OK case
select
    time AS "datetime"
from
    raw
limit 1
  2. The NG case fails with an error and no result is shown.
Error running query: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

Technical details:

  • Redash Version: Redash 2.0.0+b2990
  • Browser/OS: Chrome
  • How did you install Redash: wget and using ubuntu.

All 6 comments

Thank you for the report!

I confirmed that td-client-python 0.8.0 with Python 3.6.0 correctly pulls data with non-ASCII characters in the column name.
Could you show me the full error log from Redash?

Python 3.6.0 (default, Sep  1 2017, 17:40:45)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import tdclient
>>> query_ng = 'select 1 AS "日付"'
>>> connection = tdclient.connect(endpoint='https://api.treasuredata.com',apikey='xxxxxxx',type='presto',db='sample_datasets')
>>> cursor = connection.cursor()
>>> cursor.execute(query_ng)
'170260382'
>>> columns_data = [(row[0], cursor.show_job()['hive_result_schema'][i][1]) for i,row in enumerate(cursor.description)]
>>> print(columns_data)
[('日付', 'integer')]
>>>

This code comes from https://github.com/getredash/redash/blob/master/redash/query_runner/treasuredata.py#L97

I noticed that the bootstrap script for Ubuntu uses apt-get install python-dev.
https://github.com/getredash/redash/blob/master/setup/ubuntu/bootstrap.sh#L42

And the default python-dev package installs Python 2.7.
https://packages.ubuntu.com/xenial/python/python-dev
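
To check which case applies on a given host, it may help to print the version and default encoding of the interpreter that actually runs Redash. On Python 2 the default encoding is normally ascii, while on Python 3 it is utf-8. A small sketch using only the standard library; the helper name `describe_interpreter` is illustrative and not part of Redash or td-client-python:

```python
import sys


def describe_interpreter():
    """Return (major, minor, default_encoding) for the running interpreter."""
    return sys.version_info[0], sys.version_info[1], sys.getdefaultencoding()


major, minor, default_encoding = describe_interpreter()
print("Python %d.%d, default encoding: %s" % (major, minor, default_encoding))
```

Run it with the same Python binary that the Redash workers use, otherwise the result may describe a different installation.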

With Python 2.7, I could reproduce this issue.

Python 2.7.11 (default, Sep  1 2017, 18:29:47)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import tdclient
>>>
>>> query_ng = 'select 1 AS "日付"'
>>> connection = tdclient.connect(endpoint='https://api.treasuredata.com',apikey='xxxx',type='presto',db='sample_datasets')
>>>
>>> cursor = connection.cursor()
>>>
>>> cursor.execute(query_ng)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/cursor.py", line 50, in execute
    self._do_execute()
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/cursor.py", line 77, in _do_execute
    return self._do_execute()
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/cursor.py", line 68, in _do_execute
    job = self._api.show_job(self._executed)
  File "/Users/takahashi/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tdclient/job_api.py", line 106, in show_job
    if js.get("hive_result_schema") is not None and 0 < len(str(js["hive_result_schema"])):
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

I'll dig into the cause later.
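
The failing line in the traceback calls `str()` on the schema value. On Python 2, `str()` of a `unicode` object implicitly encodes it with the default encoding, which is ascii unless overridden. Below is a minimal sketch of the same failure, written for Python 3, where the encode step has to be spelled out. The schema text is an assumption inferred from the reported position 3-4, not output captured from Treasure Data, and `encode_like_py2_str` is a hypothetical helper:

```python
# Assumed shape of the raw "hive_result_schema" text for the NG query;
# \u65e5\u4ed8 is the column alias used in the report.
schema_text = '[["\u65e5\u4ed8", "integer"]]'


def encode_like_py2_str(text, encoding="ascii"):
    """Mimic Python 2's str(unicode_value): encode with the default codec."""
    return text.encode(encoding)


try:
    encode_like_py2_str(schema_text)
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Shows the codec error raised for the two non-ASCII characters.
print(error)

# With UTF-8 as the codec the same text encodes without error.
utf8_bytes = encode_like_py2_str(schema_text, "utf-8")
print(len(utf8_bytes))
```

The two non-ASCII characters sit right after the three leading `[["` characters, which would explain why the error points at positions 3 and 4 rather than at an offset inside the SQL text.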

@toru-takahashi Thank you for reproducing the issue. I will check my environment soon.

I'm not familiar with Python's encoding issues, but a possible workaround is to create sitecustomize.py at python2.7/site-packages/sitecustomize.py.

import sys
sys.setdefaultencoding('UTF8')

I found a blog post describing a similar issue (in Japanese): https://www.yamamanx.com/redash-encode-error/

With this in place I could get the correct result, but I'm not sure whether it works well for Redash.

```
columns_data = [(row[0], cursor.show_job()['hive_result_schema'][i][1]) for i,row in enumerate(cursor.description)]
print(columns_data)
[(u'\u65e5\u4ed8', u'integer')]
```
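
Changing the default encoding affects the whole Python process, so a narrower alternative might be to avoid `str()` in the check that the traceback points at. A minimal sketch, assuming the schema value is either missing, a text string, or a list; `has_result_schema` is a hypothetical helper and not td-client-python's API or its actual fix:

```python
def has_result_schema(job):
    """Return True when the job dict carries a non-empty schema value."""
    schema = job.get("hive_result_schema")
    # len() works on text and lists directly, so no implicit ASCII
    # encode (as done by str() on Python 2) is involved.
    return schema is not None and len(schema) > 0


# Usage with an assumed job payload containing a non-ASCII column name.
job_with_schema = {"hive_result_schema": '[["\u65e5\u4ed8", "integer"]]'}
job_without_schema = {}

print(has_result_schema(job_with_schema))
print(has_result_schema(job_without_schema))
```

This only illustrates the idea. Whether it fits the library depends on how the schema value is used afterwards.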

@toru-takahashi
Adding python2.7/site-packages/sitecustomize.py as you suggested fixed the problem.
This was a Python 2.7 default-encoding issue.

Thank you for your great support.

Good to hear!
