Incubator-superset: seems fail with chinese utf8

Created on 25 Mar 2017  ·  24Comments  ·  Source: apache/incubator-superset

when use sql lab:

if a table contains Chinese utf8 character:

it fails:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xb7 in position 1: invalid start byte

Most helpful comment

Hi @jinleileiking. I had the same problem and finally found out a workaround. I guess you can save your configuration with including "?charset=utf8" and ignore the test connection fail. Once you saved the configuration with "?charset=utf8", the test connection will say "seems ok". It might be a bug on the test connection.

All 24 comments

| user  | CREATE TABLE `user` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '自增主键',
  `openid` varchar(256) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '微信openid',
  `mobile` bigint(20) NOT NULL DEFAULT '0' COMMENT '手机号',
  `password` varchar(256) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '密码',
  `avatar_link` varchar(512) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '用户头像',
  `nickname` varchar(32) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '用户昵称',
  `ip` varchar(32) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '用户ip',
  `gender` tinyint(4) NOT NULL DEFAULT '0' COMMENT '性别0未填写1男2女',
  `rc_token` varchar(512) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '融云的token',
  `room_thumbnail_link` varchar(512) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '房间缩略图',
  `room_name` varchar(32) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '房间名',
  `create_time` int(11) NOT NULL DEFAULT '0' COMMENT '创建用户时间',
  `update_time` int(11) NOT NULL DEFAULT '0' COMMENT '更新用户信息时间',
  `status` tinyint(4) NOT NULL DEFAULT '0' COMMENT '用户状态0注册成功1未注册成功',
  `session_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '手机注册用户的sessionId',
  PRIMARY KEY (`id`),
  KEY `user_mobile_index` (`mobile`)
) ENGINE=InnoDB AUTO_INCREMENT=212 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
mysql> select * from user limit 1\G
*************************** 1. row ***************************
                 id: 1
             openid: oUffWw4max8mP-FM2gLsDId3yBos
             mobile: 0
           password:
        avatar_link: http://wx.qlogo.cn/mmopen/Q3auHgzwzM6G8U77Zg3z97CF4DgVVzAA4BxgEESPPiauKG6DIBuRlRMic2EBjFddvk6FVV9kiczdVBjCfWdTibaVmjvhwibboMDk8LPxsv3Z2MIs/0
           nickname: 青梅煮马
                 ip: 114.255.44.139
             gender: 1
room_thumbnail_link:
          room_name: 嗯
        create_time: 0
        update_time: 0
             status: 0
         session_id:
➜  openssl git:(master) ✗ locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

@jinleileiking you have to post the full stackstrace, anyway if you can give it a try to python3 as it should work just fine

@xrmx I update to python3.5

problem remains.

2017-03-25 21:51:08,824:INFO:root:Running query:                                                                                                         [6/6]
SELECT id,
       openid,
       mobile,
       password,
       avatar_link,
       nickname,
       ip,
       gender,
       rc_token,
       room_thumbnail_link,
       room_name,
       create_time,
       update_time,
       status,
       session_id
FROM nemo.user LIMIT 100
2017-03-25 21:51:08,836:INFO:root:Handling cursor
2017-03-25 21:51:08,839:INFO:root:Fetching data: {u'changedOn': datetime.datetime(2017, 3, 25, 13, 51, 8, 835538), u'tempTable': u'', u'userId': 2, u'execute$Sql': u'SELECT id,\n       openid,\n       mobile,\n       password,\n       avatar_link,\n       nickname,\n       ip,\n       gender,\n       rc_token,\n
    room_thumbnail_link,\n       room_name,\n       create_time,\n       update_time,\n       status,\n       session_id\nFROM nemo.user LIMIT 100', u'ctas':
False, u'tab': u'Untitled Query 3', u'sqlEditorId': u'BkZKGAz3g', u'id': u'ry5ASlN3x', u'changed_on': '2017-03-25T13:51:08.835538', u'rows': None, u'endDttm'$ None, u'limit_reached': False, u'state': u'running', u'progress': 0, u'schema': u'nemo', u'dbId': 5, u'errorMessage': None, u'db': u'nemo ol', u'user': u'ro$t', u'sql': u'SELECT id,\n       openid,\n       mobile,\n       password,\n       avatar_link,\n       nickname,\n       ip,\n       gender,\n       rc_toke$,\n       room_thumbnail_link,\n       room_name,\n       create_time,\n       update_time,\n       status,\n       session_id\nFROM nemo.user LIMIT 100', u'$tartDttm': Decimal('1490449868789.4941406250'), u'limit': 1000000, u'serverId': 39, u'resultsKey': None}
2017-03-25 21:51:09,499:ERROR:root:'utf8' codec can't decode byte 0xb7 in position 1: invalid start byte
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/superset/views/core.py", line 2043, in sql_json
    data = sql_lab.get_sql_results(query_id, return_results=True)
  File "/usr/local/lib/python2.7/site-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/local/lib/python2.7/site-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/superset/sql_lab.py", line 166, in get_sql_results
    payload = json.dumps(payload, default=utils.json_iso_dttm_ser)
  File "/usr/local/lib/python2.7/json/__init__.py", line 250, in dumps

  sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/local/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb7 in position 1: invalid start byte

when push the run query button

need other infos ?

Ah i overlooked that it's utf-8 decoder that's exploding. Could it be that your stuff in mysql that's not properly encoded as utf-8?

Removing a few unneeded comments to keep things tidy.

the data in mysql db is correct, used by php laravel , worked ok.

You mean I delete the COMMENT in create table sql command?

superset ver 0.17.1

@jinleileiking I'm removing some of your comments because they just make the issue hard to follow. Please no need to add other data.

@xrmx Thats ok, thanks

@jinleileiking did you set charset in your SQLAlchemy URI, like : mysql://root:[email protected]/test?charset=utf8

@App1905 I tried to add ?charsert=utf8 at the end of SQLAlchemy URI , It seems the connection test failed.

mysql://root:[email protected]/nxxo?charset=utf8 test connection fail
mysql://root:[email protected]/nxxo test connection ok

what's you mysql version? We've been using ?charset=utf8 on all of our mysql connections

@mistercrunch | Server version: | 5.5.34 MySQL Community Server (GPL) |

Hi @jinleileiking. I had the same problem and finally found out a workaround. I guess you can save your configuration with including "?charset=utf8" and ignore the test connection fail. Once you saved the configuration with "?charset=utf8", the test connection will say "seems ok". It might be a bug on the test connection.

@smly worked for me!!!! bravo!!! MAY BE A BUGGIE 👍

that's amazing, it worked for me

@smly It worked for me!you're right,there seems be a problem on the test connection. I clicked the test connection button at first and it showed an error, then I click save button and successed, and next I clicked the test connection button again and it showed "OK".

Notice: this issue has been closed because it has been inactive for 320 days. Feel free to comment and request for this issue to be reopened.

@smly u r the best answer ive ever seen

Was this page helpful?
0 / 5 - 0 ratings