Incubator-superset: Word Clouds do not support Unicode characters

Created on 6 Apr 2016 · 6 Comments · Source: apache/incubator-superset

When we create a word cloud from a MySQL source, we get just ????????? for any non-English data.

We're using Khmer (Cambodian), but I suspect this would hold true for other non-Latin Unicode scripts as well.
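
For readers unfamiliar with this symptom, here is a minimal sketch of how a lossy character set conversion turns text into question marks. It is only an analogy in plain Python, not a confirmed trace of what Superset or the MySQL driver does; the sample string and variable names are illustrative.

```python
# Analogy only: a conversion to a charset that cannot represent the text
# replaces every unencodable character with "?".
text = "សួស្តី"  # sample Khmer text, chosen for illustration

# latin-1 has no Khmer code points, so each one is replaced with "?".
lossy = text.encode("latin-1", errors="replace")
print(lossy)

# UTF-8 can represent the text, so it survives a round trip unchanged.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)
```

If some layer between the database and the chart performs a conversion like the first one, the original characters are gone by the time the data reaches the word cloud.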

#bug help-wanted

All 6 comments

Any clue which layer is choking on the Unicode? If you have a chance, can you test in Python 3 and report whether the bug exists there as well?

I got it going in Python 3 and I can confirm that Japanese and Korean Unicode text renders as expected... so it seems to be an issue in the Python layer. Yay progress!

That said, there is another choke point here related to the correct rendering of Indic scripts: some of the complex vowels and subscripts aren't being displayed correctly, which probably deserves its own issue. I'll try to do some more research into where the rendering issue is coming from before I open that, though. This is a pretty common issue: it turns out complex scripts are complex to render!

Me too.

Can we close this?

@lkozloff

Are you using MySQL as your database?
If so, set up the SQLAlchemy URI as below.

mysql://username:password@host:3306/mysql_database_name?charset=utf8

The suffix ?charset=utf8 is important. I confirmed that Japanese characters display correctly with this setting :-)
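
A small sketch of appending that suffix programmatically, using only the Python standard library. The helper name with_charset and the placeholder credentials and host are assumptions made for illustration; they are not part of Superset or SQLAlchemy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(uri: str, charset: str = "utf8") -> str:
    """Return uri with a charset query parameter, keeping an existing one."""
    parts = urlsplit(uri)
    query = dict(parse_qsl(parts.query))
    # Only add the parameter when the URI does not already set a charset.
    query.setdefault("charset", charset)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder username, password, and host, for illustration only.
uri = with_charset("mysql://username:password@host:3306/mysql_database_name")
print(uri)
```

Note that MySQL's utf8 character set stores at most three bytes per character. That covers Khmer and Japanese, but characters outside the Basic Multilingual Plane (emoji, for example) need utf8mb4, which could be passed as the second argument here.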

This is fixed, closing
