When we create a word cloud from a MySQL source, we get just ????????? for any non-English language data.
We're using Khmer (Cambodian), but I suspect this would hold true for other Unicode languages.
Any clue what layer is choking on the Unicode? If you have a chance, can you test in Python 3 and report whether the bug exists there as well?
I got it going in Python 3 and I can confirm that the Japanese and Korean Unicode text does render as expected... so it seems to be an issue in the Python layer. Yay progress!
That said, there is another choke point here related to the correct rendering of Indic scripts: some of the complex vowels and subscripts aren't being displayed correctly, which probably deserves its own issue. I'll try to do some more research on where that rendering issue is coming from before I open it, though. This is a pretty common issue: it turns out complex scripts are complex to render!
Me too.
Can we close this?
@lkozloff
Are you using MySQL as the database?
If so, set up the SQLAlchemy URI as below.
mysql://username:password@hostname:3306/mysql_database_name?charset=utf8
The ?charset=utf8 suffix is important. I confirmed that the correct characters (Japanese) were displayed with this connection string :-)
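For anyone who wants to verify the fix outside of the UI, here is a minimal sketch that builds an SQLAlchemy engine with the ?charset=utf8 suffix and prints a few rows. The hostname, credentials, database, and the word_cloud_source table/column are placeholders, not names from this project.

```python
# Minimal sketch: connect with an explicit charset so MySQL returns UTF-8 text
# instead of mangled "?" characters. Hostname, credentials, database, and table
# names below are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine(
    "mysql://username:password@hostname:3306/mysql_database_name?charset=utf8"
)

with engine.connect() as conn:
    # Columns containing Khmer/Japanese/Korean text should now come back as
    # proper Unicode strings rather than "?????????".
    for row in conn.execute(text("SELECT word FROM word_cloud_source LIMIT 5")):
        print(row[0])
```

If the rows print correctly here but the word cloud still shows question marks, the problem is downstream of the database connection rather than in the charset setting.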
This is fixed, closing