Folium: Unicode issue in tooltips on Jupyter notebook

Created on 5 May 2020  ·  10 comments  ·  Source: python-visualization/folium


Minimal example:

import folium
from IPython.display import display

name = "5/7, Линейная улица, Berdsk, Berdsk municipality, Novosibirsk Oblast, Siberian Federal District, 633011, Russia"
feature = {"type": "Feature", "properties": {"name": name}, "geometry": {"type": "Point", "coordinates": [-75.849253579389796, 47.6434349837781]}}
map_osm = folium.Map()
folium.GeoJson(feature, name=name, tooltip=name).add_to(map_osm)
display(map_osm)

I'm running JupyterLab with Python 3.7 and the latest Folium version (0.10.1+28.ga8ec61d, which includes my PR).

Is there a workaround for this?

Labels: bug, jupyter


I can confirm this is indeed an issue, but only in Jupyter notebooks. The encoding is correctly set to UTF-8 in both the notebook frame and the map iframe. Since the characters display correctly in the layer control, it's unlikely to be an issue with the file encoding. The problem seems specific to tooltips, and it also happens for popups.

The characters appear garbled in the html:

<div id="html_46e5bc2ac281404b8b359d6c5707703d" style="width: 100.0%; height: 100.0%;">5/7, �инейна� �ли�а, Berdsk</div>

It's already broken in the JS code that generates that html:

var html_46e5bc2ac281404b8b359d6c5707703d = $(`<div id="html_46e5bc2ac281404b8b359d6c5707703d" style="width: 100.0%; height: 100.0%;">5/7, �инейна� �ли�а, Berdsk</div>`)[0];

In the JS code for the layer control the same string is properly encoded though:

overlays :  {
    "5/7, \u041b\u0438\u043d\u0435\u0439\u043d\u0430\u044f \u0443\u043b\u0438\u0446
},

Any idea where the problem lies? I had a look and couldn't see anything obvious. I also did some poking around but couldn't make any headway.

I think I found the issue. In branca we encode the HTML for display in the notebook, using encode('utf-8'). A Unicode string like "5/7, Линейная улица, Berdsk" is turned into the bytes b'5/7, \xd0\x9b\xd0\xb8\xd0\xbd\xd0\xb5\xd0\xb9\xd0\xbd\xd0\xb0\xd1\x8f \xd1\x83\xd0\xbb\xd0\xb8\xd1\x86\xd0\xb0, Berdsk'. These bytes are then base64-encoded.

When the notebook rehydrates this code, it uses atob to do the base64 decoding. This function does not turn those bytes back into the right characters: Ð\u009bÐ¸Ð½ÐµÐ¹Ð½Ð°Ñ\u008f Ñ\u0083Ð»Ð¸Ñ\u0086Ð°. I'm no expert on JS, but it seems atob maps every byte to a separate character instead of decoding the bytes as UTF-8.
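The round trip described above can be reproduced in plain Python. This is a sketch, not branca's code: `js_atob` is a hypothetical stand-in that mimics JavaScript's `atob` by turning each decoded byte into one character (code points 0 to 255), which is what splits each two-byte UTF-8 letter into two stray characters.

```python
import base64

def js_atob(b64: str) -> str:
    # Hypothetical stand-in for JavaScript's atob(): each decoded byte
    # becomes one character with the same code point (0 to 255).
    return base64.b64decode(b64).decode("latin-1")

text = "5/7, Линейная улица, Berdsk"

# Python side: encode the HTML as UTF-8 bytes, then base64-encode them.
payload = base64.b64encode(text.encode("utf-8")).decode("ascii")

# Browser side: every two-byte Cyrillic letter turns into two characters.
garbled = js_atob(payload)
print(ascii(garbled))

# The bytes themselves are intact; they were only read with the wrong charset.
assert garbled.encode("latin-1").decode("utf-8") == text
```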

The solution is to encode the HTML not as UTF-8 but with raw_unicode_escape. This converts "5/7, Линейная улица, Berdsk" into b'5/7, \\u041b\\u0438\\u043d\\u0435\\u0439\\u043d\\u0430\\u044f \\u0443\\u043b\\u0438\\u0446\\u0430, Berdsk', which results in correctly rehydrated HTML in the browser.
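A Python sketch of what raw_unicode_escape produces and why its output survives a byte-per-character decoder such as atob. In the generated code, the \uXXXX escapes end up inside a JavaScript template literal (like the `$(`...`)[0]` call quoted earlier), where JavaScript turns them back into characters; that last step is outside this sketch.

```python
import base64

text = "5/7, Линейная улица, Berdsk"

# raw_unicode_escape writes code points above U+00FF as \uXXXX escapes,
# so this Cyrillic text becomes pure ASCII bytes.
escaped = text.encode("raw_unicode_escape")
assert escaped.isascii()
assert escaped.startswith(b"5/7, \\u041b\\u0438")

# Characters from U+0080 to U+00FF (e.g. é) become one raw Latin-1 byte instead,
# which a byte-per-character decoder maps back to the same character.
assert "é".encode("raw_unicode_escape") == b"\xe9"
assert b"\xe9".decode("latin-1") == "é"

# The ASCII payload survives base64 plus byte-per-character decoding unchanged.
payload = base64.b64encode(escaped)
assert base64.b64decode(payload).decode("latin-1") == escaped.decode("ascii")
```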

I'll open a PR in branca with this fix. You could really help by testing that fix!

@galewis2, did you get a chance to test the PR? I'd merge it with more confidence if you could confirm it indeed solves your issue.

pip install git+https://github.com/conengmo/branca.git@fix-notebook-special-chars

I have the same problem. In Brazil, we use accents and cedillas...
When I save the map to an HTML file, the tooltips appear normally, but when the map is rendered in the Jupyter notebook, the tooltips come out garbled.

So I tried the .encode('raw_unicode_escape') suggestion and it helped, but now there is a "b'" at the beginning of my tooltip.

Without .encode('raw_unicode_escape'), the accented characters come out garbled.

My code is below.
PS: I noticed that if I try to do the same with popup = '<strong>' + row['Name'] + '</strong>', it doesn't work inside the HTML.

import folium

# Add the different companies with colors by neighborhoods
# (gdf_cap, colors and the map m are defined earlier in the notebook)
for index, row in gdf_cap.iterrows():
    if row['Source'] in colors:
        folium.Marker(
            name='Fundraising',
            location=[row['geometry'].y, row['geometry'].x],
            popup='<strong>' + row['Name'] + '</strong>',
            tooltip=row['Name'].encode('raw_unicode_escape'),
            icon=folium.Icon(color=colors[row['Source']], icon='gift'),
        ).add_to(m)

A fix has been merged in the branca library. It will be available in the next release, release date yet unknown. If you want it earlier, you can install branca from the git master branch:

pip install git+https://github.com/python-visualization/branca.git@master

@michelmetran I had the same problem with Swedish åäö not rendering properly, and .encode('raw_unicode_escape') helped render the strings without garbled characters. I also got the b'' in the printed string, but managed to solve it by converting back to a string and removing the first two characters and the last one, like this:

str(string.encode('raw_unicode_escape'))[2:-1]
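The b'...' prefix appears because the tooltip receives a bytes object, and str() on bytes returns their repr (this assumes folium converts the tooltip text with str(), which matches the symptom). A sketch of the slicing workaround, with a hypothetical helper name:

```python
def to_escaped_text(s: str) -> str:
    # Hypothetical helper. str() gives the repr of the bytes, which spells
    # non-ASCII bytes as \xNN; [2:-1] drops the leading "b'" and trailing "'".
    return str(s.encode("raw_unicode_escape"))[2:-1]

raw = "åäö".encode("raw_unicode_escape")
print(str(raw))                # b'\xe5\xe4\xf6'  (the wrapper seen in the tooltip)
print(to_escaped_text("åäö"))  # \xe5\xe4\xf6

# Above U+00FF the bytes already contain a backslash, and repr() doubles it.
assert to_escaped_text("Л") == "\\\\u041b"
```

This relies on repr() escaping, which suits Latin-1 characters such as åäö, é and ç. For characters above U+00FF, such as Cyrillic, the result contains a doubled backslash, which a JavaScript string literal would display as a literal \u041b rather than Л.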

This still seems to be a problem in Folium 0.11.0 (as installed on Kaggle). To display a string such as:

avenue Decelles (Montréal, Côte-des-Neiges-Notre-Dame-de-Grâce)

I have to use this workaround:

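import html

def escape(x):
    # repr of the raw_unicode_escape bytes, minus the b'...' wrapper
    raw = str(x.encode('raw_unicode_escape'))[2:-1]
    return html.escape(raw)  # also escape &, <, > and quotes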

I think it's an issue with JS atob. The built-in JavaScript functions btoa and atob do not support Unicode strings. I use this atou function, which I found on MDN:

function atou(b64) {
  return decodeURIComponent(escape(atob(b64)));
}
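A Python sketch of why atou works, mirroring each step: atob yields one character per byte, escape() percent-encodes those characters as %XX, and decodeURIComponent() reads the %XX sequences back as UTF-8. The function names are hypothetical Python stand-ins for the JavaScript built-ins; the net effect is decoding the base64 bytes as UTF-8.

```python
import base64
from urllib.parse import unquote

def js_atob(b64: str) -> str:
    # Stand-in for atob(): one character per decoded byte (code points 0 to 255).
    return base64.b64decode(b64).decode("latin-1")

def js_escape(s: str) -> str:
    # Simplified stand-in for escape(): JS leaves letters, digits and @*_+-./
    # unescaped, but encoding every character as %XX gives the same final
    # result once decodeURIComponent runs.
    return "".join(f"%{ord(c):02X}" for c in s)

def atou(b64: str) -> str:
    # decodeURIComponent(escape(atob(b64))): the %XX bytes are read as UTF-8.
    return unquote(js_escape(js_atob(b64)), encoding="utf-8", errors="strict")

text = "5/7, Линейная улица, Berdsk"
payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
assert js_atob(payload) != text  # plain atob garbles the Cyrillic
assert atou(payload) == text     # atou restores it
```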

This was fixed in https://github.com/python-visualization/branca/pull/76 and will be in branca version 0.4.2, which hasn't been released yet but will be soon.
