I am trying to read the tables from a Wikipedia page using the following code:
import pandas as pd
pd.read_html('https://en.wikipedia.org/wiki/2013–14_Premier_League')
Doing that generates the following error:
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 14: ordinal not in range(128)
I have tried
pd.read_html('https://en.wikipedia.org/wiki/2013–14_Premier_League', encoding='utf-8')
but I still get the same error. The following works:
import requests
r = requests.get('https://en.wikipedia.org/wiki/2017–18_Premier_League')
c = r.content
dfs = pd.read_html(c)
What I want to know is how to get pd.read_html() to work directly on the URL, without requests. Is there something I don't understand about encoding, or is this a problem with pandas?
I am running an Anaconda distribution of Pandas 0.21.1 and Python 3.5.4. Thanks for any help.
Hmm, interesting. Looks like this is still an issue on master, even when the encoding is specified explicitly:
>>> pd.read_html('https://en.wikipedia.org/wiki/2013–14_Premier_League', encoding='utf-8')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 14: ordinal not in range(128)
Investigation and PRs are always welcome.
https://stackoverflow.com/questions/39229439/encoding-error-when-reading-url-with-urllib
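As seen in this similar issue, urllib on Python 3 only accepts URLs that can be encoded as ASCII, so the en dash ('\u2013') in the page path raises the UnicodeEncodeError before any HTML is downloaded. That is also why the encoding argument makes no difference: it only applies to decoding the response. To work around it, I used the Requests library (http://docs.python-requests.org/en/master/).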
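If you would rather not depend on Requests, percent-encoding the non-ASCII part of the URL before handing it to pandas should also avoid the error. The following is only a minimal sketch: to_ascii_url is a hypothetical helper name (not part of pandas or urllib), it assumes the non-ASCII characters are in the path rather than the host name, and the read_html call is left commented out because it needs network access.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path of a URL.

    Hypothetical helper for illustration; it does not handle
    non-ASCII host names (those would need IDNA encoding).
    """
    parts = urlsplit(url)
    # Keep "/" and "%" untouched so that path separators and any
    # escapes that are already present are not encoded a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


ascii_url = to_ascii_url("https://en.wikipedia.org/wiki/2013–14_Premier_League")
print(ascii_url)

# With an ASCII-only URL, read_html should be able to fetch the page itself:
# import pandas as pd
# dfs = pd.read_html(ascii_url)
```

The en dash is encoded as its UTF-8 bytes (%E2%80%93), which is the form browsers send to the server for this page.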
FWIW the sample call works fine under Python 2.7.15 but not Python 3.6.5. Choice of engine doesn't matter, however.
I used the following solution:
import pandas as pd
import requests
url = "https://ru.wikipedia.org/wiki/Города_России_с_населением_более_500_тысяч_человек"
# Requests handles the non-ASCII URL itself
r = requests.get(url, auth=('user', 'pass'))
website = r.text
# Parse the tables from the downloaded HTML instead of from the URL
tables = pd.read_html(website, encoding="UTF-8")
City_pop = tables[4]