Aiohttp: Cannot download urls with Cyrillic letters and https protocol.

Created on 1 Jul 2016 · 9 Comments · Source: aio-libs/aiohttp

My script:

import aiohttp
import asyncio


async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    url = u'https://цфоут.мвд.рф/news/item/8065038/'
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, url))
        print(html)

It fails with the following error:

Exception in callback None
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.5/asyncio/selector_events.py", line 671, in _read_ready
    self._protocol.data_received(data)
  File "/usr/lib/python3.5/asyncio/sslproto.py", line 492, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.5/asyncio/sslproto.py", line 200, in feed_ssldata
    self._sslobj.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 633, in do_handshake
    match_hostname(self.getpeercert(), self.server_hostname)
  File "/usr/lib/python3.5/ssl.py", line 296, in match_hostname
    % (hostname, ', '.join(map(repr, dnsnames))))
ssl.CertificateError: hostname 'цфоут.мвд.рф' doesn't match either of '*.xn--b1aew.xn--p1ai', 'xn--b1aew.xn--p1ai'

Interestingly enough, the string 'цфоут.мвд.рф' actually does match '*.xn--b1aew.xn--p1ai' once it is IDNA-encoded:

>>> 'цфоут.мвд.рф'.encode('idna').decode('utf8').endswith('.xn--b1aew.xn--p1ai')
True
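
To spell the check out a bit more, here is a minimal sketch of the comparison. It is not the real ssl.match_hostname logic; the helper names to_a_label and matches_dns_name are made up for illustration, and only a wildcard in the left-most label is handled. It suggests that the Unicode hostname fails against the certificate names from the traceback, while its IDNA form matches:

```python
def to_a_label(hostname):
    """Encode a Unicode hostname to its ASCII (IDNA) form. Hypothetical helper."""
    return hostname.encode("idna").decode("ascii")


def matches_dns_name(hostname, pattern):
    """Simplified matcher: supports only a '*' in the left-most label."""
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard covers exactly one label; the rest must be equal.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


# Names taken from the CertificateError message above.
cert_names = ["*.xn--b1aew.xn--p1ai", "xn--b1aew.xn--p1ai"]
host = "цфоут.мвд.рф"

unicode_match = any(matches_dns_name(host, name) for name in cert_names)
idna_match = any(matches_dns_name(to_a_label(host), name) for name in cert_names)
print("unicode form matches:", unicode_match)
print("idna form matches:", idna_match)
```

So the certificate itself looks fine; the comparison just seems to be done against the wrong form of the hostname.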

Same script with requests:

# This works fine
import requests

if __name__ == '__main__':
    url = u'https://цфоут.мвд.рф/news/item/8065038/'
    print(requests.get(url).text)

Versions

$ python -V
Python 3.5.1
$ pip3 freeze
aiohttp==0.22.0a0
chardet==2.3.0
multidict==1.0.3
requests==2.10.0
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:    14.04
Codename:   trusty

What I think

I found a similar question on Stack Overflow, but setting verify_ssl=False looks like a pretty dangerous hack to me, since it disables certificate verification altogether.

All 9 comments

This problem is caused by https://hg.python.org/cpython/file/3.5/Lib/ssl.py#l381
Before this function is called, server_hostname == 'xn--n1aiccj.xn--b1aew.xn--p1ai'.
But afterwards it has become server_hostname == 'цфоут.мвд.рф'. (I don't know why yet.)
I can't dig any deeper right now.

requests uses wrap_socket(), not wrap_bio(), in requests.packages.urllib3.connection.HTTPSConnection

This is not an aiohttp bug. Please file a Python bug report.

Python 3.4 uses wrap_socket()

We should use url.raw_host
I'll make a PR soon.
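
For anyone wondering what the difference is: raw_host is presumably the still-encoded (ASCII) host of the URL object, as opposed to the decoded Unicode host. Below is a rough stdlib-only sketch of that distinction. It does not use aiohttp or its URL class; host_forms is a hypothetical helper that only emulates the two forms:

```python
from urllib.parse import urlsplit


def host_forms(url):
    """Return (decoded_host, raw_host) for a URL. Hypothetical helper."""
    hostname = urlsplit(url).hostname or ""
    # ASCII form, suitable for comparing against certificate names.
    raw_host = hostname.encode("idna").decode("ascii")
    # Human-readable Unicode form, decoded back from the ASCII form.
    decoded_host = raw_host.encode("ascii").decode("idna")
    return decoded_host, raw_host


decoded_host, raw_host = host_forms("https://цфоут.мвд.рф/news/item/8065038/")
print(decoded_host)
print(raw_host)
```

The certificate lists its names in the ASCII form, so that is the one that should end up as server_hostname.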

Hey, for now we have created a small monkey patch: https://github.com/wikibusiness/idna_ssl

Can you check whether it helps in your case?
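
The general idea behind a patch like this can be sketched as follows. This is not the actual idna_ssl code, and it deliberately does not touch the ssl module; idna_normalized and simple_match_hostname are hypothetical names. The wrapper converts the hostname to its IDNA form before handing it to whatever matcher it wraps:

```python
def idna_normalized(match_fn):
    """Wrap a (cert, hostname) matcher so it always sees the IDNA hostname."""
    def wrapper(cert, hostname):
        ascii_hostname = hostname.encode("idna").decode("ascii")
        return match_fn(cert, ascii_hostname)
    return wrapper


def simple_match_hostname(cert, hostname):
    """Stand-in matcher (an assumption, not ssl.match_hostname).

    Raises ValueError if no DNS name in the certificate matches.
    """
    names = [value for key, value in cert.get("subjectAltName", ()) if key == "DNS"]
    for name in names:
        if name.startswith("*."):
            # Wildcard covers exactly the left-most label.
            _, sep, rest = hostname.partition(".")
            if sep and rest == name[2:]:
                return None
        elif hostname == name:
            return None
    raise ValueError("hostname %r doesn't match any of %r" % (hostname, names))


# Certificate names from the traceback, in the dict layout used by getpeercert().
cert = {"subjectAltName": (("DNS", "*.xn--b1aew.xn--p1ai"),
                           ("DNS", "xn--b1aew.xn--p1ai"))}

patched_match = idna_normalized(simple_match_hostname)

try:
    simple_match_hostname(cert, "цфоут.мвд.рф")
    unpatched_ok = True
except ValueError:
    unpatched_ok = False

patched_result = patched_match(cert, "цфоут.мвд.рф")  # returns None on success
print("unpatched ok:", unpatched_ok)
print("patched ok:", patched_result is None)
```

Unlike verify_ssl=False, this keeps certificate verification on and only changes which form of the hostname gets compared.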

CPython fix: python/cpython#3010

Fixed in aiohttp 2.3.10

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there are important points made in this discussion, please include those excerpts in that [new issue].
