Aiohttp: ClientConnectorCertificateError raised when reading parquet file from GCS using pandas and aiohttp==3.7.0

Created on 25 Oct 2020 · 2 Comments · Source: aio-libs/aiohttp

🐞 Describe the bug
When reading parquet files from Google Cloud Storage using pandas with aiohttp==3.7.0, the following error is raised:

aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host www.googleapis.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1108)')]

💡 To Reproduce

python3 -m venv venv
source venv/bin/activate

pip install pandas pyarrow fsspec gcsfs

Then attempt to read any file from GCS using pd.read_parquet("gs://..").

For example, the file gs://gcp-public-data-landsat/LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2/LC08_L1GT_044034_20130330_20170310_01_T2_ANG.txt is publicly available so we can use it as a test case. Even though it is not valid parquet, it will crash on the reported error before complaining about the file format.

import pandas as pd

file_uri = "gs://gcp-public-data-landsat/LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2/LC08_L1GT_044034_20130330_20170310_01_T2_ANG.txt"

pd.read_parquet(file_uri).head()

💡 Expected behavior

The file should be downloaded successfully (or, in the test case above, the call should fail with OSError: Could not open parquet input source ... Either the file is corrupted or this is not a parquet file.)
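
To narrow down whether the certificate failure is specific to aiohttp or comes from the Python installation's trust store or the network path, a verified TLS handshake can be attempted with the standard library alone. This is a minimal sketch, not part of the original report: the helper name `probe_tls` and its return strings are hypothetical, and the result is only a hint, not a diagnosis.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake using only the standard library.

    Hypothetical helper for illustration; it reports the outcome as a string.
    """
    # Default context: certificate and hostname verification are enabled.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"verified OK ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # Must be caught before OSError, which it subclasses.
        return f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failures, refused connections, timeouts, other TLS errors.
        return f"connection failed: {exc}"


if __name__ == "__main__":
    print(probe_tls("www.googleapis.com"))
```

If this probe also reports a verification failure in the same virtual environment, the cause may lie outside aiohttp; if it succeeds while the pandas call still fails, that would point more towards the library versions involved.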

📋 Logs/tracebacks

```python-traceback
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/steve/venv/lib/python3.8/site-packages/pandas/io/parquet.py", line 317, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/steve/venv/lib/python3.8/site-packages/pandas/io/parquet.py", line 141, in read
    result = self.api.parquet.read_table(
  File "/Users/steve/venv/lib/python3.8/site-packages/pyarrow/parquet.py", line 1607, in read_table
    dataset = _ParquetDatasetV2(
  File "/Users/steve/venv/lib/python3.8/site-packages/pyarrow/parquet.py", line 1439, in __init__
    if filesystem.get_file_info(path).is_file:
  File "pyarrow/_fs.pyx", line 438, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1004, in pyarrow._fs._cb_get_file_info
  File "/Users/steve/venv/lib/python3.8/site-packages/pyarrow/fs.py", line 195, in get_file_info
    info = self.fs.info(path)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/Users/steve/venv/lib/python3.8/site-packages/gcsfs/core.py", line 781, in _info
    return await self._get_object(path)
  File "/Users/steve/venv/lib/python3.8/site-packages/gcsfs/core.py", line 576, in _get_object
    bucket, await self._call("GET", "b/{}/o/{}", bucket, key, json_out=True)
  File "/Users/steve/venv/lib/python3.8/site-packages/gcsfs/core.py", line 487, in _call
    async with self.session.request(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/client.py", line 1083, in __aenter__
    self._resp = await self._coro
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/client.py", line 490, in _request
    conn = await self._connector.connect(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 528, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 868, in _create_connection
    _, proto = await self._create_direct_connection(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 1023, in _create_direct_connection
    raise last_exc
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 999, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 948, in _wrap_create_connection
    raise ClientConnectorCertificateError(
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host www.googleapis.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1108)')]
```

📋 Your version of Python

```console
$ python --version
Python 3.8.2
```

📋 Your version of the aiohttp/yarl/multidict distributions

```console
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.0
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: /Users/steve/venv/lib/python3.8/site-packages
Requires: async-timeout, multidict, attrs, yarl, chardet
Required-by: gcsfs
```

```console
$ python -m pip show multidict
Name: multidict
Version: 5.0.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.[email protected]
License: Apache 2
Location: /Users/steve/venv/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
```

```console
$ python -m pip show yarl
Name: yarl
Version: 1.6.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /Users/steve/venv/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp
```

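The version numbers collected with `pip show` above can also be read from inside the interpreter that runs the failing code, which helps confirm that the expected virtual environment is active. A minimal sketch, assuming Python 3.8+ (where `importlib.metadata` is in the standard library); the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["aiohttp", "multidict", "yarl"]).items():
        print(f"{name}=={version}")
```
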
📋 Additional context

Downgrading aiohttp to 3.6.3 fixes the issue:

python3 -m venv venv
source venv/bin/activate

pip install pandas pyarrow fsspec gcsfs
pip freeze > bad.txt

rm -rf venv
python3 -m venv venv
source venv/bin/activate

pip install pandas pyarrow fsspec gcsfs aiohttp==3.6.3
pip freeze > good.txt

diff bad.txt good.txt

This gives:

1c1
< aiohttp==3.7.0
---
> aiohttp==3.6.3
13c13
< multidict==5.0.0
---
> multidict==4.7.6
27c27
< yarl==1.6.2
---
> yarl==1.5.1
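
The same comparison of two `pip freeze` snapshots can be sketched in Python, which may be handy where `diff` is not available. This is an illustrative sketch only: the function names `parse_freeze` and `changed_pins` are hypothetical, and lines that are not simple `name==version` pins (for example editable installs) are skipped.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {name: version}, skipping non-pinned lines."""
    pins = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # only keep lines of the form name==version
            pins[name.lower()] = version
    return pins


def changed_pins(bad_text, good_text):
    """Return {name: (bad_version, good_version)} for packages whose pins differ."""
    bad, good = parse_freeze(bad_text), parse_freeze(good_text)
    return {
        name: (bad.get(name), good.get(name))
        for name in sorted(bad.keys() | good.keys())
        if bad.get(name) != good.get(name)
    }


if __name__ == "__main__":
    # Pins taken from the diff in this report.
    bad = "aiohttp==3.7.0\nmultidict==5.0.0\nyarl==1.6.2\n"
    good = "aiohttp==3.6.3\nmultidict==4.7.6\nyarl==1.5.1\n"
    for name, (old, new) in changed_pins(bad, good).items():
        print(f"{name}: {old} -> {new}")
```

To compare the actual snapshots, the two strings could be read with `pathlib.Path("bad.txt").read_text()` and `pathlib.Path("good.txt").read_text()`.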