One of Python's useful features is its ability to load modules from a .zip archive (PEP here), allowing you to package up multiple dependencies into a single file.
boto3 breaks when it is loaded from a .zip. The import itself succeeds, but creating a client throws:
```
File "C:\code sandbox\boto.zip\boto3\__init__.py", line 91, in client
File "C:\code sandbox\boto.zip\boto3\session.py", line 263, in client
File "C:\code sandbox\boto.zip\botocore\session.py", line 799, in create_client
File "C:\code sandbox\boto.zip\botocore\session.py", line 668, in _get_internal_component
File "C:\code sandbox\boto.zip\botocore\session.py", line 870, in get_component
File "C:\code sandbox\boto.zip\botocore\session.py", line 150, in create_default_resolver
File "C:\code sandbox\boto.zip\botocore\loaders.py", line 132, in _wrapper
File "C:\code sandbox\boto.zip\botocore\loaders.py", line 424, in load_data
botocore.exceptions.DataNotFoundError: Unable to load data for: endpoints
```
How to Reproduce:
1. Create a .zip containing boto3 and botocore
2. Create a .py file in the same directory as the zip (access keys removed for obvious reasons):
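```
import sys

# Put the archive at the front of the module search path so boto3 is imported from it.
sys.path.insert(0, 'boto.zip')

import boto3

# Creating the client is the call that raises DataNotFoundError.
s3 = boto3.client('s3', aws_access_key_id='access_key', aws_secret_access_key='secret_key')
```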
Tested on Python 3.6.7
boto3 1.9.39
botocore 1.12.39
Confirmed. Our data loaders can't handle being run from a zip.
Specifically, we try to search for the data in the following directory:
'.../botocore.zip/botocore/data'
Which fails our isdir check and is thus skipped.
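To illustrate why the check fails, here is a minimal sketch that does not use botocore at all. It builds a throwaway archive (the names `demo.zip` and `pkg` are made up for the example) and shows that a path pointing inside a zip is not a directory as far as the filesystem is concerned, even though the file exists in the archive:

```python
import os
import tempfile
import zipfile


def build_demo_zip(directory):
    """Create demo.zip containing pkg/data/endpoints.json and return its path.

    The archive layout is hypothetical; it only mimics a package with a data folder.
    """
    zip_path = os.path.join(directory, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("pkg/__init__.py", "")
        zf.writestr("pkg/data/endpoints.json", "{}")
    return zip_path


with tempfile.TemporaryDirectory() as tmp:
    zip_path = build_demo_zip(tmp)

    # The same shape of path as '.../botocore.zip/botocore/data'.
    data_dir = os.path.join(zip_path, "pkg", "data")

    # The OS sees demo.zip as a regular file, so nothing "inside" it is a directory.
    is_dir_on_disk = os.path.isdir(data_dir)

    # The data file is nevertheless present in the archive.
    with zipfile.ZipFile(zip_path) as zf:
        in_archive = "pkg/data/endpoints.json" in zf.namelist()

print("isdir:", is_dir_on_disk, "| present in archive:", in_archive)
```

Any loader that filters its search paths with `os.path.isdir` will therefore skip data shipped inside an archive.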
Marking this as a feature request.
What are the odds of getting this implemented? It's preventing us from distributing boto3 as a zip, which makes it very hard to provide a package that depends on it in PySpark.
https://stackoverflow.com/a/22646702 has a snippet processing a zip.
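I have not copied that answer here. As a rough sketch of the general idea only, a path that may point inside an archive can be split into the archive part and the member part, and the member read with `zipfile`. The helper name `read_maybe_zipped` and the demo file names are made up for this example, and this is not how botocore's loaders work:

```python
import os
import tempfile
import zipfile


def read_maybe_zipped(path):
    """Return the bytes at `path`, which may point inside a zip archive."""
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return f.read()

    # Walk up the path one component at a time until we reach a zip file.
    archive, member_parts = path, []
    while True:
        parent, tail = os.path.split(archive)
        if not tail:
            # Reached the filesystem root without finding an archive.
            raise FileNotFoundError(path)
        member_parts.insert(0, tail)
        archive = parent
        if os.path.isfile(archive) and zipfile.is_zipfile(archive):
            break

    # Zip member names always use forward slashes, whatever the OS separator is.
    with zipfile.ZipFile(archive) as zf:
        return zf.read("/".join(member_parts))


with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("pkg/data/endpoints.json", '{"partitions": []}')

    payload = read_maybe_zipped(os.path.join(demo_zip, "pkg", "data", "endpoints.json"))

print(payload)
```

This handles a single level of archive only; nested zips and missing members (which surface as `KeyError` from `zipfile`) are left out to keep the sketch short.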
Is this issue resolved? I am also stuck loading boto3 from a .zip file when using the --py-files option in spark2-submit. I'd appreciate any help to overcome this situation.
pytz has a similar issue reading timezone data in the zoneinfo folder from a packaged directory. To get around this it uses pkg_resources.resource_stream from setuptools - https://github.com/stub42/pytz/blob/7b1a844c8ecf2996142ac0eb32201b676e9dcb9a/src/pytz/__init__.py#L101
https://setuptools.readthedocs.io/en/latest/pkg_resources.html
It adds setuptools as a dependency when distributing as a zip, but at least it works. Would be great to have a fix for this. Workarounds are needlessly ugly.
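For comparison, the standard library's `pkgutil.get_data` goes through the package's loader in much the same way, so it can read a data file whether the package sits in a directory or in a zip, without the setuptools dependency. Below is a minimal sketch with a throwaway package; `demo_zip_pkg` and the archive name are made up for the example. It shows the technique in isolation and is not a patch for botocore:

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a small archive containing a package with a data file.
# The temp directory is left in place, since the archive stays on sys.path.
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "demo_bundle.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("demo_zip_pkg/__init__.py", "")
    zf.writestr("demo_zip_pkg/data/endpoints.json", '{"partitions": []}')

# Make the archive importable, as in the reproduction script.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()

# get_data asks the package's loader (here the zip importer) for the bytes,
# so no isdir/open call on a path inside the archive is needed.
raw = pkgutil.get_data("demo_zip_pkg", "data/endpoints.json")
print(raw)
```

The resource path is always given with forward slashes, relative to the package.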
Is https://github.com/boto/boto3/issues/1008 also a duplicate?
I submitted PR https://github.com/boto/botocore/pull/1969
Could a committer review?
@gliptak Hello, I'm running into this problem now. Can we reopen https://github.com/boto/botocore/pull/1969 and get this fixed?
@shadowdsp We need a committer's help on that repo to move forward.