I'm trying to create a boto3 S3 client on an EMR master node, just running this in the Python interpreter:
s3 = boto3.client('s3')
I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/boto3/__init__.py", line 91, in client
return _get_default_session().client(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/boto3/session.py", line 263, in client
aws_session_token=aws_session_token, config=config)
File "/usr/lib/python2.7/dist-packages/botocore/session.py", line 861, in create_client
client_config=config, api_version=api_version)
File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 70, in create_client
cls = self._create_client_class(service_name, service_model)
File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 95, in _create_client_class
base_classes=bases)
File "/usr/lib/python2.7/dist-packages/botocore/hooks.py", line 227, in emit
return self._emit(event_name, kwargs)
File "/usr/lib/python2.7/dist-packages/botocore/hooks.py", line 210, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python2.7/site-packages/boto3/utils.py", line 61, in _handler
module = import_module(module)
File "/usr/local/lib/python2.7/site-packages/boto3/utils.py", line 52, in import_module
__import__(name)
File "/usr/local/lib/python2.7/site-packages/boto3/s3/inject.py", line 15, in <module>
from boto3.s3.transfer import create_transfer_manager
File "/usr/local/lib/python2.7/site-packages/boto3/s3/transfer.py", line 129, in <module>
from s3transfer.manager import TransferConfig as S3TransferConfig
File "/usr/local/lib/python2.7/site-packages/s3transfer/manager.py", line 21, in <module>
from s3transfer.utils import get_callbacks
File "/usr/local/lib/python2.7/site-packages/s3transfer/utils.py", line 27, in <module>
from botocore.exceptions import ReadTimeoutError
ImportError: cannot import name ReadTimeoutError
This appears to be a package structure problem to me.
It looks like boto3 v1.9.91 doesn't have this problem... maybe it has been fixed.
What versions of botocore and s3transfer did you have installed? I'm not able to repro this issue. Here's what I tried:
# Create new virtualenv
$ mktmpenv
$ pip install boto3==1.9.90
$ python
>>> import boto3
>>> boto3.__version__
'1.9.90'
>>> boto3.client('s3')
<botocore.client.S3 object at 0x10c387590>
The last line in that traceback:
File "/usr/local/lib/python2.7/site-packages/s3transfer/utils.py", line 27, in <module>
from botocore.exceptions import ReadTimeoutError
ImportError: cannot import name ReadTimeoutError
was added in s3transfer version 0.2.0, which requires botocore >=1.12.36:
https://github.com/boto/s3transfer/blob/0e2b41f73260f321c3eb44bcefe435f9c9b2aea6/setup.py#L14
And botocore 1.12.36 has ReadTimeoutError defined: https://github.com/boto/botocore/blob/1.12.36/botocore/exceptions.py#L109
If you're installing via pip this error shouldn't be happening. Any additional information you can provide would be helpful. Thanks!
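To sanity-check the version floors mentioned above (s3transfer 0.2.0 requires botocore >=1.12.36), here's a minimal sketch comparing dotted version strings. This is an illustration only, assuming plain numeric versions; pip's resolver normally enforces these constraints for you, and real tooling should use the packaging library instead.

```python
def at_least(installed, minimum):
    """True if a dotted version string meets a minimum (numeric parts only)."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

print(at_least("1.12.91", "1.12.36"))  # a recent botocore satisfies the floor
print(at_least("1.10.74", "1.12.36"))  # the older botocore in this report does not
```

Run against the versions reported later in this thread, it confirms that botocore 1.10.74 is below the floor that s3transfer 0.2.0 requires, which is exactly the ImportError shown.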
I'm having the same issue with boto3 v1.9.91 and v1.7.74, and s3transfer v0.2.0 (installed from boto3's requirements).
Honestly, I don't know what version of botocore I had. I ended up pinning the following versions, and everything worked nicely for me after that:
botocore==1.12.91
urllib3==1.24.1
boto3==1.9.91
Update: It turns out the problem was a mismatch between the provided boto libraries in AWS Lambda and the libraries I packaged into my Lambda code.
boto3: 1.7.74
botocore: 1.10.74
s3transfer: 0.2.0
urllib3: 1.24.1
ImportError: cannot import name 'ReadTimeoutError'
Traceback (most recent call last):
File "/var/task/awstools/core.py", line 59, in __init__
s3 = boto3.client('s3')
File "/var/runtime/boto3/session.py", line 263, in client
aws_session_token=aws_session_token, config=config)
File "/var/runtime/botocore/session.py", line 885, in create_client
client_config=config, api_version=api_version)
File "/var/runtime/botocore/client.py", line 70, in create_client
cls = self._create_client_class(service_name, service_model)
File "/var/runtime/botocore/client.py", line 95, in _create_client_class
base_classes=bases)
File "/var/runtime/botocore/hooks.py", line 227, in emit
return self._emit(event_name, kwargs)
File "/var/runtime/botocore/hooks.py", line 210, in _emit
response = handler(**kwargs)
File "/var/runtime/boto3/utils.py", line 61, in _handler
module = import_module(module)
File "/var/runtime/boto3/utils.py", line 52, in import_module
__import__(name)
File "/var/runtime/boto3/s3/inject.py", line 15, in <module>
from boto3.s3.transfer import create_transfer_manager
File "/var/runtime/boto3/s3/transfer.py", line 129, in <module>
from s3transfer.manager import TransferConfig as S3TransferConfig
File "/var/task/s3transfer/manager.py", line 21, in <module>
from s3transfer.utils import get_callbacks
File "/var/task/s3transfer/utils.py", line 27, in <module>
from botocore.exceptions import ReadTimeoutError
ImportError: cannot import name 'ReadTimeoutError'
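Note that in this traceback boto3 and botocore load from /var/runtime (the Lambda-provided SDK) while s3transfer loads from /var/task (the code bundled into the deployment package). A quick way to see which copy of each module Lambda is actually importing is to check __file__ on the loaded modules — a hedged sketch, not official Lambda tooling:

```python
def classify(module):
    """Rough origin check for a loaded module inside AWS Lambda:
    /var/runtime holds the Lambda-provided SDK, /var/task holds
    code bundled into the deployment package."""
    path = getattr(module, "__file__", "") or ""
    if path.startswith("/var/runtime/"):
        return "runtime-provided"
    if path.startswith("/var/task/"):
        return "bundled"
    return "other"

# e.g. inside a handler:
#   import boto3, botocore, s3transfer
#   for m in (boto3, botocore, s3transfer):
#       print(m.__name__, classify(m))
```

If the three packages don't all come from the same place, either bundle a complete, matching set in the deployment package or rely entirely on the runtime-provided SDK.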
Fixed once I improved my packaging approach.
Thanks!
Hello,
What version of Boto3 and Botocore do you recommend to use to avoid the ReadTimeoutError on EMR?
Kind regards,
Eric
@ericsda
Just be sure to install a recent version of boto3 (1.9.36+) and botocore (1.12.36+) that match, and the latest s3transfer. In general though, I would recommend you just allow pip to figure out the versions for everything if you can.
Well, I found that the versions installed on the AWS-provided AMI I was using didn't work; that's why I pinned my own versions. Maybe that was a one-off, but it broke our Spark scripts running on our EMR cluster (just on the newly built nodes... the older, original nodes were based on an older AMI, and boto3, botocore, and s3transfer worked on them ;) This strikes me as a weak point, and I want to hedge against it happening again by pinning a known-to-work set of packages.
@sqNutrien I would be very surprised if an AWS-provided AMI had conflicting packages like this, though I suppose it's not impossible. Could you provide the AMI ID so I can verify?
It's more likely that a package installed on top of the base image introduced a conflicting package without properly installing the corresponding dependencies.
You might be right. I'm not able to get the AMI that was used to spin up the task nodes, since they're terminated and gone... maybe I can find the AMI ID in some logs somewhere.
As I think about it, the original manifestation of the problem was that our Spark code stopped working with an import error. That failure was the result of a change in the package structure of boto3, as I remember. The problem was only happening on new nodes, ones spun up as a result of autoscaling. These new nodes had a new version of boto3 (1.9.90), whereas the old nodes had an older version (1.9.86) with the "original" package structure.
I modified my bootstrap script to install boto3 during bootstrapping, rebuilt the cluster, started the Python interpreter on the master node, imported boto3, and tried to do something to test it... but that failed due to a version mismatch with botocore. I expected botocore to be updated when the bootstrap script installed boto3 (as it is when one manually installs boto3). So I added all of the compatible packages, pinned by version, to my bootstrap script, and that resolved the package incompatibility (and protects us from breaking package changes).
@sqNutrien Interesting. We haven't made any changes in package structure to botocore or boto3 in quite a while. About a week ago we did update s3transfer which is the package failing to import here, but the version requirements in s3transfer are correct so as long as pip is enforcing the version requirements this shouldn't be an issue.
I took another look at your original post and noticed that boto3 and s3transfer are coming from site-packages, but botocore is coming from dist-packages (a version of botocore a system package installed). Seems to me that pip packages and packages from your linux distribution were in conflict here.
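A quick sketch for spotting this kind of split install without reading a traceback — it asks the import system where each package would resolve from, so a botocore under dist-packages next to a boto3 under site-packages stands out immediately (illustrative diagnostic, not part of boto3):

```python
import importlib.util

def locate(names):
    """Map each module name to the file it would be imported from
    (or None if it is not installed)."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec and spec.origin else None
    return found

# If boto3 and s3transfer resolve under site-packages but botocore under
# dist-packages, you have the mixed pip/system install described above.
for name, origin in locate(["boto3", "botocore", "s3transfer"]).items():
    print(name, "->", origin or "not installed")
```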
Manually specifying the versions you want/need is a good remedy for these types of issues. Another precaution I would recommend is to ensure your application runs in a virtualenv, so that system Python packages don't affect it.
FYI, this was driving me a bit nuts, so I tried to see where it came from. We install this from the CentOS Base repository on CentOS 7. If you grab the RPM here (http://mirror.centos.org/centos/7/os/x86_64/Packages/awscli-1.14.28-5.el7_5.1.noarch.rpm) and run a dependency check with rpm -qpR ./awscli-1.14.28-5.el7_5.1.noarch.rpm, you get:
PyYAML >= 3.10
python(abi) = 2.7
python-cryptography >= 1.7.2
python-docutils >= 0.10
python-s3transfer >= 0.1.9
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PartialHardlinkSets) <= 4.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsXz) <= 5.2-1
If I pin s3transfer back to 0.1.13, it works without that error.
We got the same error, originating from awscli. awscli is installed globally for us; however, for our CI user it appeared that dependencies were being resolved from a different location. For example, the botocore and s3transfer versions differed between sudo pip freeze and pip freeze.
This solution is ugly, but worked for us:
# Executed as root in cloud-init script:
pip freeze | xargs pip uninstall -y # Remove everything.
pip install --upgrade pip # Self-upgrade, which moves pip's location.
ln -s /usr/local/bin/pip /usr/bin/pip # Set up a symlink since some things expect pip in /usr/bin
chmod -R 755 /usr/local/lib/python2.7/ # Fix permissions.
@joguSD I'll look into setting up a virtualenv on the EMR nodes ... if possible, it's a good idea
@jasonbartz We don't maintain the re-packaged versions of the CLI in distribution specific repositories, if there's an issue with the CentOS packages you should let them know.
@dpup Yeah, that sort of cross contamination of python paths is the root cause for the majority of the version mismatch issues that I've seen. Rather than mucking with the system python I would leverage virtualenv to isolate your application.
@joguSD thanks, I figured as much. Just thought I would pop it in here in case other people came across it. I'll see if I can figure out how to let them know. 👍
Rather than mucking with the system python I would leverage virtualenv to isolate your application.
Agreed, our code was running with virtualenv, but apparently still led to contamination with the awscli which was globally installed.
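For anyone hitting the same thing, a small sketch for confirming the interpreter you're in is actually the virtualenv one rather than the system Python the global awscli pulled in (uses standard sys attributes; legacy virtualenv and PEP 405 venv mark themselves differently):

```python
import sys

def in_virtualenv():
    """Detect whether this interpreter is a venv/virtualenv one.
    Legacy virtualenv sets sys.real_prefix; PEP 405 venv makes
    sys.prefix differ from sys.base_prefix."""
    if getattr(sys, "real_prefix", None) is not None:
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("virtualenv:", in_virtualenv(), "| interpreter:", sys.executable)
```

If this prints the system interpreter path while you expected the virtualenv, something (a shebang, a sudo call, a PATH entry) is routing you to the global Python.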
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
honestly, I don't know what version of botocore I had... I ended up specifying the following versions and everything worked nicely for me after that:
botocore==1.12.91
urllib3==1.24.1
boto3==1.9.91
I installed following your advice, but I got this error when I started my service:
import boto3
File "/usr/local/lib/python2.7/dist-packages/boto3/__init__.py", line 16, in <module>
from boto3.session import Session
File "/usr/local/lib/python2.7/dist-packages/boto3/session.py", line 17, in <module>
import botocore.session
File "/usr/local/lib/python2.7/dist-packages/botocore/session.py", line 30, in <module>
import botocore.credentials
File "/usr/local/lib/python2.7/dist-packages/botocore/credentials.py", line 42, in <module>
from botocore.utils import InstanceMetadataFetcher, parse_key_val_file
File "/usr/local/lib/python2.7/dist-packages/botocore/utils.py", line 31, in <module>
import botocore.httpsession
File "/usr/local/lib/python2.7/dist-packages/botocore/httpsession.py", line 7, in <module>
from urllib3.util.ssl_ import (
ImportError: cannot import name OP_NO_SSLv2