I need to use a newer boto3 package in an AWS Glue Python 3 shell job (Glue Version: 1.0). I added the wheel file boto3-1.13.21-py2.py3-none-any.whl, downloaded from https://pypi.org/project/boto3/1.13.21/#files, under Python library path. However, boto3.__version__ still prints 1.9.203, even though the job log says boto3==1.13.21 was successfully installed. For some reason, the Glue Python 3 shell job (Glue Version: 1.0) is not letting me override the bundled boto3 version with the wheel file. Is there any way to override it?
Thank you for your post. Have you tried passing the Python libraries, packaged as an .egg or .whl file, with the --extra-py-files flag in the DefaultArguments section?
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
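For reference, the suggestion above can be sketched as the arguments one might pass to the Glue CreateJob API. This is only a sketch: the job name, role ARN, bucket, and script path are hypothetical placeholders, and the helper function is not part of boto3.

```python
def build_python_shell_job_args(name, role_arn, script_s3_path, wheel_s3_paths):
    """Build keyword arguments for a Glue Python shell job (hypothetical helper)."""
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "1.0",
        "Command": {
            "Name": "pythonshell",  # Python shell job rather than a Spark ETL job
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Comma-separated S3 paths of the .whl or .egg files
            "--extra-py-files": ",".join(wheel_s3_paths),
        },
    }


# All values below are placeholders, not real resources.
job_args = build_python_shell_job_args(
    name="example-boto3-upgrade-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_path="s3://example-bucket/scripts/job.py",
    wheel_s3_paths=["s3://example-bucket/libs/boto3-1.13.21-py2.py3-none-any.whl"],
)
print(job_args["DefaultArguments"]["--extra-py-files"])

# To actually create the job (requires AWS credentials and boto3):
# import boto3
# boto3.client("glue").create_job(**job_args)
```

Building the arguments separately from the API call makes it easy to inspect what will be sent before any job is created.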
@swetashre Thanks for looking into this. I think the AWS Glue UI representation of the --extra-py-files parameter is the Python library path shown in the screenshot below. If I create an AWS Glue ETL Python job via the API with the --extra-py-files property specified, it populates the Python library path box in the UI. I am already importing a newer boto3 wheel file, boto3-1.13.21-py2.py3-none-any.whl, from S3 under that property.

Please let me know if I need to provide any additional information.
Yes, you can specify the S3 path under the Python library path to import the newer version of the SDK.
https://docs.aws.amazon.com/glue/latest/dg/console-custom-created.html
I hope this helps.
@swetashre Thanks, but my real problem is that the AWS Glue Python shell job does not let you override the default version, 1.9.203, even if you specify a newer version. Do you think this is more of a question for the AWS Glue group?
@al11111 - Thanks for replying. It should take the newer version after specifying the file under the python library path.
I have seen that someone has posted the same question in the AWS developer forum as well:
https://forums.aws.amazon.com/thread.jspa?messageID=954344#954344
But if it is not picking up the newer version, then it is a question for AWS Glue. It might be a bug in Glue, but that needs investigation on the Glue side. I would recommend opening a ticket with AWS Support.
@swetashre I posted that question in the AWS Glue forum. Thanks for your time and assistance. I will wait for someone from the AWS side to answer my question there.
@al11111 - I am able to reproduce the issue. The job successfully downloads the recent boto3 version, but after importing, it still prints boto3 version 1.9.203.
I will reach out to the service team about the issue and update here when I get a response.
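When reproducing this kind of problem, it can help to print where a module was actually loaded from, not just its version. The following is a minimal diagnostic sketch; `describe_module` is a hypothetical helper, and the standard-library module `json` stands in for `boto3` so the snippet runs anywhere.

```python
import importlib
import sys


def describe_module(name):
    """Return the version and file location of an importable module."""
    module = importlib.import_module(name)
    return {
        "name": name,
        "version": getattr(module, "__version__", "unknown"),
        "file": getattr(module, "__file__", "built-in"),
    }


# Replace "json" with "boto3" inside the Glue job.
info = describe_module("json")
print(info)

# Directories are searched in this order; the first match wins.
for entry in sys.path:
    print(entry)
```

If the reported file lives outside the directory where the new wheel was installed, the older copy is being found first on `sys.path` or was already imported before the script ran.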
@swetashre Thanks for the extra effort and help you provided. I really appreciate it.
I am having the same issue. Any updates?
@praveen203 I still have not heard any updates from either this thread or the AWS Glue forum thread mentioned above.
This has been a giant thorn in our side when trying to use Glue python scripts for anything in the last year. The boto3 version in Glue is so old, you can't even reference Glue Workflows in it, even though the documentation states that's what you need to do in order to pull down arguments passed to a glue workflow.
We have been using a Glue 2.0 PySpark job to run the Python script as a workaround for this particular issue. Fortunately, with the latest release in August (https://aws.amazon.com/about-aws/whats-new/2020/08/aws-glue-version-2-featuring-10x-faster-job-start-times-1-minute-minimum-billing-duration/), Glue Spark job provisioning time and pricing are much more reasonable.
@swetashre Any update from the AWS service team about the issue by chance?
@alexling0722 - Sorry for the late reply. I don't have any update on the issue. I will post here when I get one.
@swetashre Any ETA from the service team? I want to use the boto3 RedshiftDataAPIService, but for now I am trying a workaround using AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler).
Hi,
We got the AWS Glue Python shell working with all dependencies as follows. Glue depends on awscli as well as boto3.
AWS Glue Python Shell with Internet
1. Add the awscli and boto3 whl files to the Python library path so they are installed during Glue job execution. This option is slow because the job has to download and install the dependencies.
2. Upload the whl files to the S3 bucket used for your Python library path.
3. Add the S3 paths of the whl files to the Python library path. Give the full S3 path of each whl file, separated by commas.
4. Add the following code snippet to load the new files:
# Additional code from AWS forum thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
import sys
# Search /glue/lib/installation before the default locations
sys.path.insert(0, '/glue/lib/installation')
# Drop the already-imported boto modules so the next import loads them again
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]
import boto3
print('boto3 version')
print(boto3.__version__)
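The snippet above can be generalized into a small helper, sketched here under a few assumptions: `purge_and_reimport` is a hypothetical name, and it uses the same substring match on module names as the snippet does. It is an illustration of the technique, not an official Glue or boto3 API.

```python
import importlib
import sys


def purge_and_reimport(path, package, fragment):
    """Put `path` first on sys.path, forget cached modules, and import again.

    Every entry in sys.modules whose name contains `fragment` is removed,
    so the next import searches sys.path from the start.
    """
    if path in sys.path:
        sys.path.remove(path)
    sys.path.insert(0, path)

    for name in [n for n in sys.modules if fragment in n]:
        del sys.modules[name]

    # Make sure the import system notices files added after startup.
    importlib.invalidate_caches()
    return importlib.import_module(package)


# Usage inside a Glue Python shell job (path taken from the snippet above):
# boto3 = purge_and_reimport('/glue/lib/installation', 'boto3', 'boto')
# print(boto3.__version__)
```

Deleting the cached entries matters because Python returns an already-imported module from `sys.modules` without consulting `sys.path` again, which would explain why changing the path alone has no effect.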
AWS Glue Python Shell without Internet connectivity
Reference: AWS Wrangler Glue dependency build
We followed the steps mentioned above for awscli and boto3 whl files.
Below is the latest requirements.txt compiled for the newest versions
colorama==0.4.3
docutils==0.15.2
rsa==4.5.0
s3transfer==0.3.3
PyYAML==5.3.1
botocore==1.19.23
pyasn1==0.4.8
jmespath==0.10.0
urllib3==1.26.2
python_dateutil==2.8.1
six==1.15.0
pip download -r requirements.txt -d libs
cd libs
zip ../boto3-depends.zip *
Upload boto3-depends.zip to S3 and add its path to the Glue job's Referenced files path.
_Note: It is Referenced files path and not Python library path_
Placeholder code to install the latest awscli and boto3 and load them into the AWS Glue Python shell. Additional code as per the thread below:
https://forums.aws.amazon.com/thread.jspa?messageID=954344
import os.path
import subprocess
import sys

# borrowed from https://stackoverflow.com/questions/48596627/how-to-import-referenced-files-in-etl-scripts
def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    # Look for the referenced file in each directory on sys.path
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

# Archive built with the zip command above and added to the Referenced files path
zip_file = get_referenced_filepath("boto3-depends.zip")
subprocess.run(["unzip", zip_file])

# Can't install --user, or without "-t ." because of permissions issues on the filesystem
subprocess.run("pip3 install --no-index --find-links=. -t . *.whl", shell=True)

# Additional code from AWS forum thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
sys.path.insert(0, '/glue/lib/installation')
# Drop the already-imported boto modules so the next import loads them again
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]
import boto3
print('boto3 version')
print(boto3.__version__)
Thanks,
Sarath
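If the `unzip` binary is not available in the job environment, the extraction step could be done with the standard-library `zipfile` module instead. This is a sketch only; `extract_wheels` is a hypothetical helper and has not been confirmed against a Glue job.

```python
import glob
import os
import zipfile


def extract_wheels(zip_path, target_dir):
    """Extract an archive of wheels and return the sorted .whl paths found."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return sorted(glob.glob(os.path.join(target_dir, "*.whl")))


# Usage, assuming zip_file was located as in the code above:
# wheels = extract_wheels(zip_file, ".")
# print(wheels)
```

The returned list can then be handed to pip explicitly rather than relying on shell expansion of `*.whl`.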
@sarath-mec: This seems to be working. Thanks for the info.
@sarath-mec I tried the "AWS Glue Python Shell with Internet" method. It downloaded the upgraded boto3 version, but importing it and printing the version still gives the old version.
The second method works correctly though. Thanks for the help!
Hi,
I have added a new step to option 1: the main logic to reload boto3 is now step 4 of the first option.
It looks like we need to add this to our Python script:
import sys
sys.path.insert(0, '/glue/lib/installation')
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]
@alexling0722 - Can you confirm if this works for you or not ?
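To confirm whether the reload took effect, a job could fail fast when the imported version is older than expected. The helpers below are hypothetical and assume purely numeric version strings such as 1.13.21.

```python
def version_tuple(version):
    """Convert a numeric version string such as '1.13.21' to a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def require_min_version(actual, minimum):
    """Raise RuntimeError if `actual` is older than `minimum`."""
    if version_tuple(actual) < version_tuple(minimum):
        raise RuntimeError(
            "found version {}, need at least {}".format(actual, minimum)
        )
    return actual


# Usage after the reload snippet:
# import boto3
# require_min_version(boto3.__version__, "1.13.21")
```

Comparing tuples of integers avoids a pitfall of plain string comparison, which would rank "1.9.203" above "1.13.21".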