Boto3: S3: Should put_object automatically split files larger than 5GB?

Created on 12 Jun 2017 · 4 comments · Source: boto/boto3

S3 doesn't allow you to PUT objects larger than 5 GB in a single request. However, boto3 will let me run something like:

```
import boto3
s3 = boto3.resource('s3')
data = open('/6gbfile', 'rb')
s3.Bucket('myTestBucket').put_object(Key='6gbfile', Body=data)
```

After a minute or so it throws:
botocore.exceptions.ClientError: An error occurred (EntityTooLarge) when calling the PutObject operation: Your proposed upload exceeds the maximum allowed size

I have to split my large files just to upload them! Should boto3 handle this for the user?

question

All 4 comments

The put_object method maps directly to the PutObject API request in S3. boto3 offers higher-level abstractions that will automatically manage the multipart upload for you. Docs here: http://boto3.readthedocs.io/en/latest/guide/s3.html#uploads. You can use either the upload_file or the upload_fileobj method. Let me know if you have any more questions.
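
For reference, a minimal sketch of both managed methods, reusing the bucket and file names from the original question; either one avoids the EntityTooLarge error by performing a multipart upload under the hood:

```
import boto3

s3 = boto3.client('s3')

# upload_file takes a filename; it transparently switches to a
# multipart upload for large files, so a 6 GB file needs no splitting.
s3.upload_file('/6gbfile', 'myTestBucket', '6gbfile')

# upload_fileobj takes any file-like object opened in binary mode.
with open('/6gbfile', 'rb') as data:
    s3.upload_fileobj(data, 'myTestBucket', '6gbfile')
```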

I love you, I love this tool, you make my life easy

Guys, so according to you, the multipart upload is driven by boto3 automatically? Isn't an upload ID required to gather the uploaded parts of the file in AWS S3? @jamesls

I was following the documentation, and it seems like multipart in the Python API is only used for Glacier? @kopertop
thanks
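
To clarify the upload-ID point: every S3 multipart upload does involve an UploadId, but the managed upload_file/upload_fileobj methods create, track, and complete it for you internally. A rough sketch of the low-level client flow they automate (bucket, key, and file path reused from the question):

```
import boto3

s3 = boto3.client('s3')
bucket, key = 'myTestBucket', '6gbfile'

# Start the multipart upload; S3 returns the UploadId that ties
# all the parts together.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu['UploadId']

parts = []
with open('/6gbfile', 'rb') as f:
    part_number = 1
    while True:
        # Parts must be at least 5 MB, except the last one.
        chunk = f.read(100 * 1024 * 1024)
        if not chunk:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key,
                              PartNumber=part_number,
                              UploadId=upload_id, Body=chunk)
        parts.append({'PartNumber': part_number, 'ETag': resp['ETag']})
        part_number += 1

# Tell S3 to stitch the parts back into a single object.
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={'Parts': parts})
```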

ClientError: An error occurred (EntityTooLarge) when calling the PutObject operation: Your proposed upload exceeds the maximum allowed size

I am trying to download the COCO dataset and upload it to an S3 bucket, and I get this error. How do I solve it? The code I've written is below.

```
%%time
import boto3
import re
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

role = get_execution_role()

bucket = 'masterthesisvyvian'  # customize to your bucket

containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/image-classification:latest',
              'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest',
              'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/image-classification:latest',
              'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:latest'}

training_image = containers[boto3.Session().region_name]

print(training_image)

import os
import urllib.request
import boto3

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(channel, file):
    s3 = boto3.resource('s3')
    data = open(file, "rb")
    key = channel + '/' + file
    s3.Bucket(bucket).put_object(Key=key, Body=data)

download('http://images.cocodataset.org/zips/train2017.zip')
download('http://images.cocodataset.org/zips/test2017.zip')
download('http://images.cocodataset.org/zips/val2017.zip')
download('http://images.cocodataset.org/annotations/annotations_trainval2017.zip')
upload_to_s3('s3_train_key', 'train2017.zip')
upload_to_s3('s3_test_key', 'test2017.zip')
upload_to_s3('s3_val_key', 'val2017.zip')
upload_to_s3('s3_annotation_key', 'annotations_trainval2017.zip')
```
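
Per jamesls's answer above, the fix is to replace put_object with one of the managed transfer methods. A minimal sketch of the changed helper, reusing the bucket name and channel keys from the snippet:

```
import boto3

bucket = 'masterthesisvyvian'  # same bucket as in the snippet above
s3 = boto3.client('s3')

def upload_to_s3(channel, file):
    # upload_file switches to a multipart upload for large files,
    # so the 5 GB PutObject limit no longer applies.
    key = channel + '/' + file
    s3.upload_file(file, bucket, key)

upload_to_s3('s3_train_key', 'train2017.zip')
```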
