The documentation for storing data to S3 with boto3 at http://boto3.readthedocs.org/en/latest/guide/migrations3.html#storing-data says:
Storing data from a file, stream, or string is easy:
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
So I tried using a stream, rather than a file as in the example:
s3 = boto3.resource('s3')
with contextlib.closing(requests.get(url, stream=True)) as r:
    s3.Object(bucket, key).put(Body=r.raw)
This does not actually work, because the library attempts to seek on the stream, which an HTTP response body obviously cannot do:
Traceback (most recent call last):
  File "boto3_put.py", line 12, in <module>
    s3.meta.client.put_object(Bucket=bucket, Key=key, Body=r.raw)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 301, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 381, in _make_api_call
    request_signer=self._request_signer, context=request_context)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 241, in emit_until_response
    responses = self._emit(event_name, kwargs, stop_on_response=True)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 209, in _emit
    response = handler(**kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/handlers.py", line 153, in conditionally_calculate_md5
    calculate_md5(params, **kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/handlers.py", line 130, in calculate_md5
    binary_md5 = _calculate_md5_from_file(body)
  File "/usr/lib/python2.7/site-packages/botocore/handlers.py", line 145, in _calculate_md5_from_file
    fileobj.seek(start_position)
io.UnsupportedOperation: seek
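For anyone reading the traceback, the failing step can be approximated with the standard library alone. This is only a simplified sketch of the "hash the body, then rewind" pattern the traceback points at, not botocore's actual code; `md5_then_rewind` and `NonSeekableStream` are hypothetical names made up for illustration.

```python
import hashlib
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for an HTTP response body: readable, not seekable."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        return self._buf.readinto(b)

    def tell(self):
        # Report how many bytes have been consumed so far.
        return self._buf.tell()

    # seek() is deliberately not overridden, so the io base class
    # raises io.UnsupportedOperation when it is called.


def md5_then_rewind(fileobj, chunk_size=1024 * 1024):
    """Hash the whole stream, then rewind it so it can be sent afterwards."""
    start_position = fileobj.tell()
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    fileobj.seek(start_position)  # this is the step that fails on a pipe or socket
    return digest.digest()


# A seekable body works; a nonseekable one fails at the rewind.
md5_then_rewind(io.BytesIO(b"hello"))
try:
    md5_then_rewind(NonSeekableStream(b"hello"))
except io.UnsupportedOperation as exc:
    print("cannot rewind:", exc)
```

The point is that the body has to be read twice (once to hash, once to send), which is why a stream that can only be read once cannot be used here.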
So we do not currently have the ability to stream from unseekable streams. I am not sure that handling unseekable streams in PutObject is something we would want to support. It would be much better to support both seekable and nonseekable files in the upload_file method on the transfer manager: https://github.com/boto/boto3/issues/432.
One action item we can take, though, is to clean up the error message. Ideally we would bubble it up past all of the handlers, with logic similar to this: https://github.com/boto/botocore/blob/c6996f5bba74895845bca412314f996d94000ad0/botocore/awsrequest.py#L409
I'm all for clearer error messages, but in this case it was the documentation which stated
"Storing data from a file, stream, or string is easy"
It would be nice as a user if the API accepted a stream, so that I can stream large files from remote servers directly into S3. But if it's necessary to have the entire file in hand before upload to S3, that's workable as well, as long as it's clear that's what's expected.
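A minimal sketch of the "entire file in hand" workaround, assuming it is acceptable to spool the download to local disk first: copy the nonseekable stream into a temporary file, rewind it, and pass that to put(). The helper name `buffer_to_seekable` is hypothetical, and the put() call is shown only in a comment because it needs real credentials and a bucket.

```python
import io
import shutil
import tempfile


def buffer_to_seekable(stream):
    """Copy a read-once stream into a temporary file and rewind it.

    The caller is responsible for closing the returned file, which
    also deletes it.
    """
    tmp = tempfile.TemporaryFile()
    shutil.copyfileobj(stream, tmp)  # reads the source in chunks
    tmp.seek(0)
    return tmp


# Local demonstration with an in-memory source standing in for r.raw.
demo = buffer_to_seekable(io.BytesIO(b"hello"))
print(demo.read())
demo.close()

# Intended use with the code from the original report (not run here):
#
#   with contextlib.closing(requests.get(url, stream=True)) as r:
#       with buffer_to_seekable(r.raw) as body:
#           s3.Object(bucket, key).put(Body=body)
```

The cost is that the whole object lands on local disk before the upload starts, so this is a fallback rather than true streaming.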
I'd second this ticket.
In fact, I had written the exact same code as @dadkins, starting from the same doc-based assumption. Having the ability to pipe streams directly into S3 would be really nice, given that we already have the ability to pass file handles as the Body argument.
With the release of boto3 1.4.0, we have added support for nonseekable streams for uploads: https://boto3.readthedocs.io/en/latest/guide/s3.html#uploads. Please use one of the upload_fileobj() methods to do so. I do not think the put() method will ever support nonseekable files, as the ability to seek is needed for this lower-level method to calculate MD5 and SHA-256 checksums. Plus, the upload_fileobj() method is multithreaded and manages multipart uploads for you automatically, so it is much more convenient than the put() method.
Resolving issue.
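For reference, a minimal sketch of how the original snippet might look with upload_fileobj(), assuming boto3 1.4.0 or later. The wrapper name `stream_to_s3` is hypothetical; the only boto3 call it makes is client.upload_fileobj(Fileobj, Bucket, Key, ExtraArgs=...). The real-world usage is left in a comment since it needs credentials and network access.

```python
def stream_to_s3(s3_client, stream, bucket, key, extra_args=None):
    """Upload a binary file-like object to S3 without requiring seek()."""
    if not callable(getattr(stream, "read", None)):
        raise TypeError("stream must be a binary file-like object with read()")
    # upload_fileobj reads the stream in chunks and switches to a
    # multipart upload on its own when the data is large enough.
    s3_client.upload_fileobj(stream, bucket, key, ExtraArgs=extra_args)


# Intended use with the code from the original report (not run here):
#
#   import contextlib
#   import boto3
#   import requests
#
#   s3_client = boto3.client('s3')
#   with contextlib.closing(requests.get(url, stream=True)) as r:
#       stream_to_s3(s3_client, r.raw, bucket, key)
```

Taking the client as a parameter keeps the helper easy to exercise with a stub object in tests.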
Just sample code (tested)
with subprocess.Popen("mysqldump -u" + dbuser + " -p" + dbpass + " --add-drop-table " + db + " | bzip2", stdout=subprocess.PIPE, shell=True).stdout as dataStream:
    s3.upload_fileobj(dataStream, config.get('amazon', 'bucket'), "mysql_" + db + '_' + date + ".bz2")
Maybe someone can improve it.
@kyleknap Thanks for clarifying things. But if I use upload_fileobj() to upload a stream, do I lose the ability to use server-side encryption? Is there any way around this? Thanks!
@kim0 No, you retain the ability to use server-side encryption. Use the fourth parameter, ExtraArgs, to pass in the ServerSideEncryption argument.
s3.upload_fileobj(stream, 'my_bucket', 'my_key', {'ServerSideEncryption': 'AES256'})