Boto3: Boto3 S3 StreamingBody().read() reads once and returns nothing after that

Created on 25 Mar 2016 · 11 Comments · Source: boto/boto3

>>> a = client.get_object(Bucket='imgtest',Key='testimage1.jpg')
>>> a['Body'].read()
b'...\xadk\xc9,\xda\xe7\xcb\xb7$\x91\xf7\xb3\xd3>\xd5V...'
>>> a['Body'].read()
b''

Complete bytes removed for brevity. I get an object and read it. When I read it again, no bytes are returned.

If this stream acts as a normal file IO stream, how can I seek to the beginning of the stream? seek() does not seem to be a method on the StreamingBody object.

All 11 comments

The class is described here. We will look to see if we can get this ported over or linked in the boto3 docs.

As seen in the docs, if you call read() with no amount specified, you read all of the data. So if you call read() again, you will get no more bytes.

There is also no seek() available on the stream because we are streaming directly from the server. The only way we could add a seek() method is to store all of the data in memory, which is not a great idea as the body could be gigabytes in size.
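As an illustration of the read-once behaviour described above, here is a minimal sketch that consumes a stream in fixed-size pieces instead of loading it all at once. The helper name `read_in_chunks` and the chunk size are hypothetical; the sketch assumes only that `read(amt)` returns `b''` once the stream is exhausted, and it uses an in-memory stand-in so it runs without AWS access.

```python
import io

def read_in_chunks(body, amt=1024 * 1024):
    """Yield successive pieces of at most `amt` bytes from a read-once stream.

    `body` is assumed to be any object whose read(amt) returns b'' once
    exhausted, for example response['Body'] from get_object().
    """
    while True:
        chunk = body.read(amt)
        if not chunk:
            break
        yield chunk

# Stand-in for a StreamingBody (assumption: same read(amt) contract).
fake_body = io.BytesIO(b"abcdefghij")
chunks = list(read_in_chunks(fake_body, amt=4))
leftover = fake_body.read()  # nothing is left after the stream is consumed
```

Once the loop finishes, any further `read()` returns an empty bytes object, which matches what the original report observed.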

Is there any particular reason that this is still an open ticket?

Is there a reason why StreamingBody is not seekable?
This becomes quite problematic when attempting to download portions of large files asynchronously. What is the recommended way to do this?

@danielmorozoff get_object supports a Range parameter. HTTP byte ranges are inclusive at both ends, hence the - 1:

client.get_object(Bucket=bucket, Key=key, Range='bytes={}-{}'.format(amount_read, amount_read + chunk_size - 1))
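The Range approach can be sketched as a loop over independent requests. The helper names `byte_ranges` and `download_in_ranges` are hypothetical, and `total_size` is assumed to be known in advance (for example from the object's content length). Because HTTP byte ranges include both endpoints, each range ends at `start + chunk_size - 1`.

```python
def byte_ranges(total_size, chunk_size):
    """Yield HTTP Range header values that together cover total_size bytes.

    Each range is inclusive at both ends and the last one is clamped to
    the final byte of the object.
    """
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        yield "bytes={}-{}".format(start, end)

def download_in_ranges(client, bucket, key, total_size, chunk_size):
    """Fetch an object piece by piece; every request is independent.

    `client` is assumed to behave like a boto3 S3 client.
    """
    parts = []
    for range_header in byte_ranges(total_size, chunk_size):
        response = client.get_object(Bucket=bucket, Key=key, Range=range_header)
        parts.append(response["Body"].read())
    return b"".join(parts)
```

Since the requests do not depend on one another, the same ranges could also be handed to separate workers, which is the use case the question was about.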

One way to allow .seek() would be for botocore's StreamingBody to receive an opener (a factory?) for the _raw_stream rather than the realized object. Seeking to 0 would then just mean restarting the _raw_stream.

See: https://github.com/boto/botocore/blob/master/botocore/response.py#L42
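The opener idea can be sketched outside botocore as a small wrapper. This is not how botocore works today; the class name `ReopenableStream` and the `opener` argument are hypothetical, and only rewinding to the start is supported.

```python
import io

class ReopenableStream:
    """Wrap a stream factory so that seek(0) restarts the stream.

    `opener` is a hypothetical zero-argument callable returning a fresh
    read-once stream, e.g. lambda: client.get_object(...)['Body'].
    """

    def __init__(self, opener):
        self._opener = opener
        self._stream = opener()

    def read(self, amt=None):
        return self._stream.read(amt)

    def seek(self, offset, whence=io.SEEK_SET):
        if offset != 0 or whence != io.SEEK_SET:
            raise io.UnsupportedOperation("only seek(0) is supported")
        close = getattr(self._stream, "close", None)
        if close is not None:
            close()  # release the old stream before opening a new one
        self._stream = self._opener()
        return 0

# In-memory stand-in for the real opener so the sketch runs offline.
source = ReopenableStream(lambda: io.BytesIO(b"hello"))
first = source.read()
second = source.read()
source.seek(0)
third = source.read()
```

With a real S3 opener, every rewind would issue a new request and download the data again, so this trades bandwidth for memory.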

Is there any workaround to use seek() on a StreamingBody?

I solved this by using _raw_stream, as per @alanjds's comment above.
Is this a good solution, or is there a better one?

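import codecs
import io
# Reads the entire body into memory, then wraps the decoded text in a seekable StringIO.
raw_stream = codecs.getreader('utf-8-sig')(temp_file[u'Body'])._raw_stream.read().decode("UTF8")
stream_csv = io.StringIO(raw_stream, newline=None)
stream_csv.seek(0)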

@ryanermita I was thinking of a way to seek without putting the whole file in memory.

If you have no problem with filling memory with the file, a cleaner way is to just call StringIO(streaming_body.read()) (on Python 3, where read() returns bytes, use BytesIO or decode first), then seek the StringIO as you are already doing.
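A minimal sketch of that buffering approach, assuming the object fits in memory. The helper name `buffer_body` is hypothetical, and an in-memory stand-in replaces the real response body so the example runs offline.

```python
import io

def buffer_body(body):
    """Read a read-once stream fully and return a seekable in-memory copy."""
    return io.BytesIO(body.read())

# Stand-in for response['Body']: CSV bytes with a UTF-8 byte order mark.
fake_body = io.BytesIO("a,b\n1,2\n".encode("utf-8-sig"))

buffered = buffer_body(fake_body)
first_pass = buffered.read()
buffered.seek(0)  # possible because BytesIO, unlike the stream, is seekable

# For text such as CSV, decode on top of the seekable bytes buffer.
text = io.TextIOWrapper(buffered, encoding="utf-8-sig", newline=None)
lines = text.read().splitlines()
```

Decoding with `utf-8-sig` drops a leading byte order mark if one is present, so no private attributes of the stream are needed.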

I will try this one, thank you @alanjds :+1:

@kyleknap

Has it been suggested to change botocore's StreamingBody? I ran into this issue twice (the second time because I hadn't used read() on the object in a while). Even the documentation you linked to doesn't make it clear to me that the stream gets flushed after the first read. It would be more intuitive if the stream were copied when read instead of flushed.

I found a solution that worked for me. It involves writing a wrapper that supports seek(). I also read about smart_open in another blog, but I haven't tried it.
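One possible shape for such a wrapper, not necessarily the one the commenter used, is a read-only object that turns every read() into a ranged get_object call. The class name `RangedS3Reader` is hypothetical, and `client` is assumed to behave like a boto3 S3 client whose head_object response includes 'ContentLength'.

```python
import io

class RangedS3Reader:
    """Seekable, read-only view of an S3 object built on ranged GETs."""

    def __init__(self, client, bucket, key):
        self._client, self._bucket, self._key = client, bucket, key
        self._size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
        self._pos = 0

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        # Simplification: positions are clamped to the bounds of the object.
        self._pos = max(0, min(base + offset, self._size))
        return self._pos

    def read(self, amt=None):
        if self._pos >= self._size or amt == 0:
            return b""
        if amt is None or amt < 0:
            end = self._size
        else:
            end = min(self._pos + amt, self._size)
        response = self._client.get_object(
            Bucket=self._bucket,
            Key=self._key,
            Range="bytes={}-{}".format(self._pos, end - 1),  # inclusive end
        )
        data = response["Body"].read()
        self._pos += len(data)
        return data
```

Nothing is cached here, so each read() costs one request; a real wrapper would probably add buffering on top.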
