Aws-sdk-net: Streaming upload of unknown size

Created on 19 Sep 2018  ·  4 Comments  ·  Source: aws/aws-sdk-net

Expected Behavior

We need to upload large files directly from an HttpRequest to blob storage, without storing them on the filesystem.
(Simple scenarios: https://blogs.visoftinc.com/2013/03/26/streaming-large-files-asynchronously-using-net-4-5/ or https://www.strathweb.com/2012/09/dealing-with-large-files-in-asp-net-web-api/)

Current Behavior

The high-level API ( https://docs.aws.amazon.com/AmazonS3/latest/dev/HLuploadFileDotNet.html ) expects a seekable stream and the stream length, so it's out of the question.

The low-level API is more useful, but the minimum part size of 5 MB means we need to buffer 5 MB for each concurrent upload, which can be a problem, or expensive.
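As a concrete illustration of that low-level workaround, here is a hedged sketch using the SDK's multipart APIs: it fills a 5 MB buffer from a non-seekable input and uploads each chunk as a part. The bucket/key names and the FillBufferAsync helper are illustrative, and retry/abort handling is omitted.

```csharp
using Amazon.S3;
using Amazon.S3.Model;

async Task UploadUnknownLengthAsync(IAmazonS3 s3, Stream input, string bucket, string key)
{
    const int PartSize = 5 * 1024 * 1024; // S3 minimum part size (except the last part)
    var init = await s3.InitiateMultipartUploadAsync(bucket, key);
    var etags = new List<PartETag>();
    var buffer = new byte[PartSize];
    int partNumber = 1;
    int read;
    while ((read = await FillBufferAsync(input, buffer)) > 0)
    {
        using var part = new MemoryStream(buffer, 0, read);
        var resp = await s3.UploadPartAsync(new UploadPartRequest
        {
            BucketName = bucket,
            Key = key,
            UploadId = init.UploadId,
            PartNumber = partNumber,
            InputStream = part,
            PartSize = read
        });
        etags.Add(new PartETag(partNumber++, resp.ETag));
    }
    await s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
    {
        BucketName = bucket, Key = key, UploadId = init.UploadId, PartETags = etags
    });
}

// Helper (illustrative): keep reading until the buffer is full or the stream ends,
// since a single ReadAsync may return fewer bytes than requested.
static async Task<int> FillBufferAsync(Stream s, byte[] buffer)
{
    int total = 0, n;
    while (total < buffer.Length &&
           (n = await s.ReadAsync(buffer, total, buffer.Length - total)) > 0)
        total += n;
    return total;
}
```

This is exactly the memory cost the issue describes: one 5 MB buffer alive per concurrent upload.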

Possible Solution

The Azure Blob API lets us just pass a stream to copy; we only have to make sure each block is no longer than 100 MB, using a wrapper stream. There is no problematic minimum size; only size 0 (an empty file) has to be handled ourselves:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.windowsazure.storage.blob.cloudblockblob.putblockasync?view=azure-dotnet#Microsoft_WindowsAzure_Storage_Blob_CloudBlockBlob_PutBlockAsync_System_String_System_IO_Stream_System_String_System_Threading_CancellationToken_
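A rough sketch of that Azure pattern, for comparison. CappedStream here is a hypothetical wrapper (not part of the Azure SDK) that stops returning data after maxBytes and pre-reads enough to know whether the source is exhausted; blob is a WindowsAzure.Storage CloudBlockBlob.

```csharp
// Upload a stream of unknown length as <=100 MB blocks, then commit the block list.
var blockIds = new List<string>();
for (int i = 0; ; i++)
{
    var block = new CappedStream(source, maxBytes: 100L * 1024 * 1024); // hypothetical wrapper
    if (block.AtEnd) break; // hypothetical: wrapper detects that the source is exhausted
    string id = Convert.ToBase64String(BitConverter.GetBytes(i)); // fixed-length block IDs
    await blob.PutBlockAsync(id, block, null);
    blockIds.Add(id);
}
if (blockIds.Count == 0)
    await blob.UploadTextAsync(string.Empty); // the size-0 case handled separately
else
    await blob.PutBlockListAsync(blockIds);
```

The key difference from the S3 low-level API is that no block has to sit fully in memory before the call is made.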

Your Environment

  • Targeted .NET platform: >= .NET 4.5

Not sure if this is a feature request or whether we missed something in the docs.
We would like to support multiple storage backends.

Related: https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header

feature-request moduls3


All 4 comments

I don't understand what the ask is. Is this a feature request?

What do you mean by reserving 5 MB? Even if the high-level library supported uploading a file of unknown size, it would still hold 5 MB of data in memory for each part until that part's upload is done. All that changes is which component requires the memory to complete the transfer; from the application/process point of view, there still needs to be a place to store the 5 MB of data until it's no longer needed.

It would be nice if the application could open an "S3 request stream" (the POST body as an output stream) and then just CopyToAsync to it ( https://docs.microsoft.com/en-us/dotnet/api/system.io.stream.copytoasync?view=netframework-4.7.2 ).

Example RestSharp code to upload a file (there should be an async version of it somewhere): https://stackoverflow.com/questions/32876606/restsharp-addfile-using-stream

Example of a streaming response in Web API: https://blog.stephencleary.com/2016/10/async-pushstreamcontent.html

The point is that the buffer can be much smaller than a whole file part, and an async method should not block a web server thread while waiting for more data.
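To make the ask concrete, a hypothetical API shape might look like the following. None of these names exist in the AWS SDK; they are invented purely to illustrate the request.

```csharp
// Hypothetical interface (NOT part of aws-sdk-net): the SDK would hand back a
// writable stream over the PUT body, so the caller pumps data with a small
// buffer instead of materializing whole 5 MB parts in application code.
public interface IS3StreamingUpload
{
    Stream RequestBody { get; }  // writes go toward the wire, not a full in-memory part
    Task CompleteAsync();        // finish the upload once the source is drained
}

// Usage with a non-seekable source, e.g. an incoming HttpRequest body:
async Task CopyToS3Async(Stream httpBody, IS3StreamingUpload upload)
{
    // 81920 bytes is the default CopyToAsync buffer size; any small buffer
    // works, and the await yields the thread while waiting for more data.
    await httpBody.CopyToAsync(upload.RequestBody, 81920);
    await upload.CompleteAsync();
}
```

This matches the Azure shape above: the caller never sees parts or blocks at all.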

Ah, I see. You want to write directly to the HTTP request buffer. I don't think that's supported in the current request/response pipeline. I'll mark this as a feature request for now.

Closing, as we already have a feature request for streaming data into S3 without a content-length, like this one: https://github.com/aws/aws-sdk-net/issues/1095.

Right now S3 requires a Content-Length when putting an object or a part.
