Aws-sdk-net: TransferUtil.UploadAsync does not work with unseekable streams or zero length streams

Created on 17 Jun 2017 · 24Comments · Source: aws/aws-sdk-net

Expected Behavior

Things should behave, I should be able to pass through a stream that has no length / zero length and at least get decent exceptions, preferably it should be handled for the user.
Any other scenario requires us to buffer the full upload in a stream that can do it all (seeking, accurate report of length etc) before we can give the stream to the s3 sdk, not nice for using it in a webserver and feels like we have to babysit the aws sdk so it behaves...

Current Behavior

Request terminates with NotSupported exception as S3 tries to access the length somewhere without catching the exception. The stacktrace is posted under Stacktrace for length NotSupported exception:

Steps to Reproduce (for bugs)

Create aspnet project, add a post endpoint for a simple upload transfer (just binary, no form data),

What also helps, as a non seekable stream is not always practical, is to add this line, this is a helper that spools the request body to disk once it reaches a threshold

HttpContext.Request.EnableRewind(3145728); // 3 MB max in mem before spooling to disk

Problems is that with that method the S3 client hangs and finally responds/times-out with the exception seen at the bottom Stacktrace for length 0 case:

Your Environment

AWSSDK.Core version used: 3.3.1.2
Targeted .NET platform: netcoreapp 1.1
Output of dotnet --info:

.NET Command Line Tools (1.0.4)

Product Information:
 Version:            1.0.4
 Commit SHA-1 hash:  af1e6684fd

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.12
 OS Platform: Darwin
 RID:         osx.10.12-x64
 Base Path:   /usr/local/share/dotnet/sdk/1.0.4

Context:

Most streams don't allow you to set length, and both of these streams can't do length tracking in a reliable way as they sometimes don't know it yet (think chunked requests)

FrameRequestStream https://github.com/aspnet/KestrelHttpServer/blob/dev/src/Microsoft.AspNetCore.Server.Kestrel.Core/Internal/Http/FrameRequestStream.cs
FileBufferingReadStream https://github.com/aspnet/HttpAbstractions/blob/master/src/Microsoft.AspNetCore.WebUtilities/FileBufferingReadStream.cs

these streams are no exception to that, that is why you probably put quite a few consumers of your sdk in a tough spot by trying to read out a ContentLength for your own S3 request from the stream. That should be re-evaluated, either by allowing us to pass in a ContentLength ourselves somewhere or doing something like a chunked upload so you don't need to set a Content-Length header and thus have no need for a ContentLength from the stream.

Also apparently something is completely blowing up if the stream length that was passed in is 0 as I could easily achieve, that should be something to check out irrespective of your decision or the outcome of this issue.

Stacktrace for length NotSupported exception:

      An unhandled exception has occurred: Specified method is not supported.
System.NotSupportedException: Specified method is not supported.
   at Microsoft.AspNetCore.Server.Kestrel.Internal.Http.FrameRequestStream.get_Length()
   at Amazon.S3.Transfer.TransferUtilityUploadRequest.get_ContentLength()
   at Amazon.S3.Transfer.TransferUtility.IsMultipartUpload(TransferUtilityUploadRequest request)
   at Amazon.S3.Transfer.TransferUtility.GetUploadCommand(TransferUtilityUploadRequest request, SemaphoreSlim asyncThrottler)
   at Amazon.S3.Transfer.TransferUtility.UploadAsync(Stream stream, String bucketName, String key, CancellationToken cancellationToken)
... our code ...

Stacktrace for length 0 case (it took many seconds before it responds with this error):

      An unhandled exception has occurred: The provided 'x-amz-content-sha256' header does not match what was computed.
Amazon.S3.AmazonS3Exception: The provided 'x-amz-content-sha256' header does not match what was computed. ---> Amazon.Runtime.Internal.HttpErrorResponseException: Exception of type 'Amazon.Runtime.Internal.HttpErrorResponseException' was thrown.
   at Amazon.Runtime.HttpWebRequestMessage.<GetResponseAsync>d__20.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler\_mobile\HttpRequestMessageFactory.cs:line 404
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.HttpHandler`1.<InvokeAsync>d__9`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\HttpHandler\HttpHandler.cs:line 175
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.RedirectHandler.<InvokeAsync>d__1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.Unmarshaller.<InvokeAsync>d__3`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.S3.Internal.AmazonS3ResponseHandler.<InvokeAsync>d__1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.ErrorHandler.<InvokeAsync>d__5`1.MoveNext()
   --- End of inner exception stack trace ---
   at Amazon.Runtime.Internal.HttpErrorResponseExceptionHandler.HandleException(IExecutionContext executionContext, HttpErrorResponseException exception) in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\HttpErrorResponseExceptionHandler.cs:line 60
   at Amazon.Runtime.Internal.ErrorHandler.ProcessException(IExecutionContext executionContext, Exception exception) in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\ErrorHandler.cs:line 212
   at Amazon.Runtime.Internal.ErrorHandler.<InvokeAsync>d__5`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\ErrorHandler\ErrorHandler.cs:line 104
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Amazon.Runtime.Internal.RetryHandler.<InvokeAsync>d__10`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\RetryHandler\RetryHandler.cs:line 131
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.S3.Internal.AmazonS3ExceptionHandler.<InvokeAsync>d__1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.<InvokeAsync>d__5`1.MoveNext() in E:\JenkinsWorkspaces\v3-stage-release\AWSDotNetPublic\sdk\src\Core\Amazon.Runtime\Pipeline\Handlers\ErrorCallbackHandler.cs:line 58
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.MetricsHandler.<InvokeAsync>d__1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.S3.Transfer.Internal.SimpleUploadCommand.<ExecuteAsync>d__10.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
... our code ...

feature-request

Source

NinoFloris

👍7 ❤1

Most helpful comment

I hope this issue is still in the backlog, and that there are still intentions of resolving it. We hit the same problem.

jarroda on 29 Oct 2018

👍15

All 24 comments

Is there a reason why you are not making a direct put object request? Transfer utility is for concurrent multipart upload. Also, if you could provide sample code, that might help us understand the use case.

sstevenkang on 17 Jun 2017

Yes because it is in our filesystem abstraction layer which also deals with bigger files, sometimes from disk. And so Transfer utility is (conceptually) a good fit for us.

I'll see if I can whip up something with aspnet core abstractions so it's small and clear

NinoFloris on 17 Jun 2017

Here you go https://github.com/NinoFloris/awsrepro675

EDIT:
Just tested the FileBufferingReadStream and a non seekable stream with PutObjectRequest as well, same exact errors

NinoFloris on 17 Jun 2017

👍1

@sstevenkang as you can read those errors are not only with Transfer utility but also present for PutObjectRequest, I hope this warrants a different label than "Question"

NinoFloris on 26 Jun 2017

I understand your use case but S3 has always had the requirement that you must set the content-length whenever you do a PUT object. Here is the S3 docs which point out Content-Length being required

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html

We also compute the SHA256 of the object which is passed as a header for data integrity.

This team and repo doesn't control how the S3 API works. Just how it is interfaced with .NET. If you want dive deeper into S3's design I suggest you reach out on the S3 forums.

https://forums.aws.amazon.com/forum.jspa?forumID=24

normj on 26 Jun 2017

The problem is not the fact that we don't know the Content-Length but that the sdk assumes the stream actually has and knows that length. This assumption is wrong and the behavior unidiomatic.
There should be a way to give a content length to the sdk as an argument separate from the stream. Currently as far as I could tell most of this is internal and just pokes the stream for its length.

NinoFloris on 3 Jul 2017

👍1

I am a little confused by the comment and also the comment in sample code:

There should be a way to give a content length to the sdk as an argument separate from the stream.
Every request stream type out there will not be able to seek or give length

If the content length is known, why is the stream not seekable? That's what I am having hard time following. I can understand the argument that the S3 operations should support unseekable stream--which we don't due to performance reasons, but to say that it's more idiomatic for a function to take an unseekable stream and the size of the stream as parameters... seems odd.

sstevenkang on 3 Jul 2017

Because of the same reason ;) kestrel (.net core web server) for instance doesn't do buffering for performance reasons. And the buffering stream that I linked to also doesn't report it due to unreliable internals (it only starts buffering after the first read for instance) It is just something that is hard to rely on, however we always know the file size or content length from our incoming request to pass a valid content-length to the sdk.

We just don't always have control over the type of input streams. If you think it shouldn't be in the sdk we'll just have to pass some wrapper stream that takes a stream and a length, an unnecessary allocation but so be it.

Besides all this, a content-length != a stream length, these two could have completely separate meanings and should be able to be set separately from each other.

NinoFloris on 4 Jul 2017

👍5

We suffer the same problem - particularly if we are dealing with encrypted or compressed streams. We can't read the size from the stream, or seek. We have a kinda workaround which is if we use the simple PUT object for these types AND are told up front how big the file is we can support them - but obviously the PUT request has a max size limit of 5GB which isn't ideal. A solution for multi-part uploads would help us significantly.

Note, we are also behind an abstraction, and S3 is only ONE of the concrete implementations.

tapmantwo on 14 Jul 2017

👍3

I've added this as a feature request in our backlog. We'll report back once we have more news regarding this work. I appreciate the the detailed report, repro, and the discussion.

sstevenkang on 25 Jul 2017

❤3

I hope this issue is still in the backlog, and that there are still intentions of resolving it. We hit the same problem.

jarroda on 29 Oct 2018

👍15

Has there been any traction on this? Its been almost 2 years.

tburnett80 on 6 May 2019

👍2

We gave up waiting and now use Minio https://github.com/minio/minio which supports not only specifying a length separately in the upload call, but also supports length -1 meaning "stream it out, I don't know how big it'll be".

Aspnet or IIS willl protect you with a maximum body size limit for gigantic uploads anyway. This is perfect for APIs that want to proxy uploads after validation and upload destination is retrieved from config.

NinoFloris on 6 May 2019

👍2

We have a similar use case where we get files from a 3rd party through http that we want to upload to S3.

If you think it shouldn't be in the sdk we'll just have to pass some wrapper stream that takes a stream and a length, an unnecessary allocation but so be it.

I'm testing with a wrapper stream now and it seems to work.
I'm assuming the Content-Length from the http-server is correct.
Any idea if this will crash and burn at some point?

using (var response = await httpClient.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
{
    var length = response.Content.Headers.ContentLength;
    using (var stream = await response.Content.ReadAsStreamAsync())
    using (var wrapper = new StreamingStreamWrapper(stream, length.Value))
    using (var fileTransferUtility = new TransferUtility(s3Client))
    {
        await fileTransferUtility.UploadAsync(wrapper, bucket, key);
    }
}

StreamingStreamWrapper:
```
internal class StreamingStreamWrapper : Stream
{
private Stream stream;
private long length;
private long readCount = 0;

    public StreamingStreamWrapper(Stream stream, long length)
    {
        this.stream = stream;
        this.length = length;
    }

    public override bool CanRead => true;

    public override bool CanSeek => true;

    public override long Length => length;

    public override long Position { get => readCount; set => throw new System.NotImplementedException(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytes = stream.Read(buffer, offset, count);            

        readCount += bytes;
        return bytes;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return 0;
    }

````

TommyN on 20 Jun 2019

@TommyN I came up with a similar wrapper stream approach. In my case it worked fine. Although when I checked my app memory consumption for big files (more than 100 MB) it turned out that TransferUtility buffers the whole input stream into memory!!!

After some search I found this page and tried your approach. The result is the same. Works but does full buffering internally. So we could just use full read to MemoryStream and no wrappers.

p-selivanov on 31 Jul 2019

BTW our use case is a follows:

It is ASP.NET Core API
API accepts a request stream. Images, media files (sometimes big)
We do some internal logic with data from headers/cookies and figure out the target bucket name and key. We don't read the body and avoid full buffering.
After that we forward the request body to S3. And would be happy if S3 SDK could do the upload also without full buffering.

p-selivanov on 31 Jul 2019

👍1

@p-selivanov I am doing the same thing, using Azure blob storage API's I could just pass the stream down the line without having to first read it to get the length first.

I switched to Minio because it allows the -1 to 'stream out' the content.
Its just surprising given how large of a market share AWS has that they cant handle non-seek-able streams in a graceful manner. They continue making Azure a better choice for dotnet developers.

tburnett80 on 31 Jul 2019

I'm running into this very same problem, and it's disappointing that Amazon hasn't given a solution for this yet.

My use case is that I'm taking zip files from one S3 bucket, and decompressing them into another S3 bucket via a lambda function. However, this won't work, since the stream(s) from the zip archive are not seekable.

aron-truvian on 10 Oct 2019

👍1

Same problem but in my case I need to calculate the SHA512 on the fly so I need to wrap the FileStream into CryptoStream which is not seekable of course (as it calculating the SHA512 in the fly)

Svisstack on 7 Jan 2020

For my .NET re:Invent presentation this year I needed to upload large files uploaded from the browser to S3. What I did was use mult part uploads and buffer the file into memory till I had a part size and then upload that. The code can be found here:

https://github.com/aws-samples/cloudmosaic-dotnet/blob/reinvent-2019/Application/Clients/CloudMosaic.BlazorFrontend/FileUploader.cs#L37

I have been debating about pulling that code out of the sample and put it in some form into the SDK. Would appreciate hearing the community thoughts on that idea.

normj on 7 Jan 2020

@normj nice! How is that even a question?

NinoFloris on 21 Feb 2020

What's the status here? it's been another 10 months. This is an absolute must have.

S3 is the service to store large files, right?

Why does this API not allow a simple unseekable stream? I want to pass incoming request body (of my API) as a stream directly into S3.

Alternatively, I wouldn't mind if you exposed an API where you open a writeable stream for me that I copy data into.
This would be even cleaner. Maybe like this:

            using var stream = await s3.OpenPutObjectAsync(new OpenPutObjectRequest
            {
                BucketName = bucketName,
                Key = key,
                ContentType = contentType,
                ContentLength = contentLength,
            });

           // Let me write whatever I want.

At this point in time you're forcing me to to either:

stream to FS, then upload to S3
or
write a stream wrapper that makes this awkward API happy.

oliverjanik on 29 Oct 2020

👍1

For my .NET re:Invent presentation this year I needed to upload large files uploaded from the browser to S3. What I did was use mult part uploads and buffer the file into memory till I had a part size and then upload that. The code can be found here:

https://github.com/aws-samples/cloudmosaic-dotnet/blob/reinvent-2019/Application/Clients/CloudMosaic.BlazorFrontend/FileUploader.cs#L37

I have been debating about pulling that code out of the sample and put it in some form into the SDK. Would appreciate hearing the community thoughts on that idea.

@normj this implementation works well; I think you should put this into an SDK

mrogunlana on 12 Jan 2021

We are using similar algorithm, I agree that this should be in the sdk.

Sent from my iPhone

On 12 Jan 2021, at 21:35, Diran Ogunlana notifications@github.com wrote:

For my .NET re:Invent presentation this year I needed to upload large files uploaded from the browser to S3. What I did was use mult part uploads and buffer the file into memory till I had a part size and then upload that. The code can be found here:

https://github.com/aws-samples/cloudmosaic-dotnet/blob/reinvent-2019/Application/Clients/CloudMosaic.BlazorFrontend/FileUploader.cs#L37

I have been debating about pulling that code out of the sample and put it in some form into the SDK. Would appreciate hearing the community thoughts on that idea.

@normj this implementation works well; I think you should put this into an SDK

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or unsubscribe.