Aws-sdk-java: TransferManager multipart upload from a FileInputStream instance fails with ResetException

Created on 9 May 2015 · 30 Comments · Source: aws/aws-sdk-java

Multipart upload of a FileInputStream using the following code fails with ResetException: Failed to reset the request input stream. I also tried, with no luck, wrapping the FileInputStream in a BufferedInputStream, which supports marking (confirmed by checking that BufferedInputStream.markSupported() indeed returns true).

object S3TransferExample {
// in main class
def main(args: Array[String]): Unit = {
    ...
    val file = new File("/mnt/2gbfile.zip")

    val in = new FileInputStream(file) // new BufferedInputStream(new FileInputStream(file)) --> FYI, using a buffered input stream still results in the same error
    upload("mybucket", "mykey", in, file.length, "application/zip").waitForUploadResult
    ...
}

val awsCred = new BasicAWSCredentials("access_key", "secret_key")
val s3Client = new AmazonS3Client(awsCred)
val tx = new TransferManager(s3Client)

def upload(bucketName: String,  keyName: String,  inputStream: InputStream,  contentLength: Long,  contentType: String,  serverSideEncryption: Boolean = true,  storageClass: StorageClass = StorageClass.ReducedRedundancy ):Upload = {
  val metaData = new ObjectMetadata
  metaData.setContentType(contentType)
  metaData.setContentLength(contentLength)

  if(serverSideEncryption) {
    metaData.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION)
  }

  val putRequest = new PutObjectRequest(bucketName, keyName, inputStream, metaData)
  putRequest.setStorageClass(storageClass)
  putRequest.getRequestClientOptions.setReadLimit(100000)

  tx.upload(putRequest)

}
}

Here is the stack trace:

Unable to execute HTTP request: mybucket.s3.amazonaws.com failed to respond
org.apache.http.NoHttpResponseException: mybuckets3.amazonaws.com failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260) ~[httpcore-4.3.2.jar:4.3.2]
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) ~[httpcore-4.3.2.jar:4.3.2]
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271) ~[httpcore-4.3.2.jar:4.3.2]
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66) ~[aws-java-sdk-core-1.9.13.jar:na]
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[httpcore-4.3.2.jar:4.3.2]
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.3.4.jar:4.3.4]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) ~[httpclient-4.3.4.jar:4.3.4]
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:685) [aws-java-sdk-core-1.9.13.jar:na]
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460) [aws-java-sdk-core-1.9.13.jar:na]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295) [aws-java-sdk-core-1.9.13.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3710) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:2799) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:2784) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:259) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:193) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:125) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:129) [aws-java-sdk-s3-1.9.13.jar:na]
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:50) [aws-java-sdk-s3-1.9.13.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
com.amazonaws.ResetException: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
  at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:636)
  at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3710)
  at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:2799)
  at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:2784)
  at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:259)
  at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:193)
  at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:125)
  at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:129)
  at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:50)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Resetting to invalid mark
  at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
  at com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
  at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:103)
  at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:139)
  at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:103)
  at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:634) 

All 30 comments

Hi @lolski,

The reason for the failure has to do with the default buffer limit of the BufferedInputStream that you used to wrap the underlying file input stream. One way to fix this is simply to use:

val in = new FileInputStream(file)

and pass it to the request instead of the buffered input stream. The S3 Java Client is able to handle a FileInputStream without being constrained by any mark-and-reset limit.
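To illustrate why a file-backed stream needs no mark-and-reset buffer, here is a minimal sketch using only java.io and java.nio. It is not the SDK's internal code, and the class and method names are made up for the example. A FileInputStream reports no mark support of its own, yet it can be rewound at no memory cost by repositioning its FileChannel:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical demo class; not part of the AWS SDK.
final class FileRewindDemo {

    /** Creates a three-byte temporary file that is deleted when the JVM exits. */
    static File tempFile() throws IOException {
        File tmp = File.createTempFile("rewind-demo", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[] {10, 20, 30});
        return tmp;
    }

    /** Reads two bytes, rewinds through the channel, and checks the first byte comes back. */
    static boolean rereadsSameBytesAfterRewind() {
        try (FileInputStream in = new FileInputStream(tempFile())) {
            int first = in.read();        // consumes byte 10
            in.read();                    // consumes byte 20
            in.getChannel().position(0);  // rewind; nothing was buffered in memory
            return first == 10 && in.read() == 10;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports whether a plain FileInputStream claims mark/reset support. */
    static boolean fileStreamSupportsMark() {
        try (FileInputStream in = new FileInputStream(tempFile())) {
            return in.markSupported();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("markSupported: " + fileStreamSupportsMark());
        System.out.println("rewind works:  " + rereadsSameBytesAfterRewind());
    }
}
```

Because the rewind is a seek on the file rather than a replay from a buffer, the size of the file does not matter.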

However, in this case we recommend a simpler approach: you can directly specify the original file in the PutObjectRequest instead of specifying an input stream. The S3 Java Client will then figure out the optimal way to handle the file upload, free of any mark-and-reset limit.

For completeness, suppose you had an input stream that is not associated with a file: it would still NOT be necessary to wrap it in a BufferedInputStream. Given an input stream in the request, the S3 Java Client will wrap it automatically as necessary. In that case, however, you would need to set the "read limit" (which is the maximum buffer size that could be consumed) as suggested in the error message:

com.amazonaws.ResetException: Failed to reset the request input stream;  
If the request involves an input stream, the maximum stream buffer size can be configured
via request.getRequestClientOptions().setReadLimit(int)

Hope this makes sense.

Regards,
Hanson

Hi @hansonchar,

I have confirmed that on SDK version 1.9.33, the exact same error happens even when I use a FileInputStream instead of a BufferedInputStream instance.

However, specifying a File instance works with no problem.

Here's the code excerpt:

object S3TransferExample {
// in main class
def main(args: Array[String]): Unit = {
    ...
    val file = new File("/mnt/2gbfile.zip")
    val in = new FileInputStream(file)
    upload("mybucket", "mykey", in, file.length, "application/zip").waitForUploadResult
    ...
}

val awsCred = new BasicAWSCredentials("access_key", "secret_key")
val s3Client = new AmazonS3Client(awsCred)
val tx = new TransferManager(s3Client)

def upload(bucketName: String,  keyName: String,  inputStream: InputStream,  contentLength: Long,  contentType: String,  serverSideEncryption: Boolean = true,  storageClass: StorageClass = StorageClass.ReducedRedundancy ):Upload = {
  val metaData = new ObjectMetadata
  metaData.setContentType(contentType)
  metaData.setContentLength(contentLength)

  if(serverSideEncryption) {
    metaData.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION)
  }

  val putRequest = new PutObjectRequest(bucketName, keyName, inputStream, metaData)
  putRequest.setStorageClass(storageClass)
  putRequest.getRequestClientOptions.setReadLimit(100000)

  tx.upload(putRequest)

}
}

Hi @lolski,

If you look at the release notes for 1.9.34, you will see a bug fix for exactly this issue with FileInputStream. Please give that a try when you get a chance.

(But, of course, specifying a file is the recommended approach.)

Regards,
Hanson

On a side note, if you have a (non-file) input stream with a maximum expected size of 100,000 bytes, the read limit needs to be one byte more, i.e. 100,001, so that mark and reset always work for 100,000 bytes or less.

Regards,
Hanson
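The "one extra byte" rule can be reproduced with a plain java.io.BufferedInputStream, the class at the bottom of the stack trace in the original report. The sketch below is only an illustration under that assumption (the class name, method name, and the tiny 8-byte initial buffer are invented for the demo). It marks a stream, reads it to the end, and then tries to reset. The extra read that detects end-of-stream is what invalidates the mark when the limit equals the stream size exactly.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the AWS SDK.
final class MarkLimitDemo {

    /**
     * Marks a stream of streamSize bytes with the given read limit, reads it
     * to end-of-stream one byte at a time, and reports whether reset() works.
     */
    static boolean canResetAfterFullRead(int streamSize, int readLimit) {
        InputStream in =
                new BufferedInputStream(new ByteArrayInputStream(new byte[streamSize]), 8);
        in.mark(readLimit);
        try {
            while (in.read() != -1) {
                // drain the stream, including the final read that returns -1
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            in.reset();
            return true;
        } catch (IOException e) {
            return false; // "Resetting to invalid mark"
        }
    }

    public static void main(String[] args) {
        System.out.println("limit = size:     " + canResetAfterFullRead(100_000, 100_000));
        System.out.println("limit = size + 1: " + canResetAfterFullRead(100_000, 100_001));
    }
}
```

With a limit equal to the stream size the reset should fail, and with one byte more it should succeed, at the cost of holding the whole stream in memory.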

@hansonchar Does that rule apply to file input stream too? I think it's good to add this info to the docs: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/RequestClientOptions.html#setReadLimit(int)

Nope. It should just work if you specify a FileInputStream without any additional configuration, assuming 1.9.34+.

Agree on the javadoc.

@hansonchar then I want to confirm that the problem still persists on 1.9.34. I will edit the title of this issue accordingly.

You should be able to reproduce it by allocating a large file e.g. fallocate -l 10G /mnt/10gbfile, and uploading it using transfer manager with the code above.

Also, the upload sometimes succeeds, especially when the file is not that large, e.g. 1GB or 2GB. I've had 1 success in 4 tries at uploading a 2GB file. This might mean that the retry part of the code is the cause.

Hi @lolski,

This is because the FileInputStream got wrapped by TransferManager into a different type of stream for multi-part uploads before passing it to the low-level S3 client, and therefore the stream got treated as if it needed memory buffering. I think the fixes should be rather straightforward. Will look into this.

I just tested a fix and got a 10G file uploaded using TransferManager with FileInputStream as the input in the request. Will include the fix in the next release.

@hansonchar thanks

@hansonchar Do you know which version of the SDK you have released the fix in? I am still seeing this error and I am using AWS Java SDK version 1.10.20.

@hansonchar I'm also seeing this with AWS Java SDK 1.10.15. Particularly for large files > 60GB using just a normal InputStream, the transfer manager seems to be wrapping the stream into a mark supported stream which eventually fails with the same error.

@hansonchar I agree with the above posts about the error still recurring. I am fortunate that it's feasible for me to simply use the file-based method instead

I'm having success setting the multi-part size to the buffer size (this way the part can always be reset in case of connection failure):

      val uploader = new TransferManager(...)
      val request = new PutObjectRequest(...)

      // set the buffer size (ReadLimit) equal to the multipart upload size, allowing us to resend data if the connection breaks
      request.getRequestClientOptions.setReadLimit(TEN_MB)
      uploader.getConfiguration.setMultipartUploadThreshold(TEN_MB)

      val upload = uploader.upload(request)

We managed to get around the problem through implementing our own mark and resettable stream by wrapping a FileChannel:

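// Imports inferred from the class names used below (an assumption):
// ChannelInputStream is taken to be the JDK-internal sun.nio.ch class,
// and Throwables to be Guava's com.google.common.base.Throwables.
import java.io.IOException;
import java.nio.channels.SeekableByteChannel;

import com.google.common.base.Throwables;
import sun.nio.ch.ChannelInputStream;

public class SeekableByteChannelInputStream extends ChannelInputStream {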
    public final SeekableByteChannel ch;
    public long markPos = -1;

    public SeekableByteChannelInputStream(final SeekableByteChannel channel) {
        super(channel);
        this.ch = channel;
    }

    @Override
    public long skip(final long n) throws IOException {
        final long position = Math.max(0, Math.min(ch.size(), ch.position() + n));
        final long skipped = Math.abs(position - ch.position());

        ch.position(position);

        return skipped;
    }


    @Override
    public synchronized void mark(final int readlimit) {
        try {
            markPos = ch.position();

        } catch (IOException e) {
            throw Throwables.propagate(e);
        }
    }

    @Override
    public synchronized void reset() throws IOException {
        if (markPos < 0)
            throw new IOException("Resetting to invalid mark");

        ch.position(markPos);
    }

    @Override
    public boolean markSupported() {
        return true;
    }
}

We passed this stream to the TransferManager, i.e. final InputStream s = new SeekableByteChannelInputStream(FileChannel.open(targetFile)), and there have been no problems so far. Hope it's a feasible alternative for anyone else still suffering from the same problem.

I'm having this issue using TransferManager to transfer an s3Object retrieved from one account into another account. The code is like this:

    try (InputStream input = new BufferedInputStream(s3Object.getObjectContent())) {
      UploadResult result = archiver
          .upload(bucketName, archivePath, input, uploadObjectMetadata)
          .waitForUploadResult();
    }

Not using the BufferedInputStream in this case seems to be incredibly slow. Maybe there is a better way to transfer an s3 object from one account to another?

@rdifalco Did you try @garretthall's suggestion? I have the same use case as you and it's working perfectly.

One small adjustment: I believe read limit needs to be set to the part size, not the multipart threshold. You can get this value via TransferManagerUtils.calculateOptimalPartSize, which should be the maximum number of bytes that'll be buffered for a given upload.
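As a rough illustration of the arithmetic behind a part-size-based read limit, the sketch below assumes the part size is the content length divided by S3's documented cap of 10,000 parts, rounded up, but never less than a configured minimum (5 MB is assumed here). This is a guess at the shape of the calculation, not a copy of TransferManagerUtils.calculateOptimalPartSize, and the names are hypothetical.

```java
// Hypothetical demo class; not part of the AWS SDK.
final class PartSizeDemo {

    /** S3 allows at most 10,000 parts in one multipart upload. */
    static final long MAX_PARTS = 10_000L;

    /** Assumed minimum part size of 5 MB. */
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;

    /** Smallest part size that fits contentLength into MAX_PARTS parts, floored at minPartSize. */
    static long partSize(long contentLength, long minPartSize) {
        long needed = (contentLength + MAX_PARTS - 1) / MAX_PARTS; // ceiling division
        return Math.max(needed, minPartSize);
    }

    public static void main(String[] args) {
        long twoGb = 2L * 1024 * 1024 * 1024;
        long sixtyGb = 60L * 1024 * 1024 * 1024;
        // The read limit candidate is the part size plus one byte.
        System.out.println("2 GB:  part size " + partSize(twoGb, MIN_PART_SIZE)
                + ", read limit " + (partSize(twoGb, MIN_PART_SIZE) + 1));
        System.out.println("60 GB: part size " + partSize(sixtyGb, MIN_PART_SIZE)
                + ", read limit " + (partSize(sixtyGb, MIN_PART_SIZE) + 1));
    }
}
```

Under these assumptions the part size stays at the minimum until the object exceeds roughly 10,000 times that minimum, so the buffered amount per part remains a few megabytes even for very large objects.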

@spieden are you suggesting the following?

    archiver.getConfiguration().setMultipartUploadThreshold(TEN_MB);

And then set the BufferedInputStream buffer size and the client options read limit to the calculateOptimalPartSize result? Doesn't that create a chicken-and-egg issue, since the calculate-part-size method needs the PutObjectRequest, which requires the BufferedInputStream that you need to size?

Now I'm starting to question the value of this. Is it better to have optimal part sizes instead of part sizes I feel comfortable having completely buffered? If there is a reset error then I just retry the entire operation myself instead of relying solely on the AWS SDK to retry it for me. What do you think @hansonchar?

What's the official/unofficial fix or implementation approach to avoid the ResetException when using an InputStream (not a FileInputStream or File)?

@kiiadi : What do you or your colleagues recommend?

We've been running into this issue even after putting @garretthall 's fix in place. Any ideas?

Just a quick summary of the issue and best practices:

Problem summary
When uploading objects to Amazon S3 using streams (either through the S3 client or Transfer Manager), it is possible to run into network connectivity or timeout issues. By default, the AWS Java SDK attempts to retry these failed transfers. The input stream is marked before the start of the transfer and reset before retrying. The SDK recommends that customers use resettable streams (streams that support mark and reset operations). If the stream does not support mark and reset, or more data than the mark limit has already been read, the SDK throws ResetException when a transient failure occurs and retries are enabled.

Best Practices
1) The most reliable way to avoid ResetException is to provide data via a File or FileInputStream, which the Java SDK can handle without being constrained by any mark-and-reset limit.
2) If the stream is not a FileInputStream but supports mark/reset, you can set the mark limit using RequestClientOptions#setReadLimit. The default value is 128KB. Setting this value to one byte greater than the size of the stream will reliably avoid ResetExceptions. For example, if the maximum expected size of the stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes so that mark and reset will always work for 100,000 bytes or less. Please be aware that this might cause some streams to buffer that number of bytes into memory.
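The mark-then-reset retry cycle described in the problem summary can be sketched with plain java.io. This is not the SDK's code: the Attempt interface, sendWithRetry, and the simulated connection drop are all hypothetical, and only the wording of the wrapped error message is borrowed from the exception in the original report.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo; the names below are illustrative and not SDK APIs.
final class MarkResetRetryDemo {

    /** Stand-in for one HTTP attempt that consumes the request body. */
    interface Attempt {
        void send(InputStream body) throws IOException;
    }

    /** Marks the body once, resets it before each retry, and returns the attempt that succeeded. */
    static int sendWithRetry(InputStream body, int readLimit, int maxAttempts, Attempt attempt)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        if (body.markSupported()) {
            body.mark(readLimit);
        }
        for (int i = 1; ; i++) {
            try {
                attempt.send(body);
                return i;
            } catch (IOException failure) {
                if (i == maxAttempts) {
                    throw failure;
                }
                try {
                    body.reset(); // fails if mark is unsupported or was invalidated
                } catch (IOException resetFailure) {
                    throw new IOException("Failed to reset the request input stream", resetFailure);
                }
            }
        }
    }

    /** Sends 1000 bytes through an attempt that drops the connection once, halfway through. */
    static String run(boolean resettable) {
        final byte[] payload = new byte[1000];
        final InputStream source = new ByteArrayInputStream(payload);
        // The anonymous wrapper inherits InputStream's defaults: no mark support.
        InputStream body = resettable ? source : new InputStream() {
            @Override
            public int read() throws IOException {
                return source.read();
            }
        };
        final int[] calls = {0};
        try {
            int attempts = sendWithRetry(body, payload.length + 1, 3, in -> {
                boolean firstCall = calls[0]++ == 0;
                int n = 0;
                while (in.read() != -1) {
                    n++;
                    if (firstCall && n == 500) {
                        throw new IOException("simulated connection drop");
                    }
                }
                if (n != payload.length) {
                    throw new IOException("short body: " + n + " bytes");
                }
            });
            return "sent on attempt " + attempts;
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("resettable stream:     " + run(true));
        System.out.println("non-resettable stream: " + run(false));
    }
}
```

The resettable stream recovers on the second attempt, while the non-resettable one turns a transient network failure into a reset failure, which is the pattern seen in the stack trace.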

I am not sure what the ideal value for setReadLimit() would be. The data files I'd like to upload to S3 are in the range of 8GB to 15GB. I have set the initial partSize to 5GB. In this case, what would be the ideal read limit? For now, I have set the readLimit to 10MB. I'd like recommendations on the value that is ideal for my use case.
Example:
UploadPartRequest uploadRequest = new UploadPartRequest()
    .withBucketName(bucketName)
    .withKey(bucketKey)
    .withUploadId(initResponse.getUploadId())
    .withPartNumber(i)
    .withInputStream(streamUpload)
    .withPartSize(partSize);
int readLimit = 10485760; // 10 MB
uploadRequest.getRequestClientOptions().setReadLimit(readLimit);
partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());

If you are using a stream to upload an object, the SDK will do a single upload and can't upload in parts. So in that case, the read limit would be the object size (8 - 15GB in your case) + 1. If no content length is specified, the HTTP client might buffer the entire stream into memory, so it is recommended to provide the content length when uploading via a stream.

Please note that when uploading from a stream the readLimit will result in buffering that much data into memory so it's recommended to set that conservatively. Uploading from a file is a more reliable and performant option as we can know the content length from the length of the file and reproduce the content as much as needed for retries.
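One practical wrinkle with "size + 1": the setter named in the exception message is setReadLimit(int), so the value has to fit in a Java int. The helper below is a hypothetical sketch (not an SDK method) that computes the limit and fails loudly when the stream is too large for that to be possible.

```java
// Hypothetical demo class; not part of the AWS SDK.
final class ReadLimitDemo {

    /** Returns maxStreamBytes + 1 as an int, or throws ArithmeticException if it does not fit. */
    static int readLimitFor(long maxStreamBytes) {
        if (maxStreamBytes < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        return Math.toIntExact(Math.addExact(maxStreamBytes, 1L));
    }

    /** True when a "size + 1" read limit for this stream size can be expressed as an int. */
    static boolean fitsInInt(long maxStreamBytes) {
        try {
            readLimitFor(maxStreamBytes);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long eightGb = 8L * 1024 * 1024 * 1024;
        System.out.println("100,000 bytes -> read limit " + readLimitFor(100_000L));
        System.out.println("8 GB fits in an int read limit: " + fitsInInt(eightGb));
    }
}
```

For streams beyond roughly 2 GB, this suggests falling back to a File, or to a read limit sized per part rather than per object.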

Hi, I've faced almost the same problem when using S3ObjectInputStream: https://stackoverflow.com/questions/46360321/unable-to-reset-stream-after-calculating-aws4-signature

Best Practices

The most reliable way to avoid ResetException is to provide data via File or FileInputStream which can be handled by Java SDK without being constrained by any mark-and-reset limit.

If the stream is not FileInputStream but supports mark/reset, you can set the mark limit using RequestClientOptions#setReadLimit. The default value is 128KB. Setting this value to one byte greater than the size of stream will reliably avoid ResetExceptions. For example, if the max expected size of stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes so that the mark and reset will always work for 100,000 bytes or less. Please be aware that this might cause some streams to buffer that number of bytes into memory.

I don't see how this is an acceptable workaround. There are many reasons why I wouldn't want to write data to a temporary file (disk usage, file permissions, security concerns), and obviously not all data has a known-in-advance size or fits into memory, so having to permit TransferManager to buffer the whole thing in-memory is also inadequate.

Why doesn't TransferManager simply buffer the batch size of data that it sends? Then retrying a part upload is trivial.
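The idea in that question can be sketched without any SDK calls: read the stream one part at a time into a fixed buffer and retry each part from memory, so memory use is bounded by the part size rather than the object size. Everything here is hypothetical (PartSender stands in for whatever uploads a part, and the 10-byte parts are only for the demo; real S3 parts are far larger).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo; PartSender is a stand-in, not an SDK interface.
final class BufferedPartsDemo {

    interface PartSender {
        void send(int partNumber, InputStream part, int length) throws IOException;
    }

    /** Fills buf from in; returns fewer than buf.length bytes only at end of stream. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Sends the stream in parts, retrying each part from its in-memory copy. Returns the part count. */
    static int uploadInParts(InputStream in, int partSize, int maxAttempts, PartSender sender)
            throws IOException {
        if (partSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("partSize and maxAttempts must be at least 1");
        }
        byte[] buf = new byte[partSize];
        int partNumber = 0;
        int len;
        while ((len = readFully(in, buf)) > 0) {
            partNumber++;
            IOException last = null;
            boolean sent = false;
            for (int attempt = 1; attempt <= maxAttempts && !sent; attempt++) {
                try {
                    // A fresh view over the same bytes: no mark/reset on the source is needed.
                    sender.send(partNumber, new ByteArrayInputStream(buf, 0, len), len);
                    sent = true;
                } catch (IOException e) {
                    last = e;
                }
            }
            if (!sent) {
                throw last;
            }
        }
        return partNumber;
    }

    /** Uploads 25 bytes in 10-byte parts through a sender that fails the first try of every part. */
    static String demo() {
        Set<Integer> failedOnce = new HashSet<>();
        long[] bytesSent = {0};
        try {
            int parts = uploadInParts(new ByteArrayInputStream(new byte[25]), 10, 3,
                    (partNumber, part, length) -> {
                        if (failedOnce.add(partNumber)) {
                            throw new IOException("simulated connection drop");
                        }
                        bytesSent[0] += length;
                    });
            return parts + " parts, " + bytesSent[0] + " bytes";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off in this sketch is one part-sized buffer per in-flight part, which is the same amount of memory a read limit equal to the part size would consume.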

I've investigated this issue; it was a long story.

The conclusion is: pass a system property to Java by adding the following option to the java command line

-Dcom.amazonaws.sdk.s3.defaultStreamBufferSize=YOUR_MAX_PUT_SIZE

See https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1668

This tells AmazonS3Client to use an appropriately large maximum unwindable (mark/reset) buffer size.

Edit 20181102: the link should be setReadLimit
https://github.com/aws/aws-sdk-java/blob/856d27b5d4f374fbb6299a3504f109ef23c1ea3a/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1668

Thanks for posting the explanation and the link! However your link is now incorrect. I believe the correct canonical link is: https://github.com/aws/aws-sdk-java/blob/856d27b5d4f374fbb6299a3504f109ef23c1ea3a/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1668

@thauk-copperleaf thank you, you are right.
