Aws-sdk-java: No retries on network timeouts S3 InputStream

Created on 26 Sep 2016 · 10 Comments · Source: aws/aws-sdk-java

It appears that there are no retries attempted when there's a network timeout on the underlying HTTP connection while reading the InputStream from S3Object#getObjectContent. It should instead transparently reconnect (as per the retry policy) and continue from the last byte's position.

Stack trace

Caused by: java.net.SocketTimeoutException: Read timed out 
at java.net.SocketInputStream.socketRead0(Native Method) 
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) 
at java.net.SocketInputStream.read(SocketInputStream.java:170) 
at java.net.SocketInputStream.read(SocketInputStream.java:141) 
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) 
at sun.security.ssl.InputRecord.read(InputRecord.java:503) 
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) 
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) 
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) 
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139) 
at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:200) 
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178) 
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137) 
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) 
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151) 
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) 
at com.amazonaws.services.s3.model.S3ObjectInputStream.read(S3ObjectInputStream.java:155) 
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) 
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) 
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151) 
at java.security.DigestInputStream.read(DigestInputStream.java:161) 
at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59) 
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) 
at com.amazonaws.services.s3.model.S3ObjectInputStream.read(S3ObjectInputStream.java:155) 
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238) 
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) 
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117) 
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) 
...

feature-request

phraktle

👍3

Most helpful comment

This is probably not something we'd consider taking on until the next major version bump, as it is a big departure from what we do today. The retry policy does not apply to any streaming operation while the content is being read, because by then we've already passed control back to the caller. Presumably we could retry transparently by capturing a reference to the client in a special input stream: on calls to read, it would catch the IOException and make another ranged GET starting from the last successfully read byte.
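A minimal sketch of the idea described above, not SDK code: a wrapper stream that counts the bytes it has delivered and, when a read fails with an IOException, reopens the source at that offset. The names `RangeOpener`, `ResumingInputStream` and `maxRetries` are hypothetical and exist only for this illustration. The opener is a plain callback, so the sketch does not depend on any particular client.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical callback: opens the object's content starting at the given byte offset. */
interface RangeOpener {
    InputStream open(long offset) throws IOException;
}

/** Reopens the source at the last successfully read byte when a read fails. */
class ResumingInputStream extends InputStream {
    private final RangeOpener opener;
    private final int maxRetries;   // failures tolerated within a single read call
    private InputStream current;    // currently open underlying stream, or null
    private long position;          // bytes delivered to the caller so far

    ResumingInputStream(RangeOpener opener, int maxRetries) {
        this.opener = opener;
        this.maxRetries = maxRetries;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n;
        do {
            n = read(one, 0, 1);
        } while (n == 0);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int failures = 0;
        while (true) {
            try {
                if (current == null) {
                    // For S3 this would be a ranged GET starting at `position`.
                    current = opener.open(position);
                }
                int n = current.read(buf, off, len);
                if (n > 0) {
                    position += n;
                }
                return n;
            } catch (IOException e) {
                closeQuietly();
                if (++failures > maxRetries) {
                    throw e;
                }
                // A real implementation would back off here before reconnecting.
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            InputStream toClose = current;
            current = null;
            toClose.close();
        }
    }

    private void closeQuietly() {
        try {
            close();
        } catch (IOException ignored) {
            // The connection is already broken; nothing useful to do.
        }
    }
}
```

With the v1 SDK, the opener could be something like `offset -> s3.getObject(new GetObjectRequest(bucket, key).withRange(offset)).getObjectContent()`, assuming an SDK release that has the single-argument `withRange`. The sketch leaves out backoff and does not guard against the object changing between requests; pinning the ETag on each ranged request would be one way to address that.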

The transfer manager utility has more robust retry and resume behavior. Would that meet your needs for now?

shorea on 26 Sep 2016

👍2

All 10 comments

Hi @shorea,

TransferManager does not allow streaming and requires a temporary file, which is not desirable in our use case. Since there's already an S3ObjectInputStream wrapping the stream, it doesn't seem like a stretch for it to reconnect internally.

Regards,
Viktor

phraktle on 27 Sep 2016

Yeah, I think it's definitely possible, and it makes a lot of sense to honor the retry policy even for streaming operations, but I don't think we can add it to the SDK without a major version bump due to the performance implications.

shorea on 27 Sep 2016

I believe I got the same error as well, using: final S3Object s3Object = s3Client.getObject(bucketName, keyName);

Caused by: java.net.SocketTimeoutException: Read timed out
dataservice-app_1             |     at java.net.SocketInputStream.socketRead0(Native Method)
dataservice-app_1             |     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
dataservice-app_1             |     at java.net.SocketInputStream.read(SocketInputStream.java:170)
dataservice-app_1             |     at java.net.SocketInputStream.read(SocketInputStream.java:141)
dataservice-app_1             |     at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
dataservice-app_1             |     at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
dataservice-app_1             |     at sun.security.ssl.InputRecord.read(InputRecord.java:532)
dataservice-app_1             |     at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
dataservice-app_1             |     at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
dataservice-app_1             |     at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
dataservice-app_1             |     at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
dataservice-app_1             |     at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198)
dataservice-app_1             |     at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
dataservice-app_1             |     at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
dataservice-app_1             |     at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
dataservice-app_1             |     at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
dataservice-app_1             |     at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
dataservice-app_1             |     at com.amazonaws.services.s3.model.S3ObjectInputStream.read(S3ObjectInputStream.java:155)
dataservice-app_1             |     at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
dataservice-app_1             |     at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
dataservice-app_1             |     at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
dataservice-app_1             |     at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
dataservice-app_1             |     at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
dataservice-app_1             |     at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
dataservice-app_1             |     at com.amazonaws.services.s3.model.S3ObjectInputStream.read(S3ObjectInputStream.java:155)
dataservice-app_1             |     at com.amazonaws.services.s3.model.S3ObjectInputStream.read(S3ObjectInputStream.java:147)
dataservice-app_1             |     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
dataservice-app_1             |     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)

Using aws-java-sdk-s3:1.11.18 here.

Introducing a new method that accepts an OutputStream instead of a File would be great, as streaming is a much-desired use case -- TransferManager::download(GetObjectRequest, OutputStream): Download or TransferManager::download(GetObjectRequest, InputStream): Download

Also, @shorea, it would be great if retry examples or a PR existed before the official rollout in the SDK -- see #893! All of my objects were stored using multi-part upload (part size of 5 MB or greater).

stevematyas on 5 Nov 2016

@phraktle : Did you come up with a work-around?

stevematyas on 5 Nov 2016

👍1

Is this lack of retries the cause of the error I have been getting very frequently while streaming data from S3 to an EC2 instance in a VPC? I really don't want to download these files (I don't want to deal with the disk at all -- and streaming seems like it ought to work). But the error rate when downloading files is increasing dramatically, and it's a big operational pain. The failure happens at random places in the files (when I retry, the same file will often fail again, but at a different place).

Stack trace:
com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=122569353; expectedLength=664918217; includeSkipped=true; in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
at com.amazonaws.util.LengthCheckInputStream.checkLength(LengthCheckInputStream.java:152) ~[file-dapi-importer.jar!/:0.0.1]
at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:110) ~[file-dapi-importer.jar!/:0.0.1]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[file-dapi-importer.jar!/:0.0.1]
at com.amazonaws.services.s3.model.S3ObjectInputStream.read(S3ObjectInputStream.java:155) ~[file-dapi-importer.jar!/:0.0.1]
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238) ~[?:1.8.0_66]
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) ~[?:1.8.0_66]
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117) ~[?:1.8.0_66]
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_66]
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_66]
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_66]
at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_66]
at java.io.BufferedReader.fill(BufferedReader.java:161) ~[?:1.8.0_66]
at java.io.BufferedReader.readLine(BufferedReader.java:324) ~[?:1.8.0_66]
at java.io.BufferedReader.readLine(BufferedReader.java:389) ~[?:1.8.0_66]

OrigamiMarie on 8 Dec 2016

Hi @OrigamiMarie, sorry to hear you're having issues. We do have https://github.com/aws/aws-sdk-java/issues/893 in our backlog, which is to allow downloading to an InputStream using TransferManager. We are actively looking at ways to support it and hope to deliver it soon!

dagnir on 25 Jan 2017

If anyone ever does add transparent retries to failures in input stream reads, can I, as a representative of the Hadoop team who maintain the S3A connector, have a way to turn this off? Because we do our own reconnect logic and think we've got it under control (now), and having something underneath trying to be helpful might be a regression. Happy to discuss what could be done here, including what exceptions should be treated as recoverable...

steveloughran on 31 Jul 2018

There's a similar problem when the underlying s3 client fails the getObject call in retryableS3DownloadTask.getS3ObjectStream(). The failed call is not retried and the whole parallel download fails.

https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/ServiceUtils.java#L397

@dagnir Is there any workaround other than catching the exceptions from the TransferManager and retrying the whole download?
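
For reference, the whole-download retry mentioned in the question can be sketched as a plain retry loop with backoff. This is only an illustration: `WholeDownloadRetry`, `retry`, `maxAttempts` and `initialBackoffMillis` are hypothetical names, and the task passed in is assumed to wrap the complete download, for example a call to `TransferManager#download` followed by `waitForCompletion()`.

```java
import java.util.concurrent.Callable;

class WholeDownloadRetry {
    /**
     * Runs the task up to maxAttempts times, sleeping between failed attempts
     * and doubling the sleep each time. Rethrows the last failure.
     */
    static <T> T retry(Callable<T> task, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                throw e; // do not retry when the thread was interrupted
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        throw last;
    }
}
```

Each attempt restarts the transfer from the first byte, so this costs more than resuming from the last successfully read byte.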