This might be fixed by 472f5bfbce08038653b641b070e3ac09ae846313, but I'm not sure. I'm also seeing some errors where response.contentLength() differs from the actual size, so there might be a race condition again. This is using 2.1.4 with Netty.
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:39)
at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.validatePutObjectChecksum(AsyncChecksumValidationInterceptor.java:105)
at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.afterUnmarshalling(AsyncChecksumValidationInterceptor.java:91)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$afterUnmarshalling$9(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.reverseForEach(ExecutionInterceptorChain.java:210)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.afterUnmarshalling(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.runAfterUnmarshallingInterceptors(BaseClientHandler.java:120)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.lambda$interceptorCalling$2(BaseClientHandler.java:133)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:40)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:28)
at software.amazon.awssdk.core.internal.http.async.SyncResponseHandlerAdapter.lambda$prepare$0(SyncResponseHandlerAdapter.java:85)
Similar issue here, using 2.1.4. We're uploading files to buckets in multiple regions asynchronously. This happens after a few successful uploads.
java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:50)
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:166)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:118)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:103)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$1(MakeAsyncHttpRequestStage.java:136)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:39)
at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.validatePutObjectChecksum(AsyncChecksumValidationInterceptor.java:105)
at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.afterUnmarshalling(AsyncChecksumValidationInterceptor.java:91)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$afterUnmarshalling$9(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.reverseForEach(ExecutionInterceptorChain.java:210)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.afterUnmarshalling(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.runAfterUnmarshallingInterceptors(BaseClientHandler.java:120)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.lambda$interceptorCalling$2(BaseClientHandler.java:133)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:40)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:28)
at software.amazon.awssdk.core.internal.http.async.SyncResponseHandlerAdapter.lambda$prepare$0(SyncResponseHandlerAdapter.java:85)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
... 1 common frames omitted
I'm seeing something similar with 2.2.0. I have a basic integration test using LocalStack that produces this checksum error every time the test calls getObject asynchronously with ByteArrayAsyncResponseTransformer. As far as I can see from debugging, the checksum validation runs before the response content is fully read (the test data is a simple UUID, but the exception is raised with only half the UUID in the transformer's underlying BAOS).
The failing test is in commit bfd3a767 of https://gitlab.com/gkrupa/s3web (S3BackendSpec).
Related to #965. Fixed via #966 and the fix will be included in the next release.
The fix has been released in 2.3.0. Closing the issue.
Using 2.3.1 and still have this.
[SdkClientException: Data read has a different checksum than expected. Was -1560488262, but expected 1701186097]
Same. If I update the BOM version in the test I linked above it still fails.
@ben-manes is this going to re-open? or is it fixed in the next release?
We also have the same problem. The workaround is pretty ugly and inefficient.
@harrylaou I switched back to v1 as I was fighting bugs daily with v2 for a few months. Since then I’ve had no problems. Unfortunately AWS didn’t dog food this sdk before releasing and I don’t have the time to continue beta testing. I’ll switch over later once it matures. You might feel similar. I’m not reopening but you can file a new issue if the AWS team doesn’t reopen this one.
Reopening, because it sounds like there are more cases where this can occur that haven't already been covered by previous fixes.
@ben-manes, @vicpara, do either of you have an easily reproducible test case for this?
We're unfortunately not able to reproduce this on our end. Is there some specific configuration you have on your object that may be causing this?
@millems We are using the library in Scala. Ideally we would like to call the method you have defined as follows:
CompletableFuture<ReturnT> getObject(Consumer<GetObjectRequest.Builder> getObjectRequest,
AsyncResponseTransformer<GetObjectResponse, ReturnT> asyncResponseTransformer)
In Scala the code would be:
client.getObject(getObjectRequest(bucket, fileName), new ByteArrayAsyncResponseTransformer[GetObjectResponse]()).toTask
Also, please note that AsyncResponseTransformer.toBytes() doesn't work for us because of limitations in calling Java static methods from Scala for classes that have generics. Syntax like AsyncResponseTransformer[GetObjectResponse,[ResponseBytes]].toBytes() or AsyncResponseTransformer.toBytes():GetObjectResponse doesn't compile.
Calling new ByteArrayAsyncResponseTransformer[GetObjectResponse]() should work if it were not for https://github.com/aws/aws-sdk-java-v2/issues/953
Since this doesn't work, we are using the following workaround:
val bucket = "bucket_one"
val fileName = "file1.txt"
val tempFile: File = File.createTempFile("temp-", fileName.flattenPath)
tempFile.deleteOnExit()
val localPath: Path = tempFile.toPath
client.getObject(getObjectRequest(bucket, fileName), localPath)
Still have this issue on "software.amazon.awssdk:s3:2.5.10" version:
Code example:
... this.client = S3Client.create();
client.getObject(
GetObjectRequest.builder().bucket(bucket).key(key).build(),
ResponseTransformer.toFile(download)
);
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected. Was 1229635159, but expected 0
Any suggestions on how to fix?
@dementiev
Please help answer a few questions to enable us to reproduce this issue:
Hi,
also getting this error for an async client only.
Using version software.amazon.awssdk:aws-sdk-java:2.5.15.
Error happens consistently using the following client (Kotlin code)
class S3CredentialsProvider : AwsCredentialsProvider {
override fun resolveCredentials(): AwsCredentials {
val awsCredentialsProcessBuilder = AwsSessionCredentials.create(
"<SECRET>",
"<SECRET>",
"<SECRET>")
return awsCredentialsProcessBuilder
}
}
val s3Client = S3AsyncClient.builder()
.credentialsProvider(S3CredentialsProvider())
.region(Region.EU_WEST_1)
.build()
Writing an object as follows:
val toByteArray = "test".toByteArray()
val asyncRequestBody = AsyncRequestBody.fromBytes(toByteArray)
s3Client.putObject(
PutObjectRequest.builder()
.bucket(config.bucket)
.key("test")
.build(),
asyncRequestBody).await()
I'm using the Kotlin coroutines extension (await()) to turn the Java CompletableFuture into a coroutine. This works for listings. I am also using a sessionToken; the put seems to work without a sessionToken.
For non async client
S3Client.builder()
.region(Region.EU_WEST_1)
.credentialsProvider(S3CredentialsProvider())
.build()
it works.
Here is the exception:
Exception in thread "main" software.amazon.awssdk.core.exception.SdkClientException: Data read has a different checksum than expected.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:39)
at software.amazon.awssdk.services.s3.checksums.ChecksumsEnabledValidator.validatePutObjectChecksum(ChecksumsEnabledValidator.java:134)
at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.afterUnmarshalling(AsyncChecksumValidationInterceptor.java:86)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$afterUnmarshalling$9(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.reverseForEach(ExecutionInterceptorChain.java:210)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.afterUnmarshalling(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.runAfterUnmarshallingInterceptors(BaseClientHandler.java:138)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.lambda$interceptorCalling$2(BaseClientHandler.java:151)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:40)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:28)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.lambda$prepare$0(AsyncResponseHandler.java:88)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onComplete(AsyncResponseHandler.java:129)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$FullResponseContentPublisher$1.request(ResponseHandler.java:369)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onSubscribe(AsyncResponseHandler.java:108)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$FullResponseContentPublisher.subscribe(ResponseHandler.java:360)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.onStream(AsyncResponseHandler.java:71)
at software.amazon.awssdk.core.internal.http.async.AsyncAfterTransmissionInterceptorCallingResponseHandler.onStream(AsyncAfterTransmissionInterceptorCallingResponseHandler.java:86)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onStream(MakeAsyncHttpRequestStage.java:249)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:112)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:65)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at com.typesafe.netty.http.HttpStreamsHandler.channelRead(HttpStreamsHandler.java:129)
at com.typesafe.netty.http.HttpStreamsClientHandler.channelRead(HttpStreamsClientHandler.java:148)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at software.amazon.awssdk.http.nio.netty.internal.FutureCancelHandler.channelRead0(FutureCancelHandler.java:42)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at java.lang.Thread.run(Thread.java:748)
Any ideas?
Thanks!
Tim
@tim-fdc Do you have any special bucket configuration, or special characters in the bucket name?
The following works for me (Java runtime):
AwsSessionCredentials credentials = AwsSessionCredentials.create("<>",
"<>",
"<>");
try (S3AsyncClient client = S3AsyncClient.builder()
.credentialsProvider(() -> credentials)
.region(Region.EU_WEST_1)
.build()) {
client.createBucket(r -> r.bucket("tmp-millem")).join();
client.putObject(r -> r.bucket("tmp-millem").key("test"), AsyncRequestBody.fromBytes("test".getBytes())).join();
System.out.println(client.getObject(r -> r.bucket("tmp-millem").key("test"), AsyncResponseTransformer.toBytes())
.join()
.asUtf8String());
client.deleteObject(r -> r.bucket("tmp-millem").key("test")).join();
client.deleteBucket(r -> r.bucket("tmp-millem")).join();
}
FWIW, we've encountered this issue too when calling getObject, which was due to S3AsyncClient.clientConfiguration having a duplicated set of default interceptors.
Apparently there can't be more than one AsyncChecksumValidationInterceptor in the chain, as it will remove the checksum part and break the second check, which is why all those exceptions were expecting 0.
In DefaultS3BaseClientBuilder, it calls finalizeServiceConfiguration when building the client:
ClasspathInterceptorChainFactory interceptorFactory = new ClasspathInterceptorChainFactory();
List<ExecutionInterceptor> interceptors = interceptorFactory
.getInterceptors("software/amazon/awssdk/services/s3/execution.interceptors");
interceptors = CollectionUtils.mergeLists(interceptors, config.option(SdkClientOption.EXECUTION_INTERCEPTORS));
return config.toBuilder().option(SdkClientOption.EXECUTION_INTERCEPTORS, interceptors).build();
ClasspathInterceptorChainFactory.getInterceptors uses its classloader's getResources to fetch a list of ExecutionInterceptor, and for some reason it returned a duplicated set of interceptors.
We verified this on our side by calling
Collections.list(
new ClasspathInterceptorChainFactory().getClass().getClassLoader()
.getResources("software/amazon/awssdk/services/s3/execution.interceptors")
).forEach(System.out::println);
This prints the same line twice.
It looks like a not-so-common bug: we had our servlet running as the ROOT webapp in Tomcat, and renaming ROOT to something else fixed it.
However, I would suggest changing the behavior of ClasspathInterceptorChainFactory.createExecutionInterceptorsFromClasspath to check for duplicates before returning. @millems
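For illustration, the duplicate check suggested above could look roughly like this. It is only a sketch and not the SDK's actual fix: InterceptorDedup and dedupeByClassName are hypothetical names, the interceptor class names are made up, and it assumes interceptors can be identified by the class name listed in the execution.interceptors resource.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
final class InterceptorDedup {

    // Keeps the first occurrence of each class name and preserves the original order,
    // so the interceptor chain order is unchanged apart from the dropped duplicates.
    static List<String> dedupeByClassName(List<String> classNames) {
        return new ArrayList<>(new LinkedHashSet<>(classNames));
    }

    public static void main(String[] args) {
        // Simulates the same resource file being returned twice by getResources().
        // The class names below are placeholders for illustration only.
        List<String> loaded = Arrays.asList(
                "com.example.FirstInterceptor",
                "com.example.ChecksumInterceptor",
                "com.example.FirstInterceptor",
                "com.example.ChecksumInterceptor");
        System.out.println(dedupeByClassName(loaded));
    }
}
```

A LinkedHashSet is used rather than a plain HashSet because the order of interceptors matters.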
@huchengming A potential breakthrough! Thanks! This sounds like an easy fix to make.
I have the same error when I do putObject
software.amazon.awssdk.core.exception.SdkClientException: Unable to unmarshall response (Data read has a different checksum than expected.). Response Code: 200, Response Text: OK
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleSuccessResponse(HandleResponseStage.java:100)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.handleResponse(HandleResponseStage.java:70)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:58)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:41)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:64)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:77)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:39)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:113)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:86)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:57)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:240)
at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:96)
at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:120)
at software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:73)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:44)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:3053)
@Zebradil, are you seeing that error frequently? For a PutObject using the sync client that error is probably not related and actually a good thing. This feature is meant to prevent putting or getting objects that have been corrupted at some point in the transfer/storage process. If it's a one off, the object probably was corrupted in the transfer and then rejected correctly.
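As a rough sketch of what this kind of PutObject validation amounts to (not the SDK's code): compute an MD5 over the bytes that were sent and compare it with the ETag returned by the service. This assumes the ETag is the hex MD5 of the object, which is not the case for multipart uploads or KMS-encrypted objects. EtagCheckSketch, md5Hex and matchesEtag are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the AWS SDK.
final class EtagCheckSketch {

    // Returns the lowercase hex MD5 of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // ETags are returned wrapped in double quotes, so strip them before comparing.
    // Assumption: the ETag is a plain MD5 (single-part upload, no SSE-KMS).
    static boolean matchesEtag(byte[] sentBody, String etag) {
        return md5Hex(sentBody).equalsIgnoreCase(etag.replace("\"", ""));
    }

    public static void main(String[] args) {
        byte[] body = "test".getBytes(StandardCharsets.UTF_8);
        System.out.println(matchesEtag(body, "\"" + md5Hex(body) + "\""));
    }
}
```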
My application's code is in a development state and I’m changing things here and there. I got it to work as expected, but after I changed the internal payload content (it is still a string converted to a byte array), I started getting this error _always_.
Tomorrow I’ll try to revert changes and somehow debug this.
After cleaning and rebuilding my project I'm not getting this error anymore. It seems like the reason was in some corrupted artifacts. Sorry for bothering.
I am also encountering this error but I don't even need to create an S3 client, let alone the async client. I can replicate it by running integration tests with a shaded jar which includes the S3 dependency. Because the shaded jar is on my classpath along with my m2 repository the resource is found twice.
I posted on Stack Overflow with a minimal example: https://stackoverflow.com/questions/56587711/maven-shade-plugin-causes-duplicate-jars-on-classpath-when-running-integration-t. Do you have any workarounds for the time being?
The latest version of the SDK should now prevent duplicate interceptors being created. @stuartleylandcole @huchengming, please let us know if this fixes the issue you were seeing.
~I've found that from my scala application, this happens 100% of the time when the target bucket is encrypted using KMS, but doesn't happen when the target bucket is unencrypted.~
Edit: Never mind, I figured out that checksum validation isn't supported on KMS-enabled buckets, so I just had to tell the request about the server-side encryption. Unrelated problem.
We've seen one of these errors again, despite using one of the latest versions of the S3 SDK (2.9.20). Our setup is similar to the ones described above:
S3AsyncClient s3Client = S3AsyncClient.builder().build();
And a request using an async request body:
s3Client.putObject(
PutObjectRequest.builder()
.bucket("some-bucket")
.key("some-key")
.build(),
asyncRequestBody);
Encountering this with 2.10.24. I'm doing PutObject with async client, very similar to code posted by PyvesB and tim-fdc.
I'm able to reproduce this consistently by looping over the async put code 10 times with a 5MB file. The first few succeed, but then around 3-5 will fail with the "Data read has a different checksum than expected." exception. However, all 10 files appear to upload correctly without corruption.
I tried with a much smaller file (10KB), and was able to do 100 loops without encountering the issue.
A bit of my test code:
S3AsyncClient s3Client = S3AsyncClient.builder().region(config.getRegion()).build();
String s3KeyPrefix = "test/";
AtomicInteger success = new AtomicInteger();
AtomicInteger failure = new AtomicInteger();
// Load this however
byte[] imageBytes = ...;
// We detect this
String contentType = ...;
int puts = 10;
List<CompletableFuture<?>> futures = new ArrayList<>();
for(int i = 0; i < puts; i++) {
String s3Key = s3KeyPrefix + UUID.randomUUID();
logger.debug("{}: Start", s3Key);
PutObjectRequest request = PutObjectRequest.builder()
.bucket(config.getBucket()) // Bucket name has only letters and dashes
.key(s3Key)
.contentType(contentType)
.build();
futures.add(s3Client.putObject(request, AsyncRequestBody.fromBytes(imageBytes)).handle((r, t) -> {
if(t != null) {
failure.incrementAndGet();
logger.error("{}: Something went wrong", s3Key, t);
} else {
success.incrementAndGet();
logger.debug("{}: {}", s3Key, r);
}
return null;
}));
}
logger.debug("waiting");
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
logger.debug("success={},failure={}", success, failure);
logger.debug("done");
Log output attached. Despite the exceptions, all files do upload successfully and do not appear to be corrupted (my test files are images, and I'm able to download and view them).
FWIW, once this exception occurs, all subsequent puts in the loop fail with the same message.
Same problem here using 2.10.26. Same observations as @lukeway in that the integrity of files in the S3 bucket appears fine.
I should note that I am running Java 11 using AdoptOpenJDK 11.0.5+10.
Thanks for the updated reports. We'll look back into this issue as soon as we can.
We're getting hit by this at Datadog as well.
Thanks to @mar-kolya we found one cause of this for PutObject. If the upload request is retried by the SDK, the second attempt will falsely detect an invalid checksum, even though the request actually succeeded. This is because the SDK was calculating the checksum over the payload twice (once per attempt) without resetting it, and comparing the result against the service's checksum, which was only calculated once.
This would also explain why the reports say the files are okay: they are.
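To make the failure mode concrete, here is a minimal sketch of the effect using only the JDK. It is not the SDK's code: RetryChecksumSketch and its methods are hypothetical, and it simply assumes one running MD5 digest is shared across attempts.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration, not part of the AWS SDK.
final class RetryChecksumSketch {

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Feeds the body into one shared digest once per attempt.
    // Without a reset, bytes from earlier attempts stay in the running digest.
    static byte[] clientDigest(byte[] body, int attempts, boolean resetBetweenAttempts) {
        MessageDigest md5 = newMd5();
        for (int i = 0; i < attempts; i++) {
            if (resetBetweenAttempts) {
                md5.reset();
            }
            md5.update(body);
        }
        return md5.digest();
    }

    // The service only ever sees the body once, for the attempt that succeeded.
    static byte[] serviceDigest(byte[] body) {
        return newMd5().digest(body);
    }

    public static void main(String[] args) {
        byte[] body = "test".getBytes(StandardCharsets.UTF_8);
        System.out.println("no reset, 2 attempts match: "
                + Arrays.equals(clientDigest(body, 2, false), serviceDigest(body)));
        System.out.println("reset, 2 attempts match: "
                + Arrays.equals(clientDigest(body, 2, true), serviceDigest(body)));
    }
}
```

With a reset before each attempt, the client-side digest covers exactly the bytes of the final attempt, which is what the service hashed.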
I've continued @mar-kolya's work in https://github.com/aws/aws-sdk-java-v2/pull/1550 by adding regression tests to make sure that we're always resetting the MD5 calculation on retries: https://github.com/aws/aws-sdk-java-v2/pull/1552
Hopefully we can get this change out in the next few days.
We've just updated to the newly released version of the SDK (2.10.39). We'll let you know if we see any more of these checksum failures.
@PyvesB Awesome, thanks! There's one more change in the pipe, but it only applies to buckets with server-side encryption enabled, and no server-side encryption parameters in the request.
If you don't have buckets with server-side encryption enabled, you should theoretically not see any errors, unless there's more edge cases lurking.
Using 2.10.39 and seeing
Exception: Unable to unmarshall response (Data read has a different checksum than expected. Was 0x61dbb5e075a0623141a1598ed637a3fb, but expected 0x8c390228c2e2bfe6efc2de5db433bf17). Response Code: 200, Response Text: OK
software.amazon.awssdk.core.exception.SdkClientException: Unable to unmarshall response (Data read has a different checksum than expected. Was 0x61dbb5e075a0623141a1598ed637a3fb, but expected 0x8c390228c2e2bfe6efc2de5db433bf17). Response Code: 200, Response Text: OK
Sorry for the long delay, finally able to come back to this. I just tested with 2.10.55 and do not experience the checksum issue. I had a couple timeouts which resulted in the file failing to upload but I'm unable to reproduce that and it's probably not related. Thanks!
I'm experiencing this issue in 2.11.14 when using putObject of the S3AsyncClient client. Repeating the same test 30 times results in ~3-5 successful executions and the rest fail.
Client creation:
val httpClient: software.amazon.awssdk.http.async.SdkAsyncHttpClient = software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient.builder()
    .writeTimeout(Duration.ZERO)
    .maxConcurrency(64)
    .build()
val serviceConfiguration: S3Configuration = S3Configuration.builder()
    .checksumValidationEnabled(true)
    .chunkedEncodingEnabled(true)
    .build()
var b: S3AsyncClientBuilder = S3AsyncClient.builder().httpClient(httpClient)
    .region(software.amazon.awssdk.regions.Region.of(config.region.get()))
    .credentialsProvider(credentialsProvider)
    .serviceConfiguration(serviceConfiguration)
if (config.endpoint.isPresent) {
    b = b.endpointOverride(config.endpoint.get())
}
val s3Client = b.build()
Calling side:
Publisher<ByteBuffer> content = ...
s3client
    .putObject(PutObjectRequest.builder()
            .bucket(s3config.bucket.get())
            .key("test")
            .contentLength(inputFile.length())
            .contentType(MediaType.APPLICATION_OCTET_STREAM)
            .build(),
        AsyncRequestBody.fromPublisher(content))
When replacing AsyncRequestBody.fromPublisher by AsyncRequestBody.fromFile the upload works properly, although it creates a Publisher internally as well. An interesting finding is that when I disable the checksum validation in the S3 client, upload the file and afterwards perform a binary compare of the original file and the (corrupted) uploaded one I can say:
By having a closer look into the trace output of netty I discovered that each uploaded chunk (ByteBuffer) by its own is valid. However, the ByteBuffers emitted by the Publisher are sent out of order. Maybe there's some race condition ongoing in the way the Publisher is consumed.
Not sure whether I'm fully correct but I think I spotted the race condition.
I suspect software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerSubscriber.
onNext is initially called by some thread. It performs an asynchronous write operation and sets up a callback that signals additional demand, if any.
@Override
public void onNext(T t) {
    // Publish straight to the context.
    Validate.notNull(t, "Event must not be null.");
    lastWriteFuture = ctx.writeAndFlush(t);
    lastWriteFuture.addListener(new ChannelFutureListener() {
        @Override
        public void operationComplete(ChannelFuture future) throws Exception {
            outstandingDemand--;
            maybeRequestMore(); // <--- causes concurrency in the Subscription
        }
    });
}
The callback itself is executed by a netty IO thread asynchronously and concurrent to the previous thread (who might not yet be finished with emitting previously requested and queued items). However through maybeRequestMore it's calling the subscription again and this time concurrently!
private void maybeRequestMore() {
    if (outstandingDemand <= demandLowWatermark && ctx.channel().isWritable()) {
        long toRequest = demandHighWatermark - outstandingDemand;
        outstandingDemand = demandHighWatermark;
        subscription.request(toRequest);
    }
}
Now, according to the spec, the Subscription is allowed to call onNext from within the request method, which means that we now have two threads invoking onNext on the Subscriber concurrently.
However, if you use the AsyncRequestBody.fromFile publisher you will notice that this is not an issue since there is an artificial synchronization mechanism integrated in the returned FileSubscription:
public void request(long n) {
    ...
    synchronized (this) { // <------------
        if (!writeInProgress) {
            writeInProgress = true;
            readData();
        }
    }
    ...
}
private void readData() {
    ...
    attachment.flip();
    position += attachment.remaining();
    signalOnNext(attachment);
    ...
}
private void signalOnNext(ByteBuffer bb) {
    synchronized (this) { // <-------
        if (!done) {
            subscriber.onNext(bb);
        }
    }
}
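For comparison, here is a minimal sketch of another common way to keep onNext serial when request can arrive from several threads: a "work in progress" counter with a drain loop, instead of synchronized blocks. It uses the JDK's java.util.concurrent.Flow interfaces (Java 9+) rather than org.reactivestreams so that it is self-contained. The class SerializedRangeSubscription and its helper method are hypothetical and not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: emits 0..count-1 and tolerates request() calls from any thread.
final class SerializedRangeSubscription implements Flow.Subscription {
    private final Flow.Subscriber<? super Integer> subscriber;
    private final int count;
    private final AtomicLong requested = new AtomicLong();  // demand overflow is ignored in this sketch
    private final AtomicInteger wip = new AtomicInteger();  // "work in progress" counter
    private volatile boolean cancelled;
    private int next;          // only touched by the thread that currently owns the drain loop
    private boolean completed; // same

    SerializedRangeSubscription(Flow.Subscriber<? super Integer> subscriber, int count) {
        this.subscriber = subscriber;
        this.count = count;
    }

    @Override
    public void request(long n) {
        if (n <= 0) {
            return; // simplified: the specification asks for an onError signal here
        }
        requested.addAndGet(n);
        drain();
    }

    @Override
    public void cancel() {
        cancelled = true;
    }

    private void drain() {
        if (wip.getAndIncrement() != 0) {
            return; // another thread is emitting; it will notice the new demand
        }
        int missed = 1;
        do {
            while (!cancelled && next < count && requested.get() > 0) {
                requested.decrementAndGet();
                subscriber.onNext(next++);
            }
            if (!cancelled && next == count && !completed) {
                completed = true;
                subscriber.onComplete();
            }
            missed = wip.addAndGet(-missed); // non-zero means request() was called meanwhile
        } while (missed != 0);
    }

    // Calls request(1) from several threads at once and returns the items in arrival order.
    static List<Integer> collectWithConcurrentRequests(int count, int threads) {
        List<Integer> received = new ArrayList<>(); // unsynchronized: relies on serial onNext
        Flow.Subscriber<Integer> collector = new Flow.Subscriber<Integer>() {
            @Override public void onSubscribe(Flow.Subscription s) { }
            @Override public void onNext(Integer item) { received.add(item); }
            @Override public void onError(Throwable t) { }
            @Override public void onComplete() { }
        };
        SerializedRangeSubscription subscription = new SerializedRangeSubscription(collector, count);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < count; i++) {
                    subscription.request(1);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return received;
    }
}
```

Only the thread that moves the counter from 0 to 1 emits. Every other caller just records its demand and returns, so items stay in order regardless of which thread calls request.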
After debugging even further I found a workaround to avoid that race condition but unfortunately the required API is not publicly exposed. By overriding the demandLowWatermark in the HandlerSubscriber with 0 I can enforce (at least in the debugger) that Subscription::request is not called concurrently to Subscriber::onNext. However, the constructor parameter is not exposed as configuration parameter.
Can we have this parameter exposed via some API? Maybe as part of the software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient.DefaultBuilder?
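To make the effect of the low watermark concrete, the following sketch replays only the arithmetic of maybeRequestMore shown earlier. It assumes the channel is always writable and that the initial request equals the high watermark; both are simplifications, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the demand bookkeeping only; no netty or SDK types involved.
final class DemandWatermarkSketch {

    // Replays `completions` finished writes and returns, for every request() that would be
    // issued, how many previously requested items were still outstanding at that moment.
    static List<Long> outstandingAtEachRequest(long lowWatermark, long highWatermark, int completions) {
        List<Long> observed = new ArrayList<>();
        long outstandingDemand = highWatermark; // assumption: initial request of highWatermark items
        for (int i = 0; i < completions; i++) {
            outstandingDemand--; // one write finished
            if (outstandingDemand <= lowWatermark) { // channel assumed writable
                observed.add(outstandingDemand);
                outstandingDemand = highWatermark;   // mirrors maybeRequestMore()
            }
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(outstandingAtEachRequest(2, 4, 8)); // [2, 2, 2, 2]
        System.out.println(outstandingAtEachRequest(0, 4, 8)); // [0, 0]
    }
}
```

With a low watermark above zero, each new request is issued while earlier items may still be in flight, which is the window in which the publisher could be emitting on another thread. With a low watermark of 0, a request is only issued once nothing is outstanding.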
I've created the following reproducer:
https://github.com/marc-christian-schulze/kotlinx-reproducer-2109
It contains 3 test cases. 2 of them do pass and show that functionally speaking the code seems to be correct. The third test case however, shows the race-condition.
During the discussion with Roman Elizarov in https://github.com/Kotlin/kotlinx.coroutines/issues/2109 I think we came across the root cause for the messed up sequence of the data packets. It seems to be related to a multi-threading issue that appears when the Publisher invokes onNext sometimes in the netty event loop thread (exposed by the callback on writeAndFlush) and sometimes using a different thread.
I suspect this is causing the race condition. Instead of asynchronously executing the task it's required to do it sequentially.
I'm able to reproduce it when the content-length is too short:
package software.amazon.awssdk.services.s3;

import io.reactivex.Flowable;
import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.reactivestreams.Publisher;
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class Repro {
    public static void main(String... args) {
        SdkAsyncHttpClient httpClient = NettyNioAsyncHttpClient.builder()
            .writeTimeout(Duration.ZERO)
            .maxConcurrency(64)
            .build();
        S3Configuration serviceConfiguration = S3Configuration.builder()
            .checksumValidationEnabled(true)
            .chunkedEncodingEnabled(true)
            .build();
        S3AsyncClient client = S3AsyncClient.builder()
            .httpClient(httpClient)
            .serviceConfiguration(serviceConfiguration)
            .build();
        List<ByteBuffer> bytes = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            byte[] byteArray = new byte[1000];
            ThreadLocalRandom.current().nextBytes(byteArray);
            bytes.add(ByteBuffer.wrap(byteArray));
        }
        Publisher<ByteBuffer> content = Flowable.fromIterable(bytes);
        for (int i = 0; i < 100; i++) {
            System.out.print(i);
            client.putObject(PutObjectRequest.builder()
                        .bucket("millem-test-bucket")
                        .key("test")
                        .contentLength(3L)
                        .build(),
                    AsyncRequestBody.fromPublisher(content))
                .join();
        }
    }
}
But I can't reproduce it when the content lengths are correct:
package software.amazon.awssdk.services.s3;

import io.reactivex.Flowable;
import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class Repro {
    public static void main(String... args) {
        SdkAsyncHttpClient httpClient = NettyNioAsyncHttpClient.builder()
            .writeTimeout(Duration.ZERO)
            .maxConcurrency(64)
            .build();
        S3Configuration serviceConfiguration = S3Configuration.builder()
            .checksumValidationEnabled(true)
            .chunkedEncodingEnabled(true)
            .build();
        S3AsyncClient client = S3AsyncClient.builder()
            .httpClient(httpClient)
            .serviceConfiguration(serviceConfiguration)
            .build();
        Random random = new Random();
        for (int testCaseNumber = 0; testCaseNumber < 10000; testCaseNumber++) {
            TestCase testCase = new TestCase();
            for (int i = 0; i < random.nextInt(200); ++i) {
                byte[] byteArray = new byte[random.nextInt(200)];
                ThreadLocalRandom.current().nextBytes(byteArray);
                testCase.byteBuffers.add(ByteBuffer.wrap(byteArray));
                testCase.length += byteArray.length;
            }
            System.out.println("Case length: " + testCase.length);
            client.putObject(PutObjectRequest.builder()
                        .bucket("millem-test-bucket")
                        .key("test")
                        .contentLength(testCase.length)
                        .build(),
                    AsyncRequestBody.fromPublisher(Flowable.fromIterable(testCase.byteBuffers)))
                .join();
        }
    }

    private static class TestCase {
        private long length = 0;
        private final List<ByteBuffer> byteBuffers = new ArrayList<>();
    }
}
@marc-christian-schulze Would you be able to create a self-contained repro case for us to use? I'm only able to reproduce the issue when the content length is too short (too long causes a deadlock). I can work on fixing that, but it doesn't explain your issue.
@millems, I've provided a reproducer here:
https://github.com/marc-christian-schulze/kotlinx-reproducer-2109
It shows that the race condition appears when the onNext method is called from both a netty IO thread and a non-netty IO thread. Your example does not reproduce the race condition since it uses Flowable.fromIterable, which delivers the items only on the calling thread.
I've spent quite some time together with the colleagues from the Kotlin project to narrow down the issue and finally ended up at OrderedWriteChannelHandlerContext::doInOrder
private void doInOrder(Runnable task) {
    if (!channel().eventLoop().inEventLoop()) {
        task.run();
    } else {
        // If we're in the event loop, queue a task to perform the write, so that it occurs after writes that were scheduled
        // off of the event loop.
        channel().eventLoop().execute(task);
    }
}
There the actual "magic" happens depending on which thread is doing the ctx.writeAndFlush invocation. Once threads (netty IO and non-netty IO) start to invoke this doInOrder with fast iterations the actual network packets will be enqueued in random order since the channel's event loop is a thread pool that does not guarantee FIFO execution of scheduled tasks (otherwise the whole concept of the pool would be pointless).
@marc-christian-schulze OrderedWriteChannelHandlerContext::doInOrder was written with the assumption that the task will always end up being executed via the event loop. When that assumption holds, the class behaves correctly.
I can't determine where this assumption is breaking down without a repro case, and I can't fix the issue without determining where that assumption is breaking down.
I'll give your repro case a try and see where that assumption is wrong. Thanks!
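The ordering problem that doInOrder targets, going by its inline comment, can be modeled without netty. The sketch below is a simplification under stated assumptions: the event loop is represented by a single-thread executor, a write issued off the loop is queued onto it, and a write issued on the loop either runs immediately or is queued as well. WriteOrderingSketch is a hypothetical name and none of this is the SDK's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model: a single-thread executor stands in for the channel's event loop.
final class WriteOrderingSketch {

    // Write 1 is issued off the loop (and queued); write 2 is issued afterwards from the loop.
    // Returns the order in which the writes reach the "wire".
    static List<Integer> run(boolean queueOnLoopWrites) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        List<Integer> wire = new CopyOnWriteArrayList<>();
        CountDownLatch write1Queued = new CountDownLatch(1);
        CountDownLatch bothWritten = new CountDownLatch(2);
        try {
            // Code already running on the loop; it issues write 2 only after write 1 was queued.
            loop.execute(() -> {
                awaitQuietly(write1Queued);
                Runnable write2 = () -> {
                    wire.add(2);
                    bothWritten.countDown();
                };
                if (queueOnLoopWrites) {
                    loop.execute(write2); // lines up behind write 1
                } else {
                    write2.run();         // jumps ahead of the still-queued write 1
                }
            });
            // The calling (off-loop) thread issues write 1 first; it is queued onto the loop.
            loop.execute(() -> {
                wire.add(1);
                bothWritten.countDown();
            });
            write1Queued.countDown();
            awaitQuietly(bothWritten);
            return new ArrayList<>(wire);
        } finally {
            loop.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("on-loop write runs directly: " + run(false)); // [2, 1]
        System.out.println("on-loop write is queued:     " + run(true));  // [1, 2]
    }
}
```

In this model, queueing the on-loop write restores the order in which the writes were issued, which is what the comment in doInOrder describes.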
As an update from today's investigation, I can reproduce the exception using @marc-christian-schulze's code, but cannot attribute it to the OrderedWriteChannelHandlerContext. Even after removing the OrderedWriteChannelHandlerContext, the test still fails.
I don't see anything obviously wrong with the publisher to take some of the blame off the SDK, but that's really difficult to tell at a glance. It might be the publisher and it might be the SDK. More research is still needed... That said, it does look like the publisher is working correctly from a naive perspective (onNext is never invoked in parallel, it's being invoked in the correct order, etc.).
The working theory should be that it's the SDK until proven otherwise...
As a last update for today, I think the issue is that the OrderedWriteChannelHandlerContext isn't always being attached (e.g. on the second request using the same channel). The bug we're encountering is the bug that the OrderedWriteChannelHandlerContext is intended to fix. It's just not always being applied. Tomorrow, I'll see if I can fix the OrderedWriteChannelHandlerContext to always be attached, which will ideally fix the issue.
Thanks to @marc-christian-schulze for the repro case, which I was able to fix with https://github.com/aws/aws-sdk-java-v2/pull/1967. It looks like netty-reactive-streams can attach the same handler to a connection twice, so the OrderedWriteChannelHandlerContext wasn't always used for all streaming writes. This means that the race condition it is meant to fix could still occur sometimes. The fix ensures that the order-write fixing is always applied.
Hi. Unfortunately, I am still able to reproduce the issue on the SDK version 2.15.11. I prepared a demo project to show the issue.
The project has 3 tests. The first uploads a ~100 kB array with checksum validation. This is the only green test of the three.
The other two tests upload a ~1 MB array, one with and one without checksum validation. In the test with checksum validation, I get the "Data read has a different checksum than expected" error mentioned above. Without checksum validation, the bytes are uploaded to S3 but they differ from the original byte array.
It looks like the packets are still being reordered, as @marc-christian-schulze mentioned.
FWIW, we've encountered this issue too when calling getObject, which was due to S3AsyncClient.clientConfiguration having a duplicated set of default interceptors.
Apparently there can't be more than one AsyncChecksumValidationInterceptor in the chain, as it will remove the checksum part and break the second check, which is why all those exceptions were expecting 0.
In DefaultS3BaseClientBuilder, it calls finalizeServiceConfiguration when building the client:
ClasspathInterceptorChainFactory.getInterceptors uses its classloader's getResources to fetch a list of ExecutionInterceptor, and for some reason it returned duplicated set of interceptors.
This was verified in our side by calling
Which prints out two same lines.
Looks like it's a not-so-common bug, we had our servlet running as ROOT webapp in tomcat, renaming ROOT to something else fixed it.
However I would suggest changing the behavior of ClasspathInterceptorChainFactory.createExecutionInterceptorsFromClasspath, to check for duplication before returning. @millems
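A duplication check of the kind suggested above could look roughly like the following. This is only a sketch of the idea, operating on a list of interceptor class names as they might be read from classpath resources. The class InterceptorDedupSketch and the com.example name are hypothetical, and this is not how the SDK implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper; it only illustrates "check for duplication before returning".
final class InterceptorDedupSketch {

    // Drops repeated class names while keeping the first occurrence and the original order.
    static List<String> distinctInOrder(List<String> interceptorClassNames) {
        return new ArrayList<>(new LinkedHashSet<>(interceptorClassNames));
    }

    public static void main(String[] args) {
        List<String> discovered = Arrays.asList(
            "software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor",
            "com.example.OtherInterceptor", // hypothetical second interceptor
            "software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor");
        // The same resource listed twice yields a single entry.
        distinctInOrder(discovered).forEach(System.out::println);
    }
}
```

Preserving order matters here because interceptors run in chain order, so a plain HashSet would not be a safe substitute.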