Describe the bug
When executing a stored procedure ( e.g. bulk-delete.js) via AsyncDocumentClient and the request runs into rate limiting then the retry mechanism throws an error.
Here is the log output: (from SDK 3.3.1)
11:19:35.814 [reactor-http-epoll-4] ERROR c.a.d.c.internal.ClientRetryPolicy - locationEndpoint is null because ClientRetryPolicy::onBeforeRequest(.) is not invoked, probably request creation failed due to invalid options, serialization setting, etc.
11:19:35.817 [reactor-http-epoll-4] ERROR c.a.d.c.i.RenameCollectionAwareClientRetryPolicy - onBeforeSendRequest is not invoked, encountered failure due to request being null
com.azure.data.cosmos.CosmosClientException: Message: {"Errors":["Request rate is large"]}
...
To Reproduce
Execute bulk-delete.js stored procedure on a large enough dataset so that the execution runs into execution bounds (rate limiting).
In my case there were 500 documents to delete and the partition was configured with 400 RU/s. In the first execution 345 documents were successfully deleted. The described error occurred after executing the stored procedure a second time.
Expected behavior
The retry is performed automatically and the remaining documents are getting deleted. Basically the same behaviour as for standard queries using the SDK.
In version com.microsoft.azure:azure-cosmosdb:2.4.4 everything was working correctly.
That was the log output:
2019-05-14 13:03:33.363 WARN [-,,,] 28633 --- [o-eventloop-4-3] c.m.a.c.r.i.ResourceThrottleRetryPolicy : Operation will be retried after 8569 milliseconds. Current attempt 1, Cumulative delay PT8.569S Exception: {}
Actual behavior
Error is thrown (see log output above) and no retry is performed.
Environment summary
First discovered in SDK Version: com.microsoft.azure:azure-cosmosdb:2.4.5.
Same for the 3.0.0-beta and also 3.3.1.
Currently we are using 3.5.0, still the same issue.
The original issue was created here:
https://github.com/Azure/azure-cosmosdb-java/issues/118
@kushagraThapar asked me to create a new issue here. I also updated the description to reflect the current issue status.
@dkroehan Thank you for re-creating this issue, I will pick it up soon.
@kushagraThapar @simplynaveen20 Do you have any news concerning this topic?
@f-paulus Apologies, but there are no updates on this one yet. Let me prioritize it for our next sprint.
@kushagraThapar The sprint seems to be closed, but the ticket is still open. Any update on this?
@dkroehan - Since V4 SDK is out and is more stable and performant than v3 SDK, we are recommending our end users to move to V4 SDK.
You can read more about V4 SDK here: https://devblogs.microsoft.com/cosmosdb/java-sdk-v4-ga/
Here are the release notes: https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-java-v4
You can read more about migration guide here: https://docs.microsoft.com/en-us/azure/cosmos-db/migrate-java-v4-sdk?tabs=java-v4-async
You can also read about performance guide: https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips-java-sdk-v4-sql?tabs=api-async and troubleshooting guide: https://docs.microsoft.com/en-us/azure/cosmos-db/troubleshoot-java-sdk-v4-sql
I understand this issue has been open for a long time now, but just wanted to let you know that since V4 is GA now, it would be great if you can update your application on latest V4 SDK, and check if the issue is still reproducible.
In that case, we can quickly fix this on V4 SDK, but fixing this on V3 SDK might take a lot more time as compared to V4 SDK.
I'm getting the same error in V4 SDK as well.
[7/10/2020 6:46:18 AM] 2020-07-10 12:16:18.061 ERROR 88496 --- [-rntbd-nio-3-11] c.a.c.implementation.ClientRetryPolicy : locationEndpoint is null because ClientRetryPolicy::onBeforeRequest(.) is not invoked, probably request creation failed due to invalid options, serialization setting, etc.
[7/10/2020 6:46:18 AM] 2020-07-10 12:16:18.265 ERROR 88496 --- [-rntbd-nio-3-11] i.RenameCollectionAwareClientRetryPolicy : onBeforeSendRequest is not invoked, encountered failure due to request being null
[7/10/2020 6:46:18 AM] com.azure.cosmos.implementation.RequestRateTooLargeException: ["Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: http://aka.ms/cosmosdb-error-429"], userAgent=Windows10/10.0 JRE/1.8.0_191-1-redhat cosmosdb-java-sdk/4.1.0
[7/10/2020 6:46:18 AM] {"requestLatency":667,"requestStartTimeUTC":"10 Jul 2020 06:46:17.393","requestEndTimeUTC":"10 Jul 2020 06:46:18.060","connectionMode":"DIRECT","responseStatisticsList":[{"storeResult":{"storePhysicalAddress":{"uri":"rntbd://cdb-ms-prod-centralus1-fd30.documents.azure.com:14002/apps/426496b4-7542-4fec-a67c-18aeaddb5ede/services/dee5533e-1a74-45fd-9751-59ba20254c72/partitions/b9fb18e4-3f2a-4509-8662-59cacd92da69/replicas/132367378181599881p/","uriasString":"rntbd://cdb-ms-prod-centralus1-fd30.documents.azure.com:14002/apps/426496b4-7542-4fec-a67c-18aeaddb5ede/services/dee5533e-1a74-45fd-9751-59ba20254c72/partitions/b9fb18e4-3f2a-4509-8662-59cacd92da69/replicas/132367378181599881p/"},"lsn":539249,"globalCommittedLsn":539249,"partitionKeyRangeId":"","isValid":true,"statusCode":429,"subStatusCode":3200,"isGone":false,"isNotFound":false,"isInvalidPartition":false,"requestCharge":0.38,"itemLSN":-1,"sessionToken":null,"exception":"[\"Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: http://aka.ms/cosmosdb-error-429\"]","transportRequestTimeline":null},"requestResponseTime":1594363578.060000000}],"supplementalResponseStatisticsList":[],"addressResolutionStatistics":{},"regionsContacted":[],"retryContext":{"retryCount":0,"statusAndSubStatusCodes":null,"retryLatency":0},"metadataDiagnosticsContext":{"metadataDiagnosticList":null},"serializationDiagnosticsContext":{"serializationDiagnosticsList":null},"gatewayStatistics":null,"systemInformation":{"usedMemory":"300749 KB","availableMemory":"3398451 KB","processCpuLoad":"0.7040163103075682 %","systemCpuLoad":"23.71839961548118 %"}}
[7/10/2020 6:46:18 AM] at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.messageReceived(RntbdRequestManager.java:813) ~[azure-cosmos-4.1.0.jar:na]
[7/10/2020 6:46:18 AM] at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.channelRead(RntbdRequestManager.java:174) ~[azure-cosmos-4.1.0.jar:na]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:255) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1429) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1199) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1243) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[azure-functions-java-worker.jar:1.5.3]
[7/10/2020 6:46:18 AM] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191-1-redhat]
I got the same error with SDK v4.0.1.
I'm wondering that com.azure.cosmos.implementation. RxDocumentClientImpl#executeStoredProcedureInternal() doesn't invoke retryPolicy.onBeforeSendRequest(request) to set locationEndpoint.
@yito88 - we are taking a look at this - thanks for the suggestion.
@xinlian12 is taking a look at this and will get a fix out soon.
fix has merged into master. The fix will be included in 4.2.1.
Update: based on the latest plan, this fix will be included in 4.3.0.
Most helpful comment
fix has merged into master. The fix will be included in 4.2.1.