I'm seeing a lot of HTTP 400 errors from CosmosDB that I don't quite understand. To my understanding, the 400 status code is used for bad requests such as invalid formats. To me, however, the pattern here looks more like throttling with increasing back-off.

Furthermore, when I look in the CosmosDB metrics tab, I see no 400's, but plenty of 429's.

But as it turns out, some 429's show up in Application Insights when I look in the Dependencies tab:

Note that all the data is from the same 24-hour period, and the request trace is from a request that happened in that period.
I am using the DocumentDB client with the Graphs package (queries are in Gremlin).
I know that the throttling is far from ideal (I am currently rewriting queries to be less expensive), but I know how to deal with that and what 429 means.
How should I interpret the 400's? Is it just throttling? Is it an issue with the dependency logging in the DocumentDB client?
When executing a cross-partition query, the service returns the query plan to the SDK with a 400 error code, and the SDK then uses the query plan to execute the cross-partition query locally. However, this specific 400 response will never surface to the user code.
If you are using a proxy or wire dump and see 400 errors which don't reach the end user code, please be aware that they are safe and part of our protocol. Please just ignore them, as they are not really errors.
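To make the flow described above concrete, here is a toy model of it. It is only a sketch under stated assumptions: `ToyService`, `ToyClient`, the `query_plan` field and the per-partition follow-up calls are all invented for illustration and are not the real Cosmos DB wire protocol or SDK code. It only shows why a wire-level observer can see a 400 that the calling code never does.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: dict = field(default_factory=dict)


class ToyService:
    """Hypothetical stand-in for the service, not the real protocol."""

    def __init__(self, partitions):
        self.partitions = partitions  # partition name -> list of documents

    def query(self, predicate, partition=None):
        if partition is None:
            # Cross-partition query: answer 400, but attach a plan that
            # tells the client which partitions to visit.
            return Response(400, {"query_plan": sorted(self.partitions)})
        docs = [d for d in self.partitions[partition] if predicate(d)]
        return Response(200, {"docs": docs})


class ToyClient:
    """Hypothetical stand-in for the SDK."""

    def __init__(self, service):
        self.service = service
        self.wire_log = []  # status codes a proxy or dependency tracker would see

    def _send(self, predicate, partition=None):
        response = self.service.query(predicate, partition)
        self.wire_log.append(response.status)
        return response

    def query(self, predicate):
        first = self._send(predicate)
        if first.status == 400 and "query_plan" in first.body:
            # The 400 is consumed here; the caller only gets the results.
            docs = []
            for name in first.body["query_plan"]:
                docs.extend(self._send(predicate, name).body["docs"])
            return docs
        if first.status != 200:
            raise RuntimeError(f"query failed with status {first.status}")
        return first.body["docs"]


client = ToyClient(ToyService({"a": [1, 2], "b": [3, 4]}))
evens = client.query(lambda d: d % 2 == 0)
```

In this model the caller receives the merged results and no exception, while `client.wire_log` still records the initial 400, which is the mismatch reported in this thread.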
Thank you. With that information I can safely write a telemetry processor to filter those specific 400's out of Application Insights.
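A rough sketch of the filtering logic such a processor might apply, written in Python for illustration rather than against the Application Insights SDK. Everything here is an assumption: the `DependencyRecord` shape, the `.documents.azure.com` host fragment, and above all the heuristic that a Cosmos DB 400 is only treated as benign when the operation it belongs to did not fail, since the query-plan 400 is said never to reach user code.

```python
from dataclasses import dataclass


@dataclass
class DependencyRecord:
    """Hypothetical shape of one logged dependency call."""
    target: str        # host the call went to
    result_code: str   # HTTP status as logged
    operation_id: str  # id of the request this call belongs to


def is_cosmos_400(record, cosmos_host_fragment=".documents.azure.com"):
    # Assumption: Cosmos DB calls can be recognised by their host name.
    return record.result_code == "400" and cosmos_host_fragment in record.target


def split_benign_400s(records, failed_operation_ids):
    """Separate probable query-plan 400s from everything else.

    A Cosmos DB 400 is suppressed only if its parent operation succeeded;
    a 400 inside a failed operation is kept, because it may be a real
    bad request (for example, invalid query syntax).
    """
    kept, suppressed = [], []
    for record in records:
        if is_cosmos_400(record) and record.operation_id not in failed_operation_ids:
            suppressed.append(record)
        else:
            kept.append(record)
    return kept, suppressed


records = [
    DependencyRecord("myaccount.documents.azure.com", "400", "op-1"),
    DependencyRecord("myaccount.documents.azure.com", "400", "op-2"),
    DependencyRecord("myaccount.documents.azure.com", "429", "op-1"),
    DependencyRecord("example.org", "400", "op-1"),
]
kept, suppressed = split_benign_400s(records, failed_operation_ids={"op-2"})
```

With this heuristic only the first record is suppressed. The 400 in the failed operation `op-2`, the 429, and the non-Cosmos 400 all stay visible, so genuine bad requests are not hidden along with the protocol ones.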
@moderakh
"When executing a cross-partition query, the service returns the query plan to the SDK with a 400 error code, and the SDK then uses the query plan to execute the cross-partition query locally. However, this specific 400 response will never surface to the user code."
I don't understand how this can be okay or expected behavior.
I don't get it and I want to. Please explain...
Do I understand this issue correctly?
Hiding all 400 responses could also hide bugs in our own code, because Cosmos DB uses the same status code for invalid queries:
400 Bad Request: The specified request was specified with an incorrect SQL syntax
I totally agree with @mayoatte. Could someone please explain this issue?
Seeing the same behaviour, and we need an explanation.
You can see the source code in the new v3 SDK. There is an issue that is tracking a fix for the v3 SDK.
Here is a high level overview of the query process:
EDIT: just wanted to add that this can be reproduced ~in Direct mode~ with MaxDegreeParallelization != 0 only.
IMO this should be reopened. We have monitoring based on the dependencies collection to detect failed dependency calls. How do we distinguish good bad requests from bad bad requests?
@nosalan this should be fixed in the latest v2 SDK. This is also not an issue if you are running an x64 application on Windows.