Found by a customer: https://twitter.com/stevejgordon/status/1169995743568048129
This is a high impact bug. A user will cancel a streaming gRPC call on the client, but the server will never be notified. If the server is checking CancellationToken.IsCancellationRequested then it will hang open forever.
This is a regression of https://github.com/dotnet/corefx/issues/38391
Related: https://github.com/dotnet/corefx/issues/39049
Repro:
git clone https://github.com/JamesNK/grpc-dotnet.gitgit checkout jamesnk/duplex-cancellation-regressiondotnet test test\FunctionalTests --filter Name~ServerStreaming_CancellationOnClient_SentToServerLogic in the test is something like:
Wireshark log: cancellation-regression.zip
@wfurt @geoffkizer @stephentoub
@JamesNK, what version? Is there a repro? How does it differ from the test @wfurt says he added? Etc.
The repo in the original issue should re-create it. However it won't compile against the latest .NET Core SDK because of breaking changes.
I'm writing a new regression test now.
.NET Core SDK (reflecting any global.json):
Version: 3.0.100-rc1-014031
Commit: a4f411d3b7
To echo what @stephentoub said: how is this different from existing tests we have, either for GRPC or HttpClient? We know that cancellation does send RST_STREAM for the tests we have, so what's special about this?
Repro and description of the test added.
I was paranoid that the gRPC client wasn't correctly canceling SendAsync so I added some temp logging:
You can see "SendAsync canceled!" show up in the test output:
1.192s Grpc.Net.Client.Internal.GrpcCall - Debug: gRPC call canceled.
1.193s Grpc.Net.Client.Internal.GrpcCall - Information: SendAsync canceled!
1.195s Grpc.Net.Client.Internal.GrpcCall - Error: Call failed with gRPC error status. Status code: 'Cancelled', Message: 'Call canceled by the client.'.
In duplex scenarios, once SendAsync completes, the token you passed to it won't cancel the request anymore. See discussion in dotnet/runtime#1591.
What will cause the request to be cancelled is:
(1) If you throw while sending the request body. It sounds like in this scenario, the request body is already completed. Correct?
(2) If you dispose the response body before we've received it all, then we'll cancel the request. However, if we've already received the entire response body, then we won't (and can't) cancel the request, because it's already been fully received. Does that apply here?
Edit: To clarify in (2) above, we won't and can't cancel the request if the request body is completed and the response body is fully received.
In duplex scenarios, once SendAsync completes, the token you passed to it won't cancel the request anymore. See discussion in dotnet/runtime#1591.
I got the impression from reading the issue that the opposite was true.
Edit: However this test isn't a duplex call. So the situation might be different here.
Edit: To clarify in (2) above, we won't and can't cancel the request if the request body is completed and the response body is fully received.
What does "fully received" even mean here? When would it be fully received? The server is simply streaming messages constantly. Although at some point the server will stop.
I got the impression from reading the issue that the opposite was true.
See: https://github.com/dotnet/corefx/issues/39049#issuecomment-509871804
What does "fully received" even mean here? When would it be fully received? The server is simply streaming messages constantly. Although at some point the server will stop.
Meaning we've received the end of the response stream (EndStream flag)
Hmmm, ok. I was sure this worked at this point: https://github.com/dotnet/corefx/issues/38391
Theory: Maybe what was fixed then was that disposing wasn't sending a cancellation. And because the client was disposed then it looked like cancellation worked.
it should be clear from packet capture that we either should reach end of stream or send RST, right @geoffkizer ? If not that means operation did not finish and that would be bug.
it should be clear from packet capture that we either should reach end of stream or send RST, right @geoffkizer
Yep. Trying to look now, but wireshark is being weird.
I think the current behavior is unexpected, but if there is no time to rethink/improve it in 3.0 then this could be fixed in the gRPC client. We can dispose the HttpResponseMessage on cancellation.
Unexpected in that you expect the cancellation token from SendAsync to cancel the request, even after SendAsync completes?
I agree it's not obvious. We went around on this in dotnet/runtime#1591, and this is where we landed. For better or worse, I think it's too late to change this.
The wireshark trace shows:
(1) Client sends a POST and full request body
(2) Server sends response headers and then sends partial response body chunks every ~1 second
So it looks like what's happening here is that because GRPC never disposes the response body, we are never sending a RST_STREAM. Changing GRPC to dispose the response body should fix this.
Confirmed disposing HttpResponseMessage works and no other gRPC unit tests regressed.
I should add that the discussion in dotnet/runtime#1591 doesn't really apply here because it's not duplex.
It's always been the case (edit: for non-duplex) that when you use ResponseHeadersRead and SendAsync completes, the cancellation token passed to SendAsync will no longer cancel receiving the response. This is how it works for HTTP/1.1 too.
Looks like this is by design, closing.
Most helpful comment
Confirmed disposing HttpResponseMessage works and no other gRPC unit tests regressed.