ASP.NET Core: configurable HTTP/2 PING timeouts in Kestrel

Created on 17 Oct 2019 · 21 comments · Source: dotnet/aspnetcore

I'm trying to evaluate gRPC for my client/server context.

I looked briefly into ASP.NET Core 3 and its exciting new gRPC support. Since I need to use the gRPC streaming feature in a long-lived connection scenario, I need the HTTP/2 PING feature to detect zombie connections (basically to answer: "Do I still have an up-and-running connection to the client?").

So I opened a question on the gRPC project, but it seems that this feature cannot be implemented because there is no Kestrel API to control HTTP/2 PINGs.

It seems that other HTTP/2 languages/stacks implement an API like that (Node.js's HTTP/2 module, for example).

So I would need support for the HTTP/2 PING feature in Kestrel (and HttpClient) to be able to implement a heartbeat feature in gRPC-dotnet.

Labels: Done, HTTP2, area-servers, enhancement, servers-kestrel

All 21 comments

In Kestrel, adding an API or timer for this would be pretty straightforward.

@shirhatti we should also consider http.sys. A timer would be easier there.

After a few discussions in the corefx repository about a similar change to HttpClient, it is now clear to me that it would be better to have a configurable timeout system to manage HTTP/2 PING behavior (rather than direct API access to send PINGs).

My need comes from a gRPC usage scenario, where HTTP/2 PING frames are used explicitly. For that, I would need equivalents of the following ping parameters, taken from grpc-core, to be available in grpc-dotnet (and therefore also in Kestrel):

  • GRPC_ARG_KEEPALIVE_TIME_MS: This channel argument controls the period (in milliseconds) after which a keepalive ping is sent on the transport.
  • GRPC_ARG_KEEPALIVE_TIMEOUT_MS: This channel argument controls the amount of time (in milliseconds) the sender of the keepalive ping waits for an acknowledgement. If it does not receive an acknowledgment within this time, it will close the connection.
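For reference, the two channel arguments listed above are the same grpc-core options that other gRPC language bindings already expose. A minimal sketch of how they look in the Python client (which wraps the same C core); the option names are real grpc-core channel args, the values and target address are just placeholders:

```python
# The grpc-core keepalive channel arguments from the list above, expressed
# as channel options the way the Python gRPC client passes them.
keepalive_options = [
    # GRPC_ARG_KEEPALIVE_TIME_MS: send a keepalive ping after 30 s of inactivity
    ("grpc.keepalive_time_ms", 30_000),
    # GRPC_ARG_KEEPALIVE_TIMEOUT_MS: close the connection if no ack within 5 s
    ("grpc.keepalive_timeout_ms", 5_000),
]

# With the real library this would be used as:
#   import grpc
#   channel = grpc.insecure_channel("localhost:50051", options=keepalive_options)
```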

I think that without such a feature, real gRPC streaming scenarios just won't be possible with grpc-dotnet.

What about all of the advanced options listed in that keepalive doc?

@JamesNK

I'm not familiar with the details of keepalive. I can't really comment on which are must haves.

Netty is the primary Java gRPC server. You could look to it as an example of a more general purpose server and see what configuration it has for keepalive.

Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

:-( gRPC streaming will be tricky to use without that.
What kind of information do you need?

:-( gRPC streaming will be tricky to use without that.

Streaming will work normally unless you have connections that are idle for a long time, correct?

Yes, gRPC streaming calls work without HTTP2 PINGs. But they're often used for push-style notification, so it's not uncommon, in my experience, for them to go idle for a long time.

For example, the client starts a streaming call and then the server can push notifications when they arrive. When there are messages flowing, there's no need for the PINGs. The application's messages are conveying useful information and serving as connectivity checks. But when the stream is idle, HTTP2 PINGs can be configured to kick in to know whether the connection is still alive and to keep it alive in the face of load balancers that close idle connections (e.g., AWS's Elastic Load Balancer defaults to 350 seconds, Azure's Load Balancer is 4 minutes, Google Cloud's Backend Load Balancer is 30 seconds, I think. I've not used that one.). If the client detects that the connection has died while idle, it can also proactively reconnect to minimize delays when it does later need to send a message.

HTTP2 PINGs mean that the application protocol doesn't need an explicit "Still here" message just for detecting connection liveness. Also, one HTTP2 PING can be used for the entire connection, regardless of how many streaming calls are multiplexed atop it.
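The reason one PING covers the whole connection is that PING frames are defined on stream 0 (the connection itself) in RFC 7540 §6.7. A small self-contained sketch of the wire format, to make that concrete (this is the frame layout from the spec, not anything Kestrel-specific):

```python
import struct

def build_ping_frame(opaque_data: bytes, ack: bool = False) -> bytes:
    """Build an HTTP/2 PING frame per RFC 7540 section 6.7.

    A PING frame has exactly 8 bytes of opaque payload, frame type 0x06,
    an ACK flag of 0x01, and is always sent on stream 0 (the connection
    as a whole) -- which is why a single PING checks liveness for every
    streaming call multiplexed on the connection.
    """
    assert len(opaque_data) == 8
    # 9-byte frame header: 24-bit length, 8-bit type, 8-bit flags, 32-bit stream id
    length_and_type = struct.pack(">I", (8 << 8) | 0x06)
    flags = b"\x01" if ack else b"\x00"
    stream_id = struct.pack(">I", 0)  # stream 0 = the connection itself
    return length_and_type + flags + stream_id + opaque_data

frame = build_ping_frame(b"\x00" * 8)
assert len(frame) == 9 + 8  # 9-byte frame header + 8-byte opaque payload
```

The receiver echoes the same 8 opaque bytes back with the ACK flag set, so the sender can match an ACK to the PING it sent.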

The gRPC C library allows PINGs to be configured client to server and server to client, depending on who needs to know.

Let me know if more info would be useful.

I think we understand the value and the feature, it's a matter of prioritization at this point :).

Is the idea just to keep the connection alive? If so, sending PING frames at some interval is likely to be sufficient. We do that in WebSockets and it seems reasonable.

It gets more complicated if we want to do things like checking for the PING+ACK responses and ensuring they come back in a timely fashion.

@chwarr and @chrisdot Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

We don't have enough input to consider this high priority for 5.0 at this time.
I think we understand the value and the feature, it's a matter of prioritization at this point :).

I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

For someone implementing a gRPC server atop Kestrel, completely silent processing is likely not sufficient. The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP2 connection. (Analogously for the client side, but that's already tracked by a different issue.) As mentioned before, @JamesNK is building a gRPC server stack atop Kestrel in the grpc-dotnet project.

The gRPC HTTP2 protocol has application visible semantics for failed/late PINGs:

If a server initiated PING does not receive a response within the deadline expected by the runtime all outstanding calls on the server will be closed with a CANCELLED status. An expired client initiated PING will cause all calls to be closed with an UNAVAILABLE status.
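The semantics quoted above boil down to a small amount of per-connection state: when the last frame arrived, whether a PING is outstanding, and a deadline for its ACK. A minimal sketch of that state machine, with hypothetical names (real Kestrel internals look nothing like this); the clock is injected so the logic can be exercised deterministically:

```python
import time

class KeepaliveState:
    """Sketch of server-side PING tracking for one HTTP/2 connection.

    tick() is imagined as being driven by a periodic heartbeat timer and
    returns what the connection should do: 'ok' (nothing), 'send_ping'
    (emit a PING frame), or 'dead' (cancel all in-flight streams).
    """

    def __init__(self, interval_s: float, timeout_s: float, clock=time.monotonic):
        self.interval_s = interval_s    # idle time before sending a keepalive PING
        self.timeout_s = timeout_s      # how long to wait for the PING ACK
        self.clock = clock
        self.last_frame_received = clock()
        self.ping_sent_at = None        # None = no PING awaiting an ack

    def on_frame_received(self):
        # Any inbound frame (including the PING ACK) proves liveness.
        self.last_frame_received = self.clock()
        self.ping_sent_at = None

    def tick(self) -> str:
        now = self.clock()
        if self.ping_sent_at is not None:
            # A PING is outstanding: fail the connection if the ack is late.
            if now - self.ping_sent_at >= self.timeout_s:
                return "dead"
            return "ok"
        if now - self.last_frame_received >= self.interval_s:
            self.ping_sent_at = now
            return "send_ping"
        return "ok"
```

On "dead", a gRPC server would close the outstanding calls with CANCELLED, matching the quoted spec text.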

I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

A little of both ;). Just trying to get an idea of the best bang for our buck, cost/benefit-wise.

Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

That's good. I think we could do something like this fairly quickly, as it's quite straightforward and we already have a heartbeat timer we can use for this.

The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP2 connection

And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

I'll move this up to 5.0 to see how far we can get. This seems like a good win for gRPC streaming.

Actually, clearing the milestone to put this in front of triage again for a bit of discussion before scheduling.

The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP2 connection

And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice, a higher-layer proxy still might not forward PINGs. For instance, you couldn't build a proxy with Kestrel that forwards PINGs; it doesn't expose them in either direction.

TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice, a higher-layer proxy still might not forward PINGs

Very true, good point.

CC @jtattermusch

We'll look at generating new PINGs at a configurable interval for 5.0. Future work (considering clients disconnected if they don't PING+ACK, etc.) is TBD.

@JamesNK @jtattermusch How do you folks feel about this? Do we need this in Kestrel?

Maybe. I looked at pings a few weeks ago and had some questions for Jan that have been answered. I'll look at it again.

As I have written in https://github.com/dotnet/runtime/issues/31198, we need a quick fix please, otherwise we are forced to use the C server gRPC implementation.
Thank you

Now that we have a nice proposal for HttpClient, wouldn't it be relevant to have the equivalent on the server side? @JamesNK ?
