ASP.NET Core: configurable HTTP/2 PING timeouts in Kestrel

Created on 17 Oct 2019 · 21 comments · Source: dotnet/aspnetcore

I'm trying to evaluate gRPC for my client/server context.

I looked briefly into ASP.NET Core 3 and its exciting new gRPC support. Since I need to use the gRPC streaming feature in a long-lived connection scenario, I need the HTTP/2 PING feature to detect zombie connections (basically to answer: "Do I still have an up-and-running connection to the client?").

So I opened a question on the gRPC project, but it seems that this feature cannot be implemented because there is no Kestrel API to control HTTP/2 PINGs.

It seems that other HTTP/2 languages/stacks implement an API like that (Node.js's HTTP/2 module, for example).

So I would need support for the HTTP/2 PING feature in Kestrel (and HttpClient) to be able to implement a heartbeat feature in gRPC-dotnet.

Labels: Done, HTTP2, area-servers, enhancement, servers-kestrel

All 21 comments

In Kestrel, adding an API or timer for this would be pretty straightforward.

@shirhatti we should also consider http.sys. A timer would be easier there.

After a few discussions in the corefx repository about a similar change to HttpClient, it is now clear to me that it would be better to have a configurable timeout system to manage HTTP/2 PING behavior (rather than direct API access to send PINGs).

My need comes from a gRPC usage scenario, where HTTP/2 PING frames are used explicitly. For that, I would need equivalents of the following ping parameters, taken from grpc-core, to be available in grpc-dotnet (and therefore also in Kestrel):

  • GRPC_ARG_KEEPALIVE_TIME_MS: This channel argument controls the period (in milliseconds) after which a keepalive ping is sent on the transport.
  • GRPC_ARG_KEEPALIVE_TIMEOUT_MS: This channel argument controls the amount of time (in milliseconds) the sender of the keepalive ping waits for an acknowledgement. If it does not receive an acknowledgment within this time, it will close the connection.
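For reference, the two channel arguments listed above are the same grpc-core options that other gRPC language bindings already expose. A minimal sketch of how they look in the Python client (which wraps the same C core); the option names are real grpc-core channel args, the values and target address are just placeholders:

```python
# The grpc-core keepalive channel arguments from the list above, expressed
# as channel options the way the Python gRPC client passes them.
keepalive_options = [
    # GRPC_ARG_KEEPALIVE_TIME_MS: send a keepalive ping after 30 s of inactivity
    ("grpc.keepalive_time_ms", 30_000),
    # GRPC_ARG_KEEPALIVE_TIMEOUT_MS: close the connection if no ack within 5 s
    ("grpc.keepalive_timeout_ms", 5_000),
]

# With the real library this would be used as:
#   import grpc
#   channel = grpc.insecure_channel("localhost:50051", options=keepalive_options)
```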

I think that without such a feature, real gRPC streaming scenarios just won't be possible with grpc-dotnet.

What about all of the advanced options listed in that keepalive doc?

@JamesNK

I'm not familiar with the details of keepalive. I can't really comment on which are must haves.

Netty is the primary Java gRPC server. You could look to it as an example of a more general purpose server and see what configuration it has for keepalive.

Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

Backlogging for now. We don't have enough input to consider this high priority for 5.0 at this time.

:-( gRPC streaming will be tricky to use without that.
What kind of information do you need?

:-( gRPC streaming will be tricky to use without that.

Streaming will work normally unless you have connections that are idle for a long time, correct?

Yes, gRPC streaming calls work without HTTP2 PINGs. But they're often used for push-style notification, so it's not uncommon, in my experience, for them to go idle for a long time.

For example, the client starts a streaming call and then the server can push notifications when they arrive. When there are messages flowing, there's no need for the PINGs. The application's messages are conveying useful information and serving as connectivity checks. But when the stream is idle, HTTP2 PINGs can be configured to kick in to know whether the connection is still alive and to keep it alive in the face of load balancers that close idle connections (e.g., AWS's Elastic Load Balancer defaults to 350 seconds, Azure's Load Balancer is 4 minutes, Google Cloud's Backend Load Balancer is 30 seconds, I think. I've not used that one.). If the client detects that the connection has died while idle, it can also proactively reconnect to minimize delays when it does later need to send a message.

HTTP2 PINGs mean that the application protocol doesn't need an explicit "Still here" message just for detecting connection liveness. Also, one HTTP2 PING can be used for the entire connection, regardless of how many streaming calls are multiplexed atop it.
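The reason one PING covers the whole connection is that PING frames are defined on stream 0 (the connection itself) in RFC 7540 §6.7. A small self-contained sketch of the wire format, to make that concrete (this is the frame layout from the spec, not anything Kestrel-specific):

```python
import struct

def build_ping_frame(opaque_data: bytes, ack: bool = False) -> bytes:
    """Build an HTTP/2 PING frame per RFC 7540 section 6.7.

    A PING frame has exactly 8 bytes of opaque payload, frame type 0x06,
    an ACK flag of 0x01, and is always sent on stream 0 (the connection
    as a whole) -- which is why a single PING checks liveness for every
    streaming call multiplexed on the connection.
    """
    assert len(opaque_data) == 8
    # 9-byte frame header: 24-bit length, 8-bit type, 8-bit flags, 32-bit stream id
    length_and_type = struct.pack(">I", (8 << 8) | 0x06)
    flags = b"\x01" if ack else b"\x00"
    stream_id = struct.pack(">I", 0)  # stream 0 = the connection itself
    return length_and_type + flags + stream_id + opaque_data

frame = build_ping_frame(b"\x00" * 8)
assert len(frame) == 9 + 8  # 9-byte frame header + 8-byte opaque payload
```

The receiver echoes the same 8 opaque bytes back with the ACK flag set, so the sender can match an ACK to the PING it sent.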

The gRPC C library allows PINGs to be configured client to server and server to client, depending on who needs to know.

Let me know if more info would be useful.

I think we understand the value and the feature, it's a matter of prioritization at this point :).

Is the idea just to keep the connection alive? If so, sending PING frames at some interval is likely to be sufficient. We do that in WebSockets and it seems reasonable.

It gets more complicated if we want to do things like checking for the PING+ACK responses and ensuring they come back in a timely fashion.

@chwarr and @chrisdot Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

We don't have enough input to consider this high priority for 5.0 at this time.
I think we understand the value and the feature, it's a matter of prioritization at this point :).

I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

Would it be sufficient to just send PING frames on an interval and silently process the response PING+ACK frame?

Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

For someone implementing a gRPC server atop Kestrel, completely silent processing is likely not sufficient. The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP2 connection. (Analogously for the client side, but that's already tracked by a different issue.) As mentioned before, @JamesNK is building a gRPC server stack atop Kestrel in the grpc-dotnet project.

The gRPC HTTP2 protocol has application visible semantics for failed/late PINGs:

If a server initiated PING does not receive a response within the deadline expected by the runtime all outstanding calls on the server will be closed with a CANCELLED status. An expired client initiated PING will cause all calls to be closed with an UNAVAILABLE status.
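The semantics quoted above boil down to a small amount of per-connection state: when the last frame arrived, whether a PING is outstanding, and a deadline for its ACK. A minimal sketch of that state machine, with hypothetical names (real Kestrel internals look nothing like this); the clock is injected so the logic can be exercised deterministically:

```python
import time

class KeepaliveState:
    """Sketch of server-side PING tracking for one HTTP/2 connection.

    tick() is imagined as being driven by a periodic heartbeat timer and
    returns what the connection should do: 'ok' (nothing), 'send_ping'
    (emit a PING frame), or 'dead' (cancel all in-flight streams).
    """

    def __init__(self, interval_s: float, timeout_s: float, clock=time.monotonic):
        self.interval_s = interval_s    # idle time before sending a keepalive PING
        self.timeout_s = timeout_s      # how long to wait for the PING ACK
        self.clock = clock
        self.last_frame_received = clock()
        self.ping_sent_at = None        # None = no PING awaiting an ack

    def on_frame_received(self):
        # Any inbound frame (including the PING ACK) proves liveness.
        self.last_frame_received = self.clock()
        self.ping_sent_at = None

    def tick(self) -> str:
        now = self.clock()
        if self.ping_sent_at is not None:
            # A PING is outstanding: fail the connection if the ack is late.
            if now - self.ping_sent_at >= self.timeout_s:
                return "dead"
            return "ok"
        if now - self.last_frame_received >= self.interval_s:
            self.ping_sent_at = now
            return "send_ping"
        return "ok"
```

On "dead", a gRPC server would close the outstanding calls with CANCELLED, matching the quoted spec text.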

I wasn't sure if you needed more information, more use cases, or more people wanting it. So, I provided more information. :-)

A little of both ;). Just trying to get an idea of the best bang for our buck, cost/benefit-wise.

Silent processing would be a good start and would probably address 60% of what PINGs are used for in practice by gRPC consumers.

That's good. I think we could do something like this fairly quickly, as it's quite straightforward and we already have a heartbeat timer we can use for this.

The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP2 connection

And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

I'll move this up to 5.0 to see how far we can get. This seems like a good win for gRPC streaming.

Actually, clearing the milestone to put this in front of triage again for a bit of discussion before scheduling.

The server side will need some way of knowing that the PINGs have failed or been sufficiently delayed so that it can fail all the in-flight streaming calls atop that HTTP2 connection

And TCP ACKs are insufficient for this? Tracking outstanding PINGs is quite a bit more costly, but certainly not out of the picture. As you say though, the spec does expect it to be done. We've definitely had our fair share of bad run-ins with TCP ACKing.

TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice, a higher-layer proxy still might not forward PINGs. For instance, you couldn't build a proxy with Kestrel that forwards PINGs; it doesn't expose them in either direction.

TCP ACKs are only single-hop, whereas PINGs should be end-to-end. In practice, a higher-layer proxy still might not forward PINGs

Very true, good point.

CC @jtattermusch

We'll look at generating new PINGs at a configurable interval for 5.0. Future work (considering clients disconnected if they don't PING+ACK, etc.) is TBD.

@JamesNK @jtattermusch How do you folks feel about this? Do we need this in Kestrel?

Maybe. I looked at pings a few weeks ago and had some questions for Jan that have been answered. I'll look at it again.

As I have written in https://github.com/dotnet/runtime/issues/31198, we need a quick fix please, otherwise we are forced to use the C server gRPC implementation.
Thank you

Now that we have a nice proposal for HttpClient, wouldn't it be relevant to have the equivalent on the server side? @JamesNK ?
