It turns out that the gRPC async client won't open more than SETTINGS_MAX_CONCURRENT_STREAMS concurrent streams to the xDS gRPC server (good!), but EDS monitors are long-lived streams that wait forever, and it doesn't look like the gRPC async client opens more than a single HTTP/2 connection to the backend (bad!). This means that the total number of working EDS endpoints is limited by the xDS server's settings, and only the first SETTINGS_MAX_CONCURRENT_STREAMS EDS subscriptions will be able to establish HTTP/2 streams and receive responses.
The solution is for the gRPC async client to open another HTTP/2 connection once it reaches the xDS server's SETTINGS_MAX_CONCURRENT_STREAMS.
A temporary workaround is to increase SETTINGS_MAX_CONCURRENT_STREAMS on the xDS server, but that breaks once there are middle proxies involved.
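For reference, a minimal sketch of that workaround on a grpc-go based management server (the listener address and wiring below are placeholders, not taken from any particular xDS server):

```go
package main

import (
	"log"
	"math"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// Placeholder address; a real management server listens wherever it already does.
	lis, err := net.Listen("tcp", ":15010")
	if err != nil {
		log.Fatal(err)
	}
	// Advertise a very large SETTINGS_MAX_CONCURRENT_STREAMS so a single Envoy
	// connection can carry one long-lived EDS stream per cluster.
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(math.MaxUint32))
	// ... register the xDS discovery services here ...
	log.Fatal(srv.Serve(lis))
}
```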
See https://github.com/istio/istio/issues/4593 for background.
cc @htuch @mattklein123 @costinm @ldemailly @lizan
Unoptimized EDS is the root cause. With ADS, we have a single EDS stream globally, which is nice for the obvious reasons. With non-ADS, currently Envoy opens a new stream for each cluster, regardless of whether they point at the same management server and could be converged. I think it would be best to go in and fix EDS to reuse streams. It's not that hard, I'd SWAG this as 2-3 days effort including tests.
That would only address envoy->pilot but not envoy->unmodified grpc app
We can't assume all applications will change their configuration to deal with Envoy limitations, can we?
(unless you are saying that the issue is specific to the gRPC client used for *DS v2, but not for the gRPC/h2 filter?)
I believe the same problem exists for the HTTP async client as well, if the cluster is configured with H2. The gRPC async client simply depends on it.
@htuch optimizing EDS is good and solves the exact issue for Istio, but I feel fixing the gRPC async client (and the underlying HTTP async client) is more important. We cannot rely on gRPC stream optimization for all cases.
I don't really understand what EDS stream optimization means. Do you mean changing the API to put requests/responses over a single stream?
The proper fix here is actually at the HTTP/2 connection pool level. The pool should be able to create multiple connections per backend host if needed. There are other reasons we actually want this also (allowing for more connection/stream fan out in the case of middle proxies).
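To make the idea concrete, here is a rough sketch (written in Go for brevity; Envoy's real pool is C++ and lives in source/common/http/http2/conn_pool.cc): track active streams per upstream connection and dial an additional connection to the same host once every existing one has reached the peer-advertised SETTINGS_MAX_CONCURRENT_STREAMS.

```go
package pool

import "sync"

// conn is a stand-in for an established HTTP/2 connection to one upstream host.
type conn struct {
	maxStreams    uint32 // SETTINGS_MAX_CONCURRENT_STREAMS advertised by the peer
	activeStreams uint32
}

// Pool hands out stream capacity across multiple connections to a single host.
type Pool struct {
	mu    sync.Mutex
	conns []*conn
	dial  func() (*conn, error) // establishes a new connection to the host
}

// Acquire returns a connection with spare stream capacity, dialing a new
// connection when every existing one is saturated instead of queueing the
// stream behind the per-connection limit.
func (p *Pool) Acquire() (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range p.conns {
		if c.activeStreams < c.maxStreams {
			c.activeStreams++
			return c, nil
		}
	}
	c, err := p.dial()
	if err != nil {
		return nil, err
	}
	c.activeStreams = 1
	p.conns = append(p.conns, c)
	return c, nil
}
```

A real implementation also has to decrement counts when streams finish, drain and close idle connections, and respect circuit-breaker limits on total connections.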
@mattklein123 today we have 1 EDS stream per cluster. EDS can have multiple cluster subscriptions per stream, but this isn't done today outside of ADS.
There are no API changes, only Envoy implementation tuning.
@htuch OK I see. Can we open a different issue on that? I would like to keep this issue open to track h2 connection pool which we should also fix at some point.
@mattklein123 Sure, https://github.com/envoyproxy/envoy/issues/2943.
One possible option (or hack?) would be to override the server-sent 'max streams' until multiple connections are implemented. If the app happens to use an h2 stack that doesn't enforce max streams, it may work.
Or document how to set max streams for common languages and stacks (along with docs on how to set up h2 without mTLS, which is also required for upstream h2 apps behind Envoy).
The biggest problem was the lack of clear information about this limitation.
> One possible option (or hack?) would be to override the server-sent 'max streams' until multiple connections are implemented. If the app happens to use an h2 stack that doesn't enforce max streams, it may work.

That's going to break more stuff than it fixes.
Extra info:
So either the real fix or lots of docs and changes in upstream servers are required; there is no quick hack.
For folks looking for management server workarounds in the interim, here is an example of a Go management server change which is pretty trivial: https://github.com/heptio/contour/pull/308
I am not able to reproduce this issue.
I wonder if the problem with Istio Pilot was that the proxy is/was set up as TCP? @costinm
The only setting I see when using Envoy as a gRPC (h2) proxy is:
22:17:01 http2: Framer 0xc420630000: read SETTINGS len=6, settings: INITIAL_WINDOW_SIZE=268435456
while I go directly against my same go server with a max streams of 16:
22:21:38 http2: Framer 0xc4204e8000: read SETTINGS len=6, settings: MAX_CONCURRENT_STREAMS=16
and either way I can do > 100 simultaneous streams on 1 connection
If I remove maxstream on the backend side and really push it (doing 1000 simultaneous streams * 2 connections), I get about 15% errors:
22:33:59 W grpcrunner.go:96> Error making grpc call: rpc error: code = Internal desc = transport: received the unexpected content-type "text/plain"
...
Ended after 17.840324262s : 4000 calls. qps=224.21
Sleep times : count 2000 avg 4.5242081 +/- 1.416 min 1.570214834 max 6.106976232 sum 9048.41625
Aggregated Function Time : count 4000 avg 4.6826156 +/- 2.232 min 1.732992693 max 9.83520712 sum 18730.4624
# range, mid point, percentile, count
>= 1.73299 <= 2 , 1.8665 , 10.28, 411
> 2 <= 3 , 2.5 , 28.68, 736
> 3 <= 4 , 3.5 , 48.15, 779
> 4 <= 5 , 4.5 , 61.98, 553
> 5 <= 7.5 , 6.25 , 83.17, 848
> 7.5 <= 9.83521 , 8.6676 , 100.00, 673
# target 50% 4.13382
# target 75% 6.53597
# target 90% 8.44727
# target 99% 9.69641
# target 99.9% 9.82133
Ping SERVING : 3398
Ping -1 : 602
In the Envoy ingress logs I get (no errors on the Envoy sidecar):
[2018-04-03T05:29:44.084Z] "POST /fgrpc.PingServer/Ping HTTP/2" 503 UO 23 57 6096 - "10.138.0.12" "grpc-go/1.11.1" "cd0cf852-3577-9a57-a19c-d6435302ff2b" "fortio-stage.istio.io:80" "-"
What is UO?
@ldemailly I'm pretty sure that gRPC xDS client talks directly to selected cluster over HTTP/2, so listener settings (HTTP or TCP proxy) for Pilot wouldn't matter.
As for the difference in SETTINGS received when proxying via Envoy vs connecting directly, those settings are per hop, so that's expected. When you say that you can do >100 simultaneous streams over single connection, you mean single connection to Envoy, correct? Envoy still establishes multiple connections to the backend server (respecting 16 streams per connection)?
From documentation:
UO: Upstream overflow (circuit breaking) in addition to 503 response code.
I've been dealing with this kind of issue using Envoy (Istio) since last week, while trying to DDoS my application. https://github.com/istio/istio/issues/4658
Go client (Go 1.10 / gRPC) -TLS-> Envoy (Istio 0.7.0) --> server (Go 1.10 / gRPC)
Trying to DDoS my app, I start several Go clients, each of which simulates 100 real clients, so 100 TCP connections to Envoy each.
Each client then sends a unary request to one endpoint, then opens a stream and starts sending the payload.
Usually, one client = 1 TCP connection + 1 unary call + 1 stream.
I see Envoy opening 4 TCP connections to my server application and proxying every client connection into them.
Which means that, if my gRPC server is set up to allow 250 streams per client, I will only be able to handle 1000 clients (250 streams * 4 TCP connections).
While I'm not sure about the exact numbers, I can confirm the behaviour.
Setting a higher limit in the server using grpc.MaxConcurrentStreams(math.MaxInt32) mitigated the issue.
My conclusion is that this matches the behaviour described in this Go http2.Transport change:
> Currently if the http2.Transport hits SettingsMaxConcurrentStreams for a server, it just makes a new TCP connection and creates the stream on the new connection. This CL updates that behavior to instead block RoundTrip until a new stream is available.

Which means that if Envoy only opens 1 TCP connection to the server, and MaxConcurrentStreams is 250, Envoy will not be able to handle more than 250 clients.
I hope I'm wrong, but this is the behaviour I see.
While 100 or 250 is clearly too low, 100,000 sounds like a good value for both gRPC and TCP to use maximum resources without exhaustion.
I can't find anything documenting the actual behaviour Envoy should have here.
I'm not a dev and have a really hard time understanding Envoy's C++ code.
Could anyone point us to the right piece of code and maybe explain what is going on in this situation?
Also, if my server is set to a given MaxConcurrentStreams, what will be transmitted to the client? Is it the same value, or the one (which one?) from the circuit breaker config?
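As a side note on the Go http2.Transport change quoted above: as far as I can tell, the "dial a new connection" vs. "block RoundTrip" behaviour eventually landed behind a flag in golang.org/x/net/http2 (StrictMaxConcurrentStreams), which postdates this discussion; the snippet below is purely illustrative.

```go
package h2client

import (
	"net/http"

	"golang.org/x/net/http2"
)

func newClient() *http.Client {
	t := &http2.Transport{
		// false (the default): when the server's SETTINGS_MAX_CONCURRENT_STREAMS
		// is reached, dial another TCP connection and start the new stream there.
		// true: respect the limit globally and block RoundTrip until an existing
		// stream completes (the behaviour described in the quoted CL).
		StrictMaxConcurrentStreams: false,
	}
	return &http.Client{Transport: t}
}
```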
I also noted something special about the pool of TCP connections used by Envoy for gRPC streams.
The DestinationPolicy defined for the service is:

spec:
  circuitBreaker:
    simpleCb:
      httpConsecutiveErrors: 10
      httpDetectionInterval: 1s
      httpMaxEjectionPercent: 100
      httpMaxPendingRequests: 1000
      httpMaxRequests: 100000
      httpMaxRequestsPerConnection: 500
      maxConnections: 100000
      sleepWindow: 30s
  destination:
    name: useredged
curl http://localhost:15000/clusters shows:
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_connections::100000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_pending_requests::1000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_requests::100000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_retries::3
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_connections::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_pending_requests::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_requests::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_retries::3
Note that the httpMaxRequestsPerConnection parameter is not displayed here...
I'm using our gRPC client to load test. Each client opens 500 TCP connections to Envoy, then starts a bidirectional stream and sends one message per second.
I'm starting one client at a time, letting all the cx and rq settle.
In the above setup, everything is fine until I reach 4000 active requests; this can be reproduced every time.
If I increase httpMaxRequestsPerConnection in the DestinationPolicy to 1000, I can reach 8000 rq before getting 503s.
At the same time I can see hundreds of active cx on an HTTP/1.1 endpoint.
Why can't Envoy scale the number of TCP connections to the backend server?
I can now add some more info.
I still don't understand where the issue is.
My issue does not concern EDS or xDS but Envoy proxying from my client to my server.
In this situation, Envoy creates ONE TCP connection to the upstream server per CPU. In my tests with 2 CPUs --> 2 TCP connections.
Envoy then multiplexes the HTTP/2 gRPC calls onto those two TCP connections.
Here are the scenarios for N clients on servers with 1 CPU:
one Envoy:
N clients --> 1 TCP/1 stream each --> Envoy --> 1 TCP/N streams --> server
Everything works as expected; I can go up to 1024 streams as per Envoy's default. If I try more, I get some 503s, which is the expected behaviour.
two Envoys:
N clients --> 1 TCP/1 stream each --> Envoy --> 1 TCP/N streams --> Envoy sidecar --> 1 TCP/N streams --> server
In fact, what is happening is:
N clients --> 1 TCP/1 stream each --> Envoy --> 1 TCP/100 streams --> Envoy sidecar --> 1 TCP/100 streams --> server
Which means that if N > 100, I lose connections.
I can't find anything explaining why the limit is 100, but I can reproduce this situation every time.
If I have 2 CPUs, I see 2 TCP connections with 100 streams in each...
I'm still digging...
I really think this issue is not linked to https://github.com/istio/istio/issues/4940
see https://github.com/istio/istio/issues/4940#issuecomment-381248731
Maybe I should open a new issue here, as the problem really sounds like it lives inside Envoy?
@prune998 your issue is a different issue. I know what the issue is. Please delete all your comments from this issue as they are unrelated and open them in a new issue. Thanks.
@mattklein123 I created a new issue https://github.com/envoyproxy/envoy/issues/3076
I would love to hear from you if you know what the issue is, as it's blocking our deployments.
I don't think I'll delete my comments, but I'll note that they are effectively not related to the original issue.
Thanks.
@PiotrSikora can we consider this issue complete? There is a lot in here but I think your commit fixes the original issue here?
AFAIK, this issue (i.e. the number of outgoing streams being limited by SETTINGS_MAX_CONCURRENT_STREAMS) isn't fixed yet. I'll consider it fixed once Envoy starts opening new connections to the peer when it hits the SETTINGS_MAX_CONCURRENT_STREAMS limit.
The commit only covers the case where SETTINGS_MAX_CONCURRENT_STREAMS wasn't advertised by the peer, so it doesn't directly address this issue.
OK, moving to 1.9.0.
Doesn't seem like anyone is actively working on this. Mind if I look into implementing this?
@mjpitz I have invited you to envoyproxy org, if you accept I can assign you the issue.
@htuch : Joined
Alright. I spent a fair bit of time diving into the code base.
General Notes:
When primary is swapped to draining:
Implementation proposal: the changes would go in source/common/http/http2/conn_pool.cc.
Thoughts on this approach? Anything major I'm missing or things I should consider?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
Sorry I haven't gotten to this yet. I got sidetracked with a presentation. Hopefully I should be able to get to it in the next couple of weeks.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
keepalive
I've been talking myself in circles so it would be nice to talk some of this through some more before implementing.
Since being assigned this ticket, I've been reading up and understanding more about how the Envoy threading model works. This was well captured by the following blog post and tech talk:
A quick TL;DR: Envoy maintains a pool of workers (configured by --concurrency, which defaults to the number of cores on the machine). Each worker maintains its own connections (i.e. thread-local connection pools). To me, this means the maximum number of concurrent streams would actually be concurrency * MAX_CONCURRENT_STREAMS (assuming an even distribution of requests across connections). It would seem that if we need to increase the throughput of the system to get around MAX_CONCURRENT_STREAMS, we could simply raise the --concurrency parameter.
One concern that came to mind was how this impacts the purpose / intent of MAX_CONCURRENT_STREAMS. From what I can gather, its purpose isn't well documented. As an engineer, I think about this setting as a way for service owners to throttle the workloads their clients generate. By implementing an "auto-scaling" feature like this, we effectively bypass that setting. I've read a few articles on this, and people are generally advised to work with the service owner to better understand the workload they are trying to perform. Here's one issue under the http2-spec detailing a similar response.
Open Questions:
Is my assumption above about concurrency * MAX_CONCURRENT_STREAMS true?
Ultimately, I've found myself in a personal philosophical debate of "Sure, I could add this feature, but should I?" Getting some clarity on the questions above will help resolve my internal debate.
Does anyone have any insight into the intent / purpose / application of MAX_CONCURRENT_STREAMS?
AFAIK, it is the maximum multiplexing of logical gRPC streams over a single TCP socket/connection; it's not meant to mean that no additional streams should exist between the client and server, just a limit on throughput and congestion over a single socket (IMO).
I know gRPC defaults to one connection per backend returned via name resolver (at least in Java). The google-apis/gax project has a ChannelPool that allows you to increase the number of connections per backend. (just to add another reference to where we're already bypassing this constraint)
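For illustration, client-side channel pooling in grpc-go might look roughly like this (hypothetical sketch; grpc-go has no built-in ChannelPool, so this simply round-robins across several ClientConns so that no single connection's SETTINGS_MAX_CONCURRENT_STREAMS becomes the global ceiling):

```go
package channelpool

import (
	"sync/atomic"

	"google.golang.org/grpc"
)

// ChannelPool holds several independent connections to the same target.
type ChannelPool struct {
	conns []*grpc.ClientConn
	next  uint64
}

// New dials size separate connections; dial options (credentials, balancer,
// etc.) are left to the caller.
func New(target string, size int, opts ...grpc.DialOption) (*ChannelPool, error) {
	p := &ChannelPool{}
	for i := 0; i < size; i++ {
		cc, err := grpc.Dial(target, opts...)
		if err != nil {
			return nil, err
		}
		p.conns = append(p.conns, cc)
	}
	return p, nil
}

// Conn returns the next connection in round-robin order; stubs built from
// different returned conns end up on different TCP connections.
func (p *ChannelPool) Conn() *grpc.ClientConn {
	n := atomic.AddUint64(&p.next, 1)
	return p.conns[n%uint64(len(p.conns))]
}
```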
@mjpitz it's probably a good idea to distinguish between data and control plane here. For data plane and backends, we will effectively have an upper bound of concurrency * MAX_CONCURRENT_STREAMS. For the control plane, since this only runs on the main thread, we only have MAX_CONCURRENT_STREAMS.
I agree with the idea of making the connection pool support multiple connections. @oschaaf has also been thinking about this in the context of https://github.com/envoyproxy/envoy-perf/tree/master/nighthawk
> I agree with the idea of making the connection pool support multiple connections. @oschaaf has also been thinking about this in the context of https://github.com/envoyproxy/envoy-perf/tree/master/nighthawk
Yes, we should definitely do this. There are additional reasons, including mitigating head-of-line blocking in certain cases. I think we might have an issue open specifically on this, but I can't remember and a quick search doesn't return anything.
Got an initial patch based on my proposal from back in January. Working on running tests. Here's the initial diff.
@mattklein123 : if we're also looking to mitigate hol-blocking, we probably want to provision the connections ahead of time yeah?
edit: actually.. preemptively establishing connections doesn't seem like it would help much... looks like there are a couple of other good solutions though
@mjpitz yeah, pre-creation is another optimization, though IMO I would recommend tracking that under a new/different issue. I've discussed wanting to do this many times with @alyssawilk (who also has thoughts on allowing multiple h2 connections).
Prefetch is also something that @oschaaf and I have discussed in the context of Nighthawk.
So I'm not sure how I thought I was originally going to get the settings info from the h2 conn_pool. Getting deeper into the code, it seems like that's pretty well encapsulated by the client_codec (probably as it should be).
I'm curious what other ideas are floating around so I'll check in on this again tomorrow.
Fixed with recent connection pool changes.