It turns out that the gRPC async client won't open more than SETTINGS_MAX_CONCURRENT_STREAMS concurrent streams to the xDS gRPC server (good!), but EDS monitors are long-lived streams that wait forever, and it doesn't look like the gRPC async client opens more than a single HTTP/2 connection to the backend (bad!). This means that the total number of working EDS endpoints is limited by the xDS server's settings, and only the first SETTINGS_MAX_CONCURRENT_STREAMS EDS subscriptions will be able to establish HTTP/2 streams and receive responses.
The solution is for the gRPC async client to open another HTTP/2 connection once it reaches the xDS server's SETTINGS_MAX_CONCURRENT_STREAMS.
A temporary workaround is to increase SETTINGS_MAX_CONCURRENT_STREAMS on the xDS server, but that breaks once there are middle proxies involved.
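For reference, a minimal sketch of that workaround on a grpc-go based management server (the listener address and wiring below are placeholders, not taken from any particular xDS server):

```go
package main

import (
	"log"
	"math"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// Placeholder address; a real management server listens wherever it already does.
	lis, err := net.Listen("tcp", ":15010")
	if err != nil {
		log.Fatal(err)
	}
	// Advertise a very large SETTINGS_MAX_CONCURRENT_STREAMS so a single Envoy
	// connection can carry one long-lived EDS stream per cluster.
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(math.MaxUint32))
	// ... register the xDS discovery services here ...
	log.Fatal(srv.Serve(lis))
}
```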
See https://github.com/istio/istio/issues/4593 for background.
cc @htuch @mattklein123 @costinm @ldemailly @lizan
Unoptimized EDS is the root cause. With ADS, we have a single EDS stream globally, which is nice for the obvious reasons. With non-ADS, currently Envoy opens a new stream for each cluster, regardless of whether they point at the same management server and could be converged. I think it would be best to go in and fix EDS to reuse streams. It's not that hard, I'd SWAG this as 2-3 days effort including tests.
That would only address envoy->pilot but not envoy->unmodified grpc app
We can't assume all applications will change their configuration to deal with Envoy limitations, can we?
(unless you are saying that the issue is specific to the gRPC client used for *DS v2, but not for the gRPC/h2 filter?)
I believe the same problem exists for the HTTP async client as well, if the cluster is configured with H2. The gRPC async client simply depends on it.
@htuch optimizing EDS is good and solves the exact issue for Istio, but I feel fixing the gRPC async client (and the underlying HTTP async client) is more important. We cannot rely on gRPC stream optimization for all cases.
I don't really understand what EDS stream optimization means. Do you mean changing the API to put requests/responses over a single stream?
The proper fix here is actually at the HTTP/2 connection pool level. The pool should be able to create multiple connections per backend host if needed. There are other reasons we actually want this also (allowing for more connection/stream fan out in the case of middle proxies).
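To make the idea concrete, here is a rough sketch (written in Go for brevity; Envoy's real pool is C++ and lives in source/common/http/http2/conn_pool.cc): track active streams per upstream connection and dial an additional connection to the same host once every existing one has reached the peer-advertised SETTINGS_MAX_CONCURRENT_STREAMS.

```go
package pool

import "sync"

// conn is a stand-in for an established HTTP/2 connection to one upstream host.
type conn struct {
	maxStreams    uint32 // SETTINGS_MAX_CONCURRENT_STREAMS advertised by the peer
	activeStreams uint32
}

// Pool hands out stream capacity across multiple connections to a single host.
type Pool struct {
	mu    sync.Mutex
	conns []*conn
	dial  func() (*conn, error) // establishes a new connection to the host
}

// Acquire returns a connection with spare stream capacity, dialing a new
// connection when every existing one is saturated instead of queueing the
// stream behind the per-connection limit.
func (p *Pool) Acquire() (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range p.conns {
		if c.activeStreams < c.maxStreams {
			c.activeStreams++
			return c, nil
		}
	}
	c, err := p.dial()
	if err != nil {
		return nil, err
	}
	c.activeStreams = 1
	p.conns = append(p.conns, c)
	return c, nil
}
```

A real implementation also has to decrement counts when streams finish, drain and close idle connections, and respect circuit-breaker limits on total connections.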
@mattklein123 today we have 1 EDS stream per cluster. EDS can have multiple cluster subscriptions per stream, but this isn't done today outside of ADS.
There are no API changes, only Envoy implementation tuning.
@htuch OK I see. Can we open a different issue on that? I would like to keep this issue open to track h2 connection pool which we should also fix at some point.
@mattklein123 Sure, https://github.com/envoyproxy/envoy/issues/2943.
One possible option (or hack?) would be to override the server-sent 'max streams' until multiple connections are implemented. If the app happens to use an h2 stack that doesn't enforce max streams, it may work.
Or document how to set max streams for common languages and stacks (along with docs on how to set up h2 without mTLS, which is also required for upstream h2 apps behind Envoy).
The biggest problem was the lack of clear information about this limitation.
> One possible option (or hack?) would be to override the server-sent 'max streams' until multiple connections are implemented. If the app happens to use an h2 stack that doesn't enforce max streams, it may work.

That's going to break more stuff than it fixes.
Extra info:
So either the real fix or lots of docs and changes in upstream servers are required; there is no quick hack.
For folks looking for management server workarounds in the interim, here is an example of a Go management server change which is pretty trivial: https://github.com/heptio/contour/pull/308
I am not able to reproduce this issue.
I wonder if the problem with Istio Pilot was that the proxy is/was set up as TCP? @costinm
The only setting I see when using Envoy as a gRPC (h2) proxy is:
22:17:01 http2: Framer 0xc420630000: read SETTINGS len=6, settings: INITIAL_WINDOW_SIZE=268435456
while I go directly against my same go server with a max streams of 16:
22:21:38 http2: Framer 0xc4204e8000: read SETTINGS len=6, settings: MAX_CONCURRENT_STREAMS=16
and either way I can do > 100 simultaneous streams on 1 connection
If I remove maxstream on the backend side and really push it (doing 1000 simultaneous streams * 2 connections), I get about 15% errors:
22:33:59 W grpcrunner.go:96> Error making grpc call: rpc error: code = Internal desc = transport: received the unexpected content-type "text/plain"
...
Ended after 17.840324262s : 4000 calls. qps=224.21
Sleep times : count 2000 avg 4.5242081 +/- 1.416 min 1.570214834 max 6.106976232 sum 9048.41625
Aggregated Function Time : count 4000 avg 4.6826156 +/- 2.232 min 1.732992693 max 9.83520712 sum 18730.4624
# range, mid point, percentile, count
>= 1.73299 <= 2 , 1.8665 , 10.28, 411
> 2 <= 3 , 2.5 , 28.68, 736
> 3 <= 4 , 3.5 , 48.15, 779
> 4 <= 5 , 4.5 , 61.98, 553
> 5 <= 7.5 , 6.25 , 83.17, 848
> 7.5 <= 9.83521 , 8.6676 , 100.00, 673
# target 50% 4.13382
# target 75% 6.53597
# target 90% 8.44727
# target 99% 9.69641
# target 99.9% 9.82133
Ping SERVING : 3398
Ping -1 : 602
In the Envoy ingress logs I get (no errors on the Envoy sidecar):
[2018-04-03T05:29:44.084Z] "POST /fgrpc.PingServer/Ping HTTP/2" 503 UO 23 57 6096 - "10.138.0.12" "grpc-go/1.11.1" "cd0cf852-3577-9a57-a19c-d6435302ff2b" "fortio-stage.istio.io:80" "-"
What is UO?
@ldemailly I'm pretty sure that gRPC xDS client talks directly to selected cluster over HTTP/2, so listener settings (HTTP or TCP proxy) for Pilot wouldn't matter.
As for the difference in SETTINGS received when proxying via Envoy vs connecting directly, those settings are per hop, so that's expected. When you say that you can do >100 simultaneous streams over single connection, you mean single connection to Envoy, correct? Envoy still establishes multiple connections to the backend server (respecting 16 streams per connection)?
From documentation:
UO: Upstream overflow (circuit breaking) in addition to 503 response code.
I've been dealing with this kind of issue using Envoy (Istio) since last week, while trying to DDoS my application. https://github.com/istio/istio/issues/4658
Go client (Go 1.10 / gRPC) -TLS-> Envoy (Istio 0.7.0) --> server (Go 1.10 / gRPC)
Trying to DDoS my app, I start several Go clients, each of which simulates 100 real clients, so 100 TCP connections to Envoy each.
Each client then sends a unary request to one endpoint, then opens a stream and starts sending the payload.
Usually, one client = 1 TCP connection + 1 unary call + 1 stream.
I see Envoy opening 4 TCP connections to my server application and proxying every client connection into them.
Which means that, if my gRPC server is set up to allow 250 streams per client, I will only be able to handle 1000 clients (250 streams * 4 TCP connections).
While I'm not sure about the exact numbers, I can confirm the behaviour.
Setting a higher limit in the server using grpc.MaxConcurrentStreams(math.MaxInt32) mitigated the issue.
My conclusion is that this matches the behaviour described in this Go http2.Transport change:
> Currently if the http2.Transport hits SettingsMaxConcurrentStreams for a server, it just makes a new TCP connection and creates the stream on the new connection. This CL updates that behavior to instead block RoundTrip until a new stream is available.

Which means that if Envoy only opens 1 TCP connection to the server, and MaxConcurrentStreams is 250, Envoy will not be able to handle more than 250 clients.
I hope I'm wrong, but this is the behaviour I see.
While 100 or 250 is clearly too low, 100,000 sounds like a good value for both gRPC and TCP to use maximum resources without exhaustion.
I can't find anything documenting the actual behaviour Envoy should have here.
I'm not a dev and have a really hard time understanding Envoy's C++ code.
Could anyone point us to the right piece of code and maybe explain what is going on in this situation?
Also, if my server is set to a given MaxConcurrentStreams, what will be transmitted to the client? Is it the same value, or the one (which one?) from the circuit breaker config?
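As a side note on the Go http2.Transport change quoted above: as far as I can tell, the "dial a new connection" vs. "block RoundTrip" behaviour eventually landed behind a flag in golang.org/x/net/http2 (StrictMaxConcurrentStreams), which postdates this discussion; the snippet below is purely illustrative.

```go
package h2client

import (
	"net/http"

	"golang.org/x/net/http2"
)

func newClient() *http.Client {
	t := &http2.Transport{
		// false (the default): when the server's SETTINGS_MAX_CONCURRENT_STREAMS
		// is reached, dial another TCP connection and start the new stream there.
		// true: respect the limit globally and block RoundTrip until an existing
		// stream completes (the behaviour described in the quoted CL).
		StrictMaxConcurrentStreams: false,
	}
	return &http.Client{Transport: t}
}
```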
I also noted something special about the pool of TCP connections used by Envoy for gRPC streams.
The DestinationPolicy defined for the service is:

spec:
  circuitBreaker:
    simpleCb:
      httpConsecutiveErrors: 10
      httpDetectionInterval: 1s
      httpMaxEjectionPercent: 100
      httpMaxPendingRequests: 1000
      httpMaxRequests: 100000
      httpMaxRequestsPerConnection: 500
      maxConnections: 100000
      sleepWindow: 30s
  destination:
    name: useredged
curl http://localhost:15000/clusters shows:
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_connections::100000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_pending_requests::1000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_requests::100000
out.useredged.dev.svc.cluster.local|grpc::default_priority::max_retries::3
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_connections::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_pending_requests::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_requests::1024
out.useredged.dev.svc.cluster.local|grpc::high_priority::max_retries::3
Note that the httpMaxRequestsPerConnection parameter is not displayed here...
I'm using our gRPC client to load test. Each client opens 500 TCP connections to Envoy, then starts a bidirectional stream and sends one message per second.
I'm starting one client at a time, letting all the cx and rq settle.
In the above setup, everything is fine until I reach 4000 active requests; this can be reproduced every time.
If I increase httpMaxRequestsPerConnection in the DestinationPolicy to 1000, I can reach 8000 rq before getting 503s.
At the same time I can see hundreds of active cx on an HTTP/1.1 endpoint.
Why can't Envoy scale the number of TCP connections to the backend server?
I can now add some more info.
I still don't understand where the issue is.
My issue does not concern EDS or xDS but Envoy proxying from my client to my server.
In this situation, Envoy creates ONE TCP connection to the upstream server per CPU. In my tests with 2 CPUs --> 2 TCP connections.
Envoy then multiplexes the HTTP/2 gRPC calls onto those two TCP connections.
Here are the scenarios for N clients on servers with 1 CPU:
one Envoy:
N clients --> 1 TCP/1 stream each --> Envoy --> 1 TCP/N streams --> server
Everything works as expected; I can go up to 1024 streams as per Envoy's default. If I try more, I get some 503s, which is the expected behaviour.
two Envoys:
N clients --> 1 TCP/1 stream each --> Envoy --> 1 TCP/N streams --> Envoy sidecar --> 1 TCP/N streams --> server
In fact, what is happening is:
N clients --> 1 TCP/1 stream each --> Envoy --> 1 TCP/100 streams --> Envoy sidecar --> 1 TCP/100 streams --> server
Which means that if N > 100, I lose connections.
I can't find anything explaining why the limit is 100, but I can reproduce this situation every time.
If I have 2 CPUs, I see 2 TCP connections with 100 streams in each...
I'm still digging...
I really think this issue is not linked to https://github.com/istio/istio/issues/4940
see https://github.com/istio/istio/issues/4940#issuecomment-381248731
Maybe I should open a new issue here, as the problem really sounds like it lives inside Envoy?
@prune998 your issue is a different issue. I know what the issue is. Please delete all your comments from this issue as they are unrelated and open them in a new issue. Thanks.
@mattklein123 I created a new issue https://github.com/envoyproxy/envoy/issues/3076
I would love to hear from you if you know what the issue is, as it's blocking our deployments.
I don't think I'll delete my comments, but I'll note that they are effectively not related to the original issue.
Thanks.
@PiotrSikora can we consider this issue complete? There is a lot in here but I think your commit fixes the original issue here?
AFAIK, this issue (i.e. the number of outgoing streams being limited by SETTINGS_MAX_CONCURRENT_STREAMS) isn't fixed yet. I'll consider it fixed once Envoy starts opening new connections to the peer when it hits the SETTINGS_MAX_CONCURRENT_STREAMS limit.
The commit only covers the case where SETTINGS_MAX_CONCURRENT_STREAMS wasn't advertised by the peer, so it doesn't directly address this issue.
OK, moving to 1.9.0.
Doesn't seem like anyone is actively working on this. Mind if I look into implementing this?
@mjpitz I have invited you to envoyproxy org, if you accept I can assign you the issue.
@htuch : Joined
Alright. I spent a fair bit of time diving into the code base.
General Notes:
When primary is swapped to draining:
Implementation proposal: the changes would go in source/common/http/http2/conn_pool.cc.
Thoughts on this approach? Anything major I'm missing or things I should consider?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
Sorry I haven't gotten to this yet. I got sidetracked with a presentation. Hopefully I should be able to get to it in the next couple of weeks.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
keepalive
I've been talking myself in circles so it would be nice to talk some of this through some more before implementing.
Since being assigned this ticket, I've been reading up and understanding more about how the Envoy threading model works. This was well captured by the following blog post and tech talk:
A quick TL;DR: Envoy maintains a pool of workers (configured by --concurrency, which defaults to the number of cores on the machine). Each worker maintains its own connections (i.e. thread-local connection pools). To me, this means the maximum number of concurrent streams would actually be concurrency * MAX_CONCURRENT_STREAMS (assuming an even distribution of requests across connections). It would seem that if we need to increase the throughput of the system to get around MAX_CONCURRENT_STREAMS, we could simply raise the --concurrency parameter.
One concern that came to mind was how this impacts the purpose / intent of MAX_CONCURRENT_STREAMS. From what I can gather, its purpose isn't well documented. As an engineer, I think about this setting as a way for service owners to throttle the workloads their clients generate. By implementing an "auto-scaling" feature like this, we effectively bypass that setting. I've read a few articles on this, and people are generally advised to work with the service owner to better understand the workload they are trying to perform. Here's one issue under the http2-spec detailing a similar response.
Open Questions:
Is my assumption above about concurrency * MAX_CONCURRENT_STREAMS true?
Ultimately, I've found myself in a personal philosophical debate of "Sure, I could add this feature, but should I?" Getting some clarity on the questions above will help resolve my internal debate.
Does anyone have any insight into the intent / purpose / application of MAX_CONCURRENT_STREAMS?
AFAIK, it is the maximum multiplexing of logical gRPC streams over a single TCP socket/connection; it's not meant to mean that no additional streams should exist between the client and server, just a limit on throughput and congestion over a single socket (IMO).
I know gRPC defaults to one connection per backend returned via name resolver (at least in Java). The google-apis/gax project has a ChannelPool that allows you to increase the number of connections per backend. (just to add another reference to where we're already bypassing this constraint)
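For illustration, client-side channel pooling in grpc-go might look roughly like this (hypothetical sketch; grpc-go has no built-in ChannelPool, so this simply round-robins across several ClientConns so that no single connection's SETTINGS_MAX_CONCURRENT_STREAMS becomes the global ceiling):

```go
package channelpool

import (
	"sync/atomic"

	"google.golang.org/grpc"
)

// ChannelPool holds several independent connections to the same target.
type ChannelPool struct {
	conns []*grpc.ClientConn
	next  uint64
}

// New dials size separate connections; dial options (credentials, balancer,
// etc.) are left to the caller.
func New(target string, size int, opts ...grpc.DialOption) (*ChannelPool, error) {
	p := &ChannelPool{}
	for i := 0; i < size; i++ {
		cc, err := grpc.Dial(target, opts...)
		if err != nil {
			return nil, err
		}
		p.conns = append(p.conns, cc)
	}
	return p, nil
}

// Conn returns the next connection in round-robin order; stubs built from
// different returned conns end up on different TCP connections.
func (p *ChannelPool) Conn() *grpc.ClientConn {
	n := atomic.AddUint64(&p.next, 1)
	return p.conns[n%uint64(len(p.conns))]
}
```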
@mjpitz it's probably a good idea to distinguish between data and control plane here. For data plane and backends, we will effectively have an upper bound of concurrency * MAX_CONCURRENT_STREAMS. For the control plane, since this only runs on the main thread, we only have MAX_CONCURRENT_STREAMS.
I agree with the idea of making the connection pool support multiple connections. @oschaaf has also been thinking about this in the context of https://github.com/envoyproxy/envoy-perf/tree/master/nighthawk
> I agree with the idea of making the connection pool support multiple connections. @oschaaf has also been thinking about this in the context of https://github.com/envoyproxy/envoy-perf/tree/master/nighthawk
Yes, we should definitely do this. There are additional reasons, including mitigating head-of-line blocking in certain cases. I think we might have an issue open specifically on this, but I can't remember and a quick search doesn't return anything.
Got an initial patch based on my proposal from back in January. Working on running tests. Here's the initial diff.
@mattklein123 : if we're also looking to mitigate hol-blocking, we probably want to provision the connections ahead of time yeah?
edit: actually.. preemptively establishing connections doesn't seem like it would help much... looks like there are a couple of other good solutions though
@mjpitz yeah, pre-creation is another optimization, though IMO I would recommend tracking that under a new/different issue. I've discussed wanting to do this many times with @alyssawilk (who also has thoughts on allowing multiple h2 connections).
Prefetch is also something that @oschaaf and I have discussed in the context of Nighthawk.
So I'm not sure how I thought I was originally going to get the settings info from the h2 conn_pool. Getting deeper into the code, it seems like that's pretty well encapsulated by the client_codec (probably as it should be).
I'm curious what other ideas are floating around so I'll check in on this again tomorrow.
Fixed with recent connection pool changes.