Envoy: Large size of http2_protocol_options initial_stream_window_size causes send/read timeout on ALB/nginx upstream on large responses and slow clients

Created on 2 Sep 2019 · 9 comments · Source: envoyproxy/envoy

We use Envoy as an edge proxy in front of an AWS ALB (which has a default read_timeout of 60 seconds), and we were getting reports of client downloads in the browser failing around the 150 MB mark.

Unfortunately, the Envoy logs just showed:
response_duration: -
response_ttfb: -
flags: -

and a 200 OK

We were able to recreate the problem in browsers by limiting our connection speed to ~1 MB/s.

We noted that the problem only occurred with HTTP/2.

Looking at Firefox about:networking traces, we observed an HTTP/2 RST_STREAM with error code 0 being sent:

2019-08-29 22:09:46.217000 UTC - [Parent 12304: Socket Thread]: I/nsHttp Http2Session::RecvRstStream 0000019A3D1FB000 RST_STREAM Reason Code 0 ID 11

We enabled trace logging on Envoy and saw the stream being paused and resumed like so:

{"time": "2019-08-30 09:30:48.432", "level": "debug", "type": "http2", "message": "[source/common/http/http2/codec_impl.cc:200] [C213] send buffer under limit "}
{"time": "2019-08-30 09:30:48.432", "level": "debug", "type": "http", "message": "[source/common/http/conn_manager_impl.cc:1607] [C213][] Enabling upstream stream due to downstream stream watermark."}

We did a packet capture on the Envoy host and found that around ~250 MB was read from the ALB, followed by a 60+ second pause during which Envoy set its TCP receive window to 0 while the client kept reading slowly:

09:29:45.795071 IP alb.443 > envoy.61682: Flags [.], seq 331920955:331922379, ack 1670, win 118, options [nop,nop,TS val 58414411 ecr 2277140227], length 1424
09:29:45.795088 IP alb.443 > envoy.61682: Flags [.], seq 331922379:331924715, ack 1670, win 118, options [nop,nop,TS val 58414411 ecr 2277140227], length 2336
09:29:45.834615 IP envoy.61682 > alb.443: Flags [.], ack 331924715, win 2, options [nop,nop,TS val 2277140419 ecr 58414410], length 0
09:29:46.356500 IP alb.443 > envoy.61682: Flags [P.], seq 331924715:331925739, ack 1670, win 118, options [nop,nop,TS val 58414552 ecr 2277140419], length 1024
09:29:46.356523 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277140941 ecr 58414552], length 0
09:29:46.900379 IP alb.443 > envoy.61682: Flags [.], ack 1670, win 118, options [nop,nop,TS val 58414688 ecr 2277140941], length 0
09:29:46.900397 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277141485 ecr 58414552], length 0
09:29:47.796392 IP alb.443 > envoy.61682: Flags [.], ack 1670, win 118, options [nop,nop,TS val 58414912 ecr 2277141485], length 0
09:29:47.796413 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277142381 ecr 58414552], length 0
09:29:49.428395 IP alb.443 > envoy.61682: Flags [.], ack 1670, win 118, options [nop,nop,TS val 58415320 ecr 2277142381], length 0
09:29:49.428410 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277144013 ecr 58414552], length 0
09:29:52.660486 IP alb.443 > envoy.61682: Flags [.], ack 1670, win 118, options [nop,nop,TS val 58416128 ecr 2277144013], length 0
09:29:52.660501 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277147245 ecr 58414552], length 0
09:29:58.804579 IP alb.443 > envoy.61682: Flags [.], ack 1670, win 118, options [nop,nop,TS val 58417664 ecr 2277147245], length 0
09:29:58.804596 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277153389 ecr 58414552], length 0
09:30:10.836578 IP alb.443 > envoy.61682: Flags [.], ack 1670, win 118, options [nop,nop,TS val 58420672 ecr 2277153389], length 0
09:30:10.836601 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277165421 ecr 58414552], length 0
09:30:35.668812 IP alb.443 > envoy.61682: Flags [.], ack 1670, win 118, options [nop,nop,TS val 58426880 ecr 2277165421], length 0
09:30:35.668827 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 0, options [nop,nop,TS val 2277190253 ecr 58414552], length 0
09:30:48.436329 IP envoy.61682 > alb.443: Flags [.], ack 331925739, win 1265, options [nop,nop,TS val 2277203020 ecr 58414552], length 0
09:30:48.586141 IP alb.443 > envoy.61682: Flags [.], seq 331925739:331927163, ack 1670, win 118, options [nop,nop,TS val 58430109 ecr 2277203020], length 1424
09:30:48.586164 IP envoy.61682 > alb.443: Flags [.], ack 331927163, win 1270, options [nop,nop,TS val 2277203170 ecr 58430109], length 0

As soon as Envoy started reading again (once the client had drained the buffer down enough), a small response (we assume a partial HTTP payload) would arrive from nginx/the ALB, and then either Envoy or the ALB would FIN the connection.

We observed the following messages:

{"time": "2019-08-30 08:23:09.557", "level": "debug", "type": "client", "message": "[source/common/http/codec_client.cc:118] [C157] protocol error: http/1.1 protocol error: HPE_INVALID_EOF_STATE"}
{"time": "2019-08-30 08:23:09.557", "level": "debug", "type": "client", "message": "[source/common/http/codec_client.cc:82] [C157] disconnect. resetting 1 pending requests"}

Anyway, this is all a long-winded way of saying: the large default initial_stream_window_size for HTTP/2 (https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/core/protocol.proto) seems to cause a rather hard-to-diagnose issue when combined with a slow client and an AWS ALB's default read_timeout.

We suspect this might also cause issues with other defaults such as nginx's send_timeout (http://nginx.org/en/docs/http/ngx_http_core_module.html#send_timeout), which is 60 seconds as well.

To cover the largest number of clients, we would suggest lowering the default to 16 MB, which is where we have ended up. :) Failing that, we could help introduce some flags in the access logs to show that, from Envoy's perspective, the connection was finished due to a partial/corrupt HTTP response.

Labels: question, stale

All 9 comments

initial_stream_window_size is documented here https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/protocol.proto#envoy-api-field-core-http2protocoloptions-initial-connection-window-size and the relevant default size is here https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/core/protocol.proto

  // `Initial stream-level flow-control window
  // <https://httpwg.org/specs/rfc7540.html#rfc.section.6.9.2>`_ size. Valid values range from 65535
  // (2^16 - 1, HTTP/2 default) to 2147483647 (2^31 - 1, HTTP/2 maximum) and defaults to 268435456
  // (256 * 1024 * 1024).

We resolved this issue in two ways, which can be used independently or together:

  1. We raised the ALB timeout (or the nginx equivalent, if you are fronting some other proxy type) from the default 60 seconds to 300 seconds or more. With the longer timeout, clients are more likely to read out enough data for Envoy's buffer to drop below the low watermark, which triggers further read events from the upstream and thus resets the cluster/upstream's read timeout (see the CloudFormation sketch below for the ALB change).

  2. We lowered initial_stream_window_size to something more sane, like:

            http2_protocol_options:
              initial_stream_window_size: 16777216
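
For anyone else landing here, this is roughly where that option sits in a full config. It is only a minimal sketch: the listener/route/cluster names, port and upstream address are placeholders, and TLS is left out for brevity.

    static_resources:
      listeners:
      - name: edge_listener
        address:
          socket_address: { address: 0.0.0.0, port_value: 80 }
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            config:
              stat_prefix: edge
              # 16 MB per downstream HTTP/2 stream instead of the 256 MB default,
              # so a slow client can drain the buffer below the low watermark
              # (re-enabling upstream reads) well within a 60 s upstream timeout.
              http2_protocol_options:
                initial_stream_window_size: 16777216
              route_config:
                virtual_hosts:
                - name: all
                  domains: ["*"]
                  routes:
                  - match: { prefix: "/" }
                    route: { cluster: alb_upstream }
              http_filters:
              - name: envoy.router
      clusters:
      - name: alb_upstream
        connect_timeout: 5s
        type: LOGICAL_DNS
        lb_policy: ROUND_ROBIN
        hosts:
        - socket_address: { address: alb.example.internal, port_value: 80 }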

The added benefit is that we see more consistent and lower memory usage, and we have not observed any performance degradation, even for high-speed clients.
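
On the ALB side (option 1), if the load balancer happens to be managed with CloudFormation, the timeout bump is a single attribute. This is just a sketch under that assumption; the logical name and subnet IDs are placeholders, and the same idle_timeout.timeout_seconds attribute can be set from the console or CLI instead.

    Resources:
      EdgeALB:
        Type: AWS::ElasticLoadBalancingV2::LoadBalancer
        Properties:
          Subnets:
            - subnet-aaaa1111   # placeholder subnet IDs
            - subnet-bbbb2222
          LoadBalancerAttributes:
            # The ALB default is 60 seconds; raise it so slow clients have time
            # to drain Envoy's buffer before the ALB gives up on the response.
            - Key: idle_timeout.timeout_seconds
              Value: "300"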

See https://github.com/envoyproxy/envoy/pull/8001 where we are working on documenting more appropriate edge configurations.

@bmgoau should we close this out in favor of #8001?

@htuch cc @mattklein123 If this is closed in favor of #8001 I think we'd be cool with that, but before you do, please consider the following and let me know your thoughts:

  1. It is likely that Envoy will be used more and more in conjunction with nginx, ALBs (I mean, we all know it already is 😄) and other servers and proxies with default send_timeouts of 60 s, and it will continue to gain popularity in a huge variety of use cases.
  2. Large responses >250 MB are becoming more common in all environments.
  3. 1-2 MB/s (8-16 Mbit/s) is not an uncommon speed among broadband clients, and especially mobile clients (even in North America, let alone globally). Those speeds are also very common in industrial settings and on older management bands to remote equipment. They're slow, I'll admit, but not _THAT_ slow.
  4. Just as in #6727, constraining initial_stream_window_size has also decreased the number of sudden spikes in memory usage among our Envoy instances, both internal and edge proxies.

Taken together, we feel there is a case to be made for lowering the default initial_stream_window_size from 256 MB down to our current 16 MB (or possibly even 1 MB) out of the box.

Since this issue was so easy to recreate with a default nginx or ALB config, a basic Envoy configuration, and a slow (but not _stupidly_ slow) client download, we feel this is a bit of a bug of _unfriendly defaults_ rather than just a documentation bug.

We understand that you, and the community, cannot make Envoy suit 100% of use cases out of the box, and we wouldn't expect that at all. But we feel the use case of Envoy + a >256 MB download + ALB/nginx (or really any server with a 60 s send_timeout) + 1-2 MB/s client bandwidth is sufficiently common (or will become common) and not so esoteric that it makes sense to lower the default initial_stream_window_size to, say, 1 MB or 16 MB. To save people from themselves, if you will... 😛

I'm not 100% sure where the low watermark is, but if it's around 50% then the math works out that, with initial_stream_window_size=16MB and a default nginx/haproxy/ALB config, a client would need to be slower than about 133 KB/s (roughly 1 Mbit/s) to trigger the issue we observed, which is well below the median speed we see among our internet clients. (For reference, our p50 measured speed for internet clients is 14 Mbit/s, dipping to ~5 Mbit/s in some parts of Africa, Asia and Oceania, even with local, well-connected datacenters.)
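
Spelling that estimate out, under the assumption that the low watermark really is 50% of the window and the upstream send timeout is 60 seconds:

    \[
    \frac{0.5 \times 16\,\text{MB}}{60\,\text{s}}
      = \frac{8{,}000{,}000\ \text{B}}{60\,\text{s}}
      \approx 133\ \text{KB/s}
      \approx 1.07\ \text{Mbit/s}
    \]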

We're coming from a position of thousands of Envoy instances in many different scenarios (public edge, s2s, front proxy, service proxy, etc.), 200M requests per day, and 4 TB transferred per day.

Anyway, please consider the above in a spirit of the utmost respect for your work, the project, and shared positive intent.

Expect to see more code and doc contributions from us going forward, and not just issues, haha.

Edit: I believe most haproxy client and server send timeouts are 60s by default too.

Sorry, one last comment:
In all the situations where we saw HPE_INVALID_EOF_STATE, we also saw a %RESPONSE_FLAGS% of "-".

In situations where the response is 200 OK but the server/proxy sends a partial response because its send_timeout expired, and Envoy therefore detects a protocol error (see the debug logs in my initial comment) and sends a RST_STREAM, it might be useful to have a flag show this:

200 OK
RESPONSE_FLAGS =

PR - partial response?
HI - HTTP response payload invalid?
H2R - we sent an HTTP/2 RST_STREAM?
SL - server response had an invalid length?

I'm throwing out ideas here as you can tell.

Basically something to make these kinds of failures visible above the level of debug/trace.

(There are already metrics for flow control and HTTP RST_STREAMs; we're talking about logs here.)
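
To make it concrete, something like the snippet below (placed under the HTTP connection manager config) is what we have in mind; the path and format string are only examples. Today the %RESPONSE_FLAGS% column simply prints "-" for this failure mode, and the ask is for a new flag to show up there:

    access_log:
    - name: envoy.file_access_log
      config:
        path: /var/log/envoy/access.log
        # %RESPONSE_FLAGS% is where a PR/HI/H2R/SL-style flag could surface.
        format: "[%START_TIME%] %REQ(:METHOD)% %REQ(:PATH)% %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_SENT% %DURATION%\n"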

Also, I want to note it was our fault for relying on the defaults here instead of carefully adjusting them to our use case. We recognize that fully, but still humbly submit the above suggestions.

@bmgoau thanks for the detailed comments. We are definitely considering changing the defaults in future releases, and potentially having "profiles", but this is a really tricky topic as it's impossible to make everyone happy.

re: the additional response flags, if you could open a specific feature request issue for that, it would be great. Thank you!

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
