Description:
Hello! 👋
We recently realised our outbound traffic had increased a fair amount since we switched from serving some assets directly from Akamai to serving them from Ambassador, and tracked it down to our gzipped assets being much larger.
I then ran some tests, and it appears that gzipping content in Envoy produces fairly poor compression ratios, even when tweaking the settings to produce the smallest possible output.
For example, when gzipping HTML and JS from YouTube (picked as a random "big website" example), Envoy-compressed HTML is 12% larger than nginx's output and the JS is 13% larger.
This isn't nginx-specific either: I've tested gzip locally, Apache httpd, and nginx, and all of their results are within a few percent of each other, while Envoy is consistently over 10% larger.
Relevant Links:
Here is my testing methodology: https://github.com/Pluies/gzip-comparison
And the relevant Envoy configuration files are:
Is this a known issue? Is there a problem with my configuration? I'm by no means an Envoy expert, so please let me know if there are any obvious issues with the config above or anything else I've missed that might explain this discrepancy.
Thanks a lot!
@dio @lizan @gsagula
@Pluies Do you know what the delay is when compressing the JS with nginx? In order to reduce latency we decided to compress and flush on each data frame. The downside is that it degrades the compression ratio. The second option is to let gzip compress until the buffer gets full. I think we could make it configurable.
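To illustrate the tradeoff with plain zlib -- this is just a standalone sketch, not Envoy or nginx code, and the payload and 4KB chunk size are made up -- compressing the same data with a Z_SYNC_FLUSH per chunk versus a single flush at the end shows how per-frame flushing hurts the ratio:

#include <zlib.h>
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Compress `input` in `chunk`-sized pieces. `flush_each_chunk` selects Z_SYNC_FLUSH
// after every chunk (roughly what flushing per data frame does) vs. Z_NO_FLUSH until
// the final Z_FINISH (letting zlib buffer as much as it wants).
static size_t compressed_size(const std::string& input, size_t chunk, bool flush_each_chunk) {
  z_stream zs{};
  deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16 /* gzip wrapper */, 8,
               Z_DEFAULT_STRATEGY);
  std::vector<unsigned char> out(input.size() + input.size() / 2 + 1024); // generous output room
  zs.next_out = out.data();
  zs.avail_out = static_cast<uInt>(out.size());
  for (size_t off = 0; off < input.size(); off += chunk) {
    const size_t len = std::min(chunk, input.size() - off);
    zs.next_in = reinterpret_cast<Bytef*>(const_cast<char*>(input.data() + off));
    zs.avail_in = static_cast<uInt>(len);
    const bool last = off + len == input.size();
    deflate(&zs, last ? Z_FINISH : (flush_each_chunk ? Z_SYNC_FLUSH : Z_NO_FLUSH));
  }
  const size_t total = zs.total_out;
  deflateEnd(&zs);
  return total;
}

int main() {
  std::string input; // repetitive, hypothetical stand-in for HTML/JS/JSON
  for (int i = 0; i < 20000; i++) {
    input += "{\"key\": \"value\", \"index\": " + std::to_string(i % 100) + "},";
  }
  printf("flush per 4KB chunk: %zu bytes\n", compressed_size(input, 4096, true));
  printf("flush only at end:   %zu bytes\n", compressed_size(input, 4096, false));
  return 0;
}

On repetitive input like this, the per-chunk-flush output comes out noticeably larger, which is the same effect as the size differences reported in this issue.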
Do you know what the delay is when compressing the JS with nginx?
That's a very good question; I guess we could try and measure it with the compose-based setup. I'll give it a go and report back!
As far as I can tell, compressing in small chunks this way is something only Envoy does – httpd, nginx, and Akamai (the ones I've tested so far :) ) all seem to wait until the file is fully available before compressing, if I understand correctly – resulting in much better compression. Making it optional would be a great way of letting users choose between latency and compression performance, as the current results are a bit baffling.
By the way, by data frame, do you refer exclusively to HTTP/2 streams? (Sorry if it's a silly question – I probably need to brush up on my HTTP knowledge.)
OK, I've been running tests on the Polymer JS file duplicated 10 times, bringing it to 30MB for an extreme test case, with the following curl command: curl -so /dev/null http://localhost:$PORT/$FILE --compressed -w '%{size_download}, Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total}\n' (as per this link).
This hopefully should test TTFB, which (once again, if I understand correctly) should be much lower in Envoy than in other servers.
I'm getting a fair amount of variance run after run, so please don't read too much into this test, but here are some numbers:
Comparing compression performance for file: assets/desktop_polymer_v2_x10.js (all sizes in bytes)
origin, uncompressed 31668307, Connect: 0.005557 TTFB: 0.020001 Total time: 0.563083
nginx, base 9584605, Connect: 0.005483 TTFB: 0.017597 Total time: 0.620351
nginx, optimised 7821304, Connect: 0.004427 TTFB: 0.021322 Total time: 1.782090
ambassador, base 10881198, Connect: 0.004422 TTFB: 0.022827 Total time: 1.050829
ambassador, optimised 10864866, Connect: 0.004758 TTFB: 0.025497 Total time: 1.316463
envoy, base 10881083, Connect: 0.005510 TTFB: 0.021910 Total time: 0.899159
envoy, optimised 8831289, Connect: 0.004534 TTFB: 0.027110 Total time: 1.713541
apache, base 7853115, Connect: 0.005130 TTFB: 0.019696 Total time: 1.157169
apache, optimised 7820515, Connect: 0.004526 TTFB: 0.023279 Total time: 2.212370
local gzip, base 7854014
local gzip, optimised 7821330
As expected, we're seeing a fair uptick in total time between the "base" and "optimised" versions of the same server (as we're trading overall speed for data size), but I'm seeing basically no difference in TTFB between servers... So, other HTTP servers are also compressing while streaming, but somehow in a more efficient fashion? Or is curl not telling us the full story? (e.g. headers being sent first while the body is being compressed, or other shenanigans)
It looks like Envoy and nginx are very close in both the base and optimised configurations:
nginx, base 9584605, Connect: 0.005483 TTFB: 0.017597 Total time: 0.620351
envoy, base 10881083, Connect: 0.005510 TTFB: 0.021910 Total time: 0.899159
nginx, optimised 7821304, Connect: 0.004427 TTFB: 0.021322 Total time: 1.782090
envoy, optimised 8831289, Connect: 0.004534 TTFB: 0.027110 Total time: 1.713541
It could also be the underlying compression library that we are using. I'm looking at some other, more modern libraries that may improve the performance.
@gsagula thanks for looking into it! :)
As you point out, the difference between optimised {nginx,apache,gzip} and optimised Envoy is not _huge_... But still ~10-15% extra, which wipes out gains from a bunch of other application-level improvements we've been working on at $CURRENT_COMPANY, for example.
Regarding the compression library, as far as I can tell, both Envoy and nginx use the venerable zlib? Here's the ngx_http_gzip_filter_module.c source, which refers to the exact same parameters (gzip_comp_level, gzip_window), the same function calls (deflateInit2), imports zlib.h, etc.
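For reference, the knobs in both projects end up funnelling into that same zlib entry point. A rough sketch of the shared call -- not either project's actual code; the wrapper function and parameter names here are illustrative:

#include <zlib.h>
#include <cstdio>

// Hypothetical wrapper around the shared zlib entry point. nginx's gzip_comp_level /
// gzip_window and Envoy's compression level / window bits / memory level all end up as
// deflateInit2() arguments (the exact windowBits encoding differs between the two; adding
// 16, as done here, asks zlib itself to emit the gzip header/trailer).
static int init_gzip_deflater(z_stream* zs, int level, int window_bits, int mem_level) {
  return deflateInit2(zs, level, Z_DEFLATED, window_bits + 16, mem_level, Z_DEFAULT_STRATEGY);
}

int main() {
  z_stream zs{};
  // Level 9, 15-bit window, memory level 9: the "smallest possible output" end of the dial.
  if (init_gzip_deflater(&zs, 9, 15, 9) != Z_OK) {
    printf("deflateInit2 failed\n");
    return 1;
  }
  deflateEnd(&zs);
  return 0;
}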
On a related note, we are also hit by this (getting worse latency via Envoy than nginx when compressing). I wrote #10464 as a starting point for measuring things.
I wrote #10464 as a starting point for measuring things.
Perf annotations and flamegraphs could also help with debugging.
Ok, deleted my last comment because it was inaccurate -- the problem was with the mocks. Getting rid of those gets me back ~6ms. I'll keep removing the mocks/boilerplate to get more accurate measurements.
Adding stats to track flushes: #10500.
Ok, so I started running with the stats added in #10500, and the ratio of flush vs finish vs upstream_rq seems potentially concerning:

I am wondering if the change introduced in #3025 -- which was a fix for #2909 -- was actually meant to use Z_NO_FLUSH instead of Z_SYNC_FLUSH in https://github.com/envoyproxy/envoy/blob/master/source/common/compressor/zlib_compressor_impl.cc#L40:
void ZlibCompressorImpl::compress(Buffer::Instance& buffer, State state) {
  for (const Buffer::RawSlice& input_slice : buffer.getRawSlices()) {
    zstream_ptr_->avail_in = input_slice.len_;
    zstream_ptr_->next_in = static_cast<Bytef*>(input_slice.mem_);
    // Z_NO_FLUSH tells the compressor to take the data in and compresses it as much as possible
    // without flushing it out. However, if the data output is greater or equal to the allocated
    // chunk size, process() outputs it to the end of the buffer. This is fine, since at the next
    // step, the buffer is drained from the beginning of the buffer by the size of input.
    process(buffer, Z_NO_FLUSH);
    buffer.drain(input_slice.len_);
  }
  // Was this meant to be Z_NO_FLUSH? Why should we always flush?
  process(buffer, state == State::Finish ? Z_FINISH : Z_SYNC_FLUSH);
}
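For context on what that final Z_SYNC_FLUSH costs: every sync flush makes deflate byte-align its output and emit an empty stored block (the 00 00 ff ff marker), and it also stops the compressor from matching across the flush boundary. A tiny standalone sketch (not Envoy code) that makes the marker visible:

#include <zlib.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
  z_stream zs{};
  deflateInit(&zs, Z_BEST_SPEED);
  const char* msg = "hello hello hello hello hello hello";
  std::vector<unsigned char> out(256);
  zs.next_in = reinterpret_cast<Bytef*>(const_cast<char*>(msg));
  zs.avail_in = static_cast<uInt>(std::strlen(msg));
  zs.next_out = out.data();
  zs.avail_out = static_cast<uInt>(out.size());
  deflate(&zs, Z_SYNC_FLUSH); // flush everything produced so far, byte-aligned
  const size_t n = zs.total_out;
  // The stream now ends with the empty stored block marker: 00 00 ff ff.
  printf("last four bytes after Z_SYNC_FLUSH: %02x %02x %02x %02x\n",
         out[n - 4], out[n - 3], out[n - 2], out[n - 1]);
  deflateEnd(&zs);
  return 0;
}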
Sounds like maybe we went a step too far in fixing the missing Z_FINISH call...
cc: @dio @gsagula
@rgs1 thanks for this. I'd love to check on the potential regression (manual check if required) when you have something to test.
Also related to perf (but still doesn't address the excessive flush issue): https://github.com/envoyproxy/envoy/pull/10508.
Ok, first attempt at reducing the flush calls: #10518.
So #10518 doesn't appear to move the needle in my initial tests...
FYI #10530 for better instrumentation.
So I have two setups with the following results to compare:
Setup 1: nginx w/ gzip --> ~150k JSON blob on disk
Results:
wrk -t1 -c1 -d1m -R1 -H 'Accept-Encoding: gzip' http://127.0.0.1/static/blob-156.json
Running 1m test @ http://127.0.0.1/static/blob-156.json
1 threads and 1 connections
Thread calibration: mean lat.: 5.709ms, rate sampling interval: 12ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.14ms 464.35us 6.26ms 60.00%
Req/Sec 1.05 9.50 90.00 98.78%
61 requests in 1.00m, 4.98MB read
Requests/sec: 1.02
Transfer/sec: 85.05KB
Setup 2: envoy w/ gzip --> nginx no gzip --> ~150k JSON blob on disk
Results:
wrk -t1 -c1 -d1m -R1 -H 'Accept-Encoding: gzip' http://127.0.0.1:81/plain/blob-156.json
Running 1m test @ http://127.0.0.1:81/plain/blob-156.json
1 threads and 1 connections
Thread calibration: mean lat.: 11.784ms, rate sampling interval: 24ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.20ms 359.84us 12.33ms 64.00%
Req/Sec 1.01 6.39 43.00 97.58%
60 requests in 1.00m, 4.94MB read
Requests/sec: 1.00
Transfer/sec: 110.87KB
Note: getting the non-compressed JSON blob from nginx directly has a ~1ms avg latency.
With #10676 applied, transfer/sec (response size) does go down from 110KB to 84KB -- so that's definitely an improvement.
However, Envoy still takes 2x what nginx takes to serve a 150kb compressed JSON blob.
cc: @mattklein123 @rojkov
** to generate the JSON blob I used https://gist.github.com/rgs1/d9b62f827d941d20c1ed507078d1334a:
$ gen-json-blob.py 5000 > blob.json
$ ls -lah blob.json
-rw-r--r-- 1 rgs rgs 156K Apr 7 10:51 blob.json
Does NGINX cache the compressed content before serving?
Does NGINX cache the compressed content before serving?
It does not -- no header saying so, _and_ with perf top I can see zlib running again and again...
Can you try putting an NGINX in front of an NGINX for a more similar experiment? My suspicion is that because NGINX is acting as an origin server it is compressing and writing larger chunks of data.
Also, with the compression fix, is the final compressed size similar now?
Can you try putting an NGINX in front of an NGINX for a more similar experiment? My suspicion is that because NGINX is acting as an origin server it is compressing and writing larger chunks of data.
Sure.
Also, with the compression fix, is the final compressed size similar now?
Yep! Progress.
Setup 3: nginx w/ gzip --> nginx no gzip --> ~150k JSON blob on disk
Results:
$ wrk -t1 -c1 -d1m -R1 -H 'Accept-Encoding: gzip' http://127.0.0.1:81/plain/blob-156.json
Running 1m test @ http://127.0.0.1:81/plain/blob-156.json
1 threads and 1 connections
Thread calibration: mean lat.: 5.723ms, rate sampling interval: 12ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.93ms 443.72us 6.06ms 64.00%
Req/Sec 1.03 9.30 90.00 98.78%
61 requests in 1.00m, 4.98MB read
Requests/sec: 1.02
Transfer/sec: 85.06KB
@mattklein123: daisy-chaining two nginxes yields the same result as hitting the origin nginx directly -- no extra penalty (even slightly faster...!).
Hmm, OK. It's hard for me to understand how Envoy could be doing this 2x slower. Do you know if NGINX does any buffering during compression? Given the repro you have perhaps @rojkov might have time to look into this.
I wonder if #10518 would help now that we are using the right strategy...
@rgs1 yes possible if you want to try that.
Hmm, OK. It's hard for me to understand how Envoy could be doing this 2x slower. Do you know if NGINX does any buffering during compression? Given the repro you have perhaps @rojkov might have time to look into this.
Yeah I was gonna dive into nginx's compression process next to see how much it buffers. Also, as mentioned above, #10518 might help now.
Alas, #10518 makes no difference:
$ wrk -t1 -c1 -d1m -R1 -H 'Accept-Encoding: gzip' http://127.0.0.1:81/plain/blob-156.json
Running 1m test @ http://127.0.0.1:81/plain/blob-156.json
1 threads and 1 connections
Thread calibration: mean lat.: 11.601ms, rate sampling interval: 24ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.24ms 561.29us 12.18ms 52.00%
Req/Sec 1.01 6.45 43.00 97.58%
60 requests in 1.00m, 4.94MB read
Requests/sec: 1.00
Transfer/sec: 84.21KB
@rgs1 OK nothing off the top of my head. Can you link to the NGINX gzip impl?
@mattklein123 here: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_gzip_filter_module.c
@mattklein123 one thing that I just checked -- and it matches the benchmark that I am adding in #10464 -- is that encodeData() in the gzip/compressor filter is called just twice:
1) with a Buffer::Instance of 160kb (the total size of the JSON blob, uncompressed)
2) with a Buffer of 0 bytes (end of stream)
And per my benchmark, calling the compressor with a 100kb buffer at once is > 12ms... so that matches the expectation.
My question is: shouldn't encodeData() be called ~10 times (160/16) -- each time with a buffer of ~16kb, per https://github.com/envoyproxy/envoy/blob/master/source/common/network/raw_buffer_socket.cc#L21?
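(For anyone who wants to reproduce the raw deflate cost outside Envoy, here's a rough standalone timing harness along the lines of what the #10464 benchmark measures -- it is not the actual benchmark, the synthetic payload is made up, and absolute numbers will differ by machine.)

#include <zlib.h>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Time deflate over `input`, fed in `chunk`-sized pieces at best compression.
static double compress_ms(const std::string& input, size_t chunk) {
  z_stream zs{};
  deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED, 15 + 16 /* gzip wrapper */, 8,
               Z_DEFAULT_STRATEGY);
  std::vector<unsigned char> out(input.size() * 2 + 1024);
  zs.next_out = out.data();
  zs.avail_out = static_cast<uInt>(out.size());
  const auto start = std::chrono::steady_clock::now();
  for (size_t off = 0; off < input.size(); off += chunk) {
    const size_t len = std::min(chunk, input.size() - off);
    zs.next_in = reinterpret_cast<Bytef*>(const_cast<char*>(input.data() + off));
    zs.avail_in = static_cast<uInt>(len);
    deflate(&zs, off + len == input.size() ? Z_FINISH : Z_NO_FLUSH);
  }
  const auto end = std::chrono::steady_clock::now();
  deflateEnd(&zs);
  return std::chrono::duration<double, std::milli>(end - start).count();
}

int main() {
  // Synthetic ~160KB payload standing in for the JSON blob.
  std::string blob;
  while (blob.size() < 160000) {
    blob += "{\"id\": " + std::to_string(blob.size() % 997) + ", \"name\": \"some value\"},";
  }
  printf("one 160KB call : %.2f ms\n", compress_ms(blob, blob.size()));
  printf("16KB chunks    : %.2f ms\n", compress_ms(blob, 16384));
  return 0;
}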
I assume upstream is HTTP/1, right? See https://github.com/envoyproxy/envoy/pull/10406. cc @antoniovicente.
I'm not a compression expert, but if this type of bulk compression is causing an issue we would need to figure out how to do some kind of max chunk. With that said, it seems like all the data has to get compressed one way or the other, so are we talking about measuring TTFB or TTLB?
I assume upstream is HTTP/1, right? See #10406. cc @antoniovicente.
Yes -- it's an HTTP/1 upstream. I also tried reverting @antoniovicente's change to buffer the response body, no change.
I'm not a compression expert, but if this type of bulk compression is causing an issue we would need to figure out how to do some kind of max chunk. With that said, it seems like all the data has to get compressed one way or the other, so are we talking about measuring TTFB or TTLB?
TTLB.
From an offline convo I think we are not hitting the "should drain buffer" limit in the raw buffer socket. I mentioned to @rgs1 to hard-code this to 16 or 32K so we can see if that is the issue, and if so we can discuss what to do about it.
Ok, so I am trying out this diff:
diff --git a/source/common/compressor/zlib_compressor_impl.h b/source/common/compressor/zlib_compressor_impl.h
index 7db1b9b73..b98bc9f3d 100644
--- a/source/common/compressor/zlib_compressor_impl.h
+++ b/source/common/compressor/zlib_compressor_impl.h
@@ -50,7 +50,7 @@ public:
Filtered = 1,
Huffman = 2,
Rle = 3,
- Standard = 4,
+ Standard = 0,
};
/**
diff --git a/source/common/network/raw_buffer_socket.cc b/source/common/network/raw_buffer_socket.cc
index 7191c174a..b34f53ff4 100644
--- a/source/common/network/raw_buffer_socket.cc
+++ b/source/common/network/raw_buffer_socket.cc
@@ -29,7 +29,7 @@ IoResult RawBufferSocket::doRead(Buffer::Instance& buffer) {
break;
}
bytes_read += result.rc_;
- if (callbacks_->shouldDrainReadBuffer()) {
+ if (bytes_read >= 16384) {
callbacks_->setReadBufferReady();
break;
}
(Btw -- shouldDrainReadBuffer() can also be triggered by setting the cluster's per_connection_buffer_limit_bytes property to 16k.)
And here's a trace-level log; you can see that the filter's encodeData() is called 11 times.
And it's still taking ~12ms vs 4ms in nginx.
Per my synthetic benchmark, we should be able to compress ~120kb in ~2.5ms with 16kb chunks. Not sure why we can't hit that yet -- it could be time spent in the HTTP parser or waking up from the event loop (none of those are factored into my synthetic benchmark -- #10464).
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C7] parsing 118 bytes
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:672] [C7] message begin
[2020-04-07 22:07:49.464][12098][debug][http] [source/common/http/conn_manager_impl.cc:268] [C7] new stream
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C7] completed header: key=Host value=localhost:81
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C7] completed header: key=User-Agent value=curl/7.58.0
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C7] completed header: key=Accept value=*/*
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:573] [C7] onHeadersCompleteBase
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C7] completed header: key=Accept-Encoding value=gzip
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:771] [C7] Server: onHeadersComplete size=4
[2020-04-07 22:07:49.464][12098][trace][http] [source/common/http/http1/codec_impl.cc:649] [C7] message complete
[2020-04-07 22:07:49.464][12098][debug][http] [source/common/http/conn_manager_impl.cc:777] [C7][S925878110003680490] request headers complete (end_stream=true):
':authority', 'localhost:81'
':path', '/plain/blob-156.json'
':method', 'GET'
'user-agent', 'curl/7.58.0'
'accept', '*/*'
'accept-encoding', 'gzip'
[2020-04-07 22:07:49.464][12098][debug][http] [source/common/http/conn_manager_impl.cc:1329] [C7][S925878110003680490] request end stream
[2020-04-07 22:07:49.465][12098][trace][http] [source/common/http/conn_manager_impl.cc:1038] [C7][S925878110003680490] decode headers called: filter=0x563a55f0d080 status=0
[2020-04-07 22:07:49.465][12098][debug][router] [source/common/router/router.cc:477] [C7][S925878110003680490] cluster 'nginx' match for URL '/plain/blob-156.json'
[2020-04-07 22:07:49.465][12098][debug][router] [source/common/router/router.cc:634] [C7][S925878110003680490] router decoding headers:
':authority', 'localhost:81'
':path', '/plain/blob-156.json'
':method', 'GET'
':scheme', 'http'
'user-agent', 'curl/7.58.0'
'accept', '*/*'
'accept-encoding', 'gzip'
'x-forwarded-proto', 'http'
'x-request-id', 'f8ddbc3c-e4d7-4645-84ed-852b6c7b609b'
'x-envoy-expected-rq-timeout-ms', '15000'
[2020-04-07 22:07:49.465][12098][debug][pool] [source/common/http/conn_pool_base.cc:337] queueing request due to no available connections
[2020-04-07 22:07:49.465][12098][debug][pool] [source/common/http/conn_pool_base.cc:47] creating a new connection
[2020-04-07 22:07:49.465][12098][debug][client] [source/common/http/codec_client.cc:34] [C8] connecting
[2020-04-07 22:07:49.465][12098][debug][connection] [source/common/network/connection_impl.cc:727] [C8] connecting to 127.0.0.1:80
[2020-04-07 22:07:49.465][12098][debug][connection] [source/common/network/connection_impl.cc:736] [C8] connection in progress
[2020-04-07 22:07:49.465][12098][trace][http] [source/common/http/conn_manager_impl.cc:1038] [C7][S925878110003680490] decode headers called: filter=0x563a55f0d020 status=1
[2020-04-07 22:07:49.465][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C7] parsed 118 bytes
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:315] [C7] readDisable: enabled=true disable=true state=0
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 2
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C8] write ready
[2020-04-07 22:07:49.465][12098][debug][connection] [source/common/network/connection_impl.cc:592] [C8] connected
[2020-04-07 22:07:49.465][12098][debug][client] [source/common/http/codec_client.cc:72] [C8] connected
[2020-04-07 22:07:49.465][12098][debug][pool] [source/common/http/conn_pool_base.cc:143] [C8] attaching to next request
[2020-04-07 22:07:49.465][12098][debug][pool] [source/common/http/conn_pool_base.cc:68] [C8] creating stream
[2020-04-07 22:07:49.465][12098][debug][router] [source/common/router/upstream_request.cc:317] [C7][S925878110003680490] pool ready
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C8] writing 253 bytes, end_stream false
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C8] write ready
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C8] write returns: 253
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C7] socket event: 2
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C7] write ready
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 2
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C8] write ready
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 3
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C8] write ready
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.465][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:672] [C8] message begin
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=Server value=nginx/1.17.10
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=Date value=Tue, 07 Apr 2020 22:07:49 GMT
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=Content-Type value=application/json
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=Content-Length value=160003
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=Last-Modified value=Sun, 05 Apr 2020 19:35:40 GMT
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=Connection value=keep-alive
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=ETag value="5e8a330c-27103"
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:573] [C8] onHeadersCompleteBase
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:433] [C8] completed header: key=Accept-Ranges value=bytes
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/http1/codec_impl.cc:986] [C8] Client: onHeadersComplete size=8
[2020-04-07 22:07:49.466][12098][debug][router] [source/common/router/router.cc:1149] [C7][S925878110003680490] upstream headers complete: end_stream=false
[2020-04-07 22:07:49.466][12098][trace][http] [source/common/http/conn_manager_impl.cc:1561] [C7][S925878110003680490] encode headers called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.466][12098][debug][http] [source/common/http/conn_manager_impl.cc:1702] [C7][S925878110003680490] encoding headers via codec (end_stream=false):
':status', '200'
'server', 'envoy'
'date', 'Tue, 07 Apr 2020 22:07:49 GMT'
'content-type', 'application/json'
'last-modified', 'Sun, 05 Apr 2020 19:35:40 GMT'
'accept-ranges', 'bytes'
'x-envoy-upstream-service-time', '0'
'vary', 'Accept-Encoding'
'content-encoding', 'gzip'
[2020-04-07 22:07:49.466][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 280 bytes, end_stream false
[2020-04-07 22:07:49.467][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.467][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8379 end_stream=false)
[2020-04-07 22:07:49.467][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8387 bytes, end_stream false
[2020-04-07 22:07:49.467][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.467][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.467][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.467][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.467][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.468][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.468][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8789 end_stream=false)
[2020-04-07 22:07:49.468][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8797 bytes, end_stream false
[2020-04-07 22:07:49.468][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.468][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C7] socket event: 2
[2020-04-07 22:07:49.468][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C7] write ready
[2020-04-07 22:07:49.468][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C7] write returns: 17464
[2020-04-07 22:07:49.468][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.468][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.468][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.468][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.469][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.469][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8783 end_stream=false)
[2020-04-07 22:07:49.469][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8791 bytes, end_stream false
[2020-04-07 22:07:49.469][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.469][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.469][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.469][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.469][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.470][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.470][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8805 end_stream=false)
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8813 bytes, end_stream false
[2020-04-07 22:07:49.470][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C7] socket event: 2
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C7] write ready
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C7] write returns: 17604
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 251
[2020-04-07 22:07:49.470][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.470][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16635 bytes
[2020-04-07 22:07:49.471][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.471][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8913 end_stream=false)
[2020-04-07 22:07:49.471][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8921 bytes, end_stream false
[2020-04-07 22:07:49.471][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16635 bytes
[2020-04-07 22:07:49.471][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.471][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.471][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.471][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.472][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.472][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8796 end_stream=false)
[2020-04-07 22:07:49.472][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8804 bytes, end_stream false
[2020-04-07 22:07:49.472][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.472][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C7] socket event: 2
[2020-04-07 22:07:49.472][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C7] write ready
[2020-04-07 22:07:49.472][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C7] write returns: 17725
[2020-04-07 22:07:49.472][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.472][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.472][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.472][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.473][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.473][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8781 end_stream=false)
[2020-04-07 22:07:49.473][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8789 bytes, end_stream false
[2020-04-07 22:07:49.473][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.473][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.473][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.473][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.473][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.475][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.475][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8778 end_stream=false)
[2020-04-07 22:07:49.475][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8786 bytes, end_stream false
[2020-04-07 22:07:49.475][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.475][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C7] socket event: 2
[2020-04-07 22:07:49.475][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C7] write ready
[2020-04-07 22:07:49.475][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C7] write returns: 17575
[2020-04-07 22:07:49.475][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.475][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.475][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 16384
[2020-04-07 22:07:49.475][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 16384 bytes
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=8803 end_stream=false)
[2020-04-07 22:07:49.476][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 8811 bytes, end_stream false
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 16384 bytes
[2020-04-07 22:07:49.476][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 1
[2020-04-07 22:07:49.476][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C8] read ready
[2020-04-07 22:07:49.476][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C8] read returns: 12547
[2020-04-07 22:07:49.476][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:39] [C8] read error: Resource temporarily unavailable
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/http1/codec_impl.cc:470] [C8] parsing 12547 bytes
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/http1/codec_impl.cc:649] [C8] message complete
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=6744 end_stream=false)
[2020-04-07 22:07:49.476][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 6752 bytes, end_stream false
[2020-04-07 22:07:49.476][12098][trace][http] [source/common/http/http1/codec_impl.cc:1019] [C8] message complete
[2020-04-07 22:07:49.476][12098][debug][client] [source/common/http/codec_client.cc:104] [C8] response complete
[2020-04-07 22:07:49.476][12098][trace][main] [source/common/event/dispatcher_impl.cc:163] item added to deferred deletion list (size=1)
[2020-04-07 22:07:49.477][12098][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C7][S925878110003680490] encode data called: filter=0x563a55f87400 status=0
[2020-04-07 22:07:49.477][12098][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C7][S925878110003680490] encoding data via codec (size=10 end_stream=true)
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:429] [C7] writing 20 bytes, end_stream false
[2020-04-07 22:07:49.477][12098][trace][main] [source/common/event/dispatcher_impl.cc:163] item added to deferred deletion list (size=2)
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:315] [C7] readDisable: enabled=false disable=false state=0
[2020-04-07 22:07:49.477][12098][debug][pool] [source/common/http/http1/conn_pool.cc:48] [C8] response complete
[2020-04-07 22:07:49.477][12098][debug][pool] [source/common/http/conn_pool_base.cc:93] [C8] destroying stream: 0 remaining
[2020-04-07 22:07:49.477][12098][trace][http] [source/common/http/http1/codec_impl.cc:497] [C8] parsed 12547 bytes
[2020-04-07 22:07:49.477][12098][trace][main] [source/common/event/dispatcher_impl.cc:79] clearing deferred deletion list (size=2)
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C8] socket event: 2
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C8] write ready
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C7] socket event: 2
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C7] write ready
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:68] [C7] write returns: 15583
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:493] [C7] socket event: 3
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:581] [C7] write ready
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/connection_impl.cc:531] [C7] read ready
[2020-04-07 22:07:49.477][12098][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C7] read returns: 0
[2020-04-07 22:07:49.477][12098][debug][connection] [source/common/network/connection_impl.cc:558] [C7] remote close
[2020-04-07 22:07:49.477][12098][debug][connection] [source/common/network/connection_impl.cc:200] [C7] closing socket: 0
[2020-04-07 22:07:49.477][12098][debug][conn_handler] [source/server/connection_handler_impl.cc:86] [C7] adding to cleanup list
[2020-04-07 22:07:49.477][12098][trace][main] [source/common/event/dispatcher_impl.cc:163] item added to deferred deletion list (size=1)
[2020-04-07 22:07:49.477][12098][trace][main] [source/common/event/dispatcher_impl.cc:163] item added to deferred deletion list (size=2)
[2020-04-07 22:07:49.477][12098][trace][main] [source/common/event/dispatcher_impl.cc:79] clearing deferred deletion list (size=2)
@rgs1 keep in mind that when you delay the read loop, it initiates a synthetic event to read later, so there may be significant delay. I would probably add some log lines at warn level just logging the time of the encode path; I'm guessing they are 1ms+ apart, which would add up to the delta.
The real (and potentially complicated) solution here might be to manually chunk compress the buffer when it is above a certain size.
I also took a trace when reading all the upstream's response in one go and it happens in like ~2ms. So not too bad. The problem is it then takes 9ms to compress() it all. So maybe what we want is:
a) read as much as you can [ that is, maintain the current read loop as is ]
b) chunk things up for compression
[2020-04-07 22:40:10.798][12935][debug][connection] [source/common/network/connection_impl.cc:727] [C2] connecting to 127.0.0.1:80
[2020-04-07 22:40:10.798][12935][debug][connection] [source/common/network/connection_impl.cc:736] [C2] connection in progress
[2020-04-07 22:40:10.799][12935][trace][http] [source/common/http/conn_manager_impl.cc:1038] [C1][S12907644501072272853] decode headers called: filter=0x55e185550d20 status=1
[2020-04-07 22:40:10.799][12935][trace][http] [source/common/http/http1/codec_impl.cc:497] [C1] parsed 118 bytes
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:315] [C1] readDisable: enabled=true disable=true state=0
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:493] [C2] socket event: 2
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:581] [C2] write ready
[2020-04-07 22:40:10.799][12935][debug][connection] [source/common/network/connection_impl.cc:592] [C2] connected
[2020-04-07 22:40:10.799][12935][debug][client] [source/common/http/codec_client.cc:72] [C2] connected
[2020-04-07 22:40:10.799][12935][debug][pool] [source/common/http/conn_pool_base.cc:143] [C2] attaching to next request
[2020-04-07 22:40:10.799][12935][debug][pool] [source/common/http/conn_pool_base.cc:68] [C2] creating stream
[2020-04-07 22:40:10.799][12935][debug][router] [source/common/router/upstream_request.cc:317] [C1][S12907644501072272853] pool ready
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:429] [C2] writing 253 bytes, end_stream false
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:581] [C2] write ready
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:69] [C2] write returns: 253
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:493] [C1] socket event: 2
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:581] [C1] write ready
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:493] [C2] socket event: 2
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:581] [C2] write ready
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:493] [C2] socket event: 3
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:581] [C2] write ready
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/connection_impl.cc:531] [C2] read ready
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.799][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 251
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 16384
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:25] [C2] read returns: 12547
[2020-04-07 22:40:10.800][12935][trace][connection] [source/common/network/raw_buffer_socket.cc:40] [C2] read error: Resource temporarily unavailable
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:470] [C2] parsing 160254 bytes
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:672] [C2] message begin
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=Server value=nginx/1.17.10
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=Date value=Tue, 07 Apr 2020 22:40:10 GMT
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=Content-Type value=application/json
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=Content-Length value=160003
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=Last-Modified value=Sun, 05 Apr 2020 19:35:40 GMT
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=Connection value=keep-alive
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=ETag value="5e8a330c-27103"
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:573] [C2] onHeadersCompleteBase
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:433] [C2] completed header: key=Accept-Ranges value=bytes
[2020-04-07 22:40:10.800][12935][trace][http] [source/common/http/http1/codec_impl.cc:986] [C2] Client: onHeadersComplete size=8
[2020-04-07 22:40:10.800][12935][debug][router] [source/common/router/router.cc:1149] [C1][S12907644501072272853] upstream headers complete: end_stream=false
[2020-04-07 22:40:10.801][12935][trace][http] [source/common/http/conn_manager_impl.cc:1561] [C1][S12907644501072272853] encode headers called: filter=0x55e1855da0a0 status=0
[2020-04-07 22:40:10.801][12935][debug][http] [source/common/http/conn_manager_impl.cc:1702] [C1][S12907644501072272853] encoding headers via codec (end_stream=false):
':status', '200'
'server', 'envoy'
'date', 'Tue, 07 Apr 2020 22:40:10 GMT'
'content-type', 'application/json'
'last-modified', 'Sun, 05 Apr 2020 19:35:40 GMT'
'accept-ranges', 'bytes'
'x-envoy-upstream-service-time', '1'
'vary', 'Accept-Encoding'
'content-encoding', 'gzip'
[2020-04-07 22:40:10.801][12935][trace][connection] [source/common/network/connection_impl.cc:429] [C1] writing 280 bytes, end_stream false
[2020-04-07 22:40:10.801][12935][trace][http] [source/common/http/http1/codec_impl.cc:649] [C2] message complete
[2020-04-07 22:40:10.810][12935][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C1][S12907644501072272853] encode data called: filter=0x55e1855da0a0 status=0
[2020-04-07 22:40:10.810][12935][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C1][S12907644501072272853] encoding data via codec (size=85937 end_stream=false)
[2020-04-07 22:40:10.810][12935][trace][connection] [source/common/network/connection_impl.cc:429] [C1] writing 85946 bytes, end_stream false
[2020-04-07 22:40:10.810][12935][trace][http] [source/common/http/http1/codec_impl.cc:1019] [C2] message complete
[2020-04-07 22:40:10.810][12935][debug][client] [source/common/http/codec_client.cc:104] [C2] response complete
[2020-04-07 22:40:10.810][12935][trace][main] [source/common/event/dispatcher_impl.cc:163] item added to deferred deletion list (size=1)
[2020-04-07 22:40:10.811][12935][trace][http] [source/common/http/conn_manager_impl.cc:1823] [C1][S12907644501072272853] encode data called: filter=0x55e1855da0a0 status=0
[2020-04-07 22:40:10.811][12935][trace][http] [source/common/http/conn_manager_impl.cc:1849] [C1][S12907644501072272853] encoding data via codec (size=10 end_stream=true)
[2020-04-07 22:40:10.811][12935][trace][connection] [source/common/network/connection_impl.cc:429] [C1] writing 20 bytes, end_stream false
[2020-04-07 22:40:10.811][12935][trace][main] [source/common/event/dispatcher_impl.cc:163] item added to deferred deletion list (size=2)
[2020-04-07 22:40:10.811][12935][trace][connection] [source/common/network/connection_impl.cc:315] [C1] readDisable: enabled=false disable=false state=0
The real (and potentially complicated) solution here might be to manually chunk compress the buffer when it is above a certain size.
Yes -- it seems like it...
b) chunk things up for compression
Yes exactly (if this is indeed the problem). Does NGINX have a max chunk size?
b) chunk things up for compression
Yes exactly (if this is indeed the problem). Does NGINX have a max chunk size?
Not sure -- read their filter but didn't grok it entirely. Will circle back to it later.
So nginx calls deflate() with 32k chunks:
127.0.0.1 - - [13/Apr/2020:16:12:12 +0000] "GET /static/blob-156.json HTTP/1.1" 200 85414 "-" "-"
deflate buffer size: 32768
deflate buffer size: 32768
deflate buffer size: 32768
deflate buffer size: 32768
deflate buffer size: 28931
We should probably experiment with something similar. cc: @rojkov
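Something like the classic zpipe-style loop would be the shape of that experiment. A standalone sketch (not nginx's or Envoy's actual code) that hands deflate at most 32KB of input per call and drains the output through a fixed 32KB buffer, roughly what the debug log above shows nginx doing:

#include <zlib.h>
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

constexpr size_t kChunk = 32768; // mirror the 32k buffers seen in the nginx debug log

static std::vector<unsigned char> gzip_in_chunks(const std::string& input) {
  z_stream zs{};
  deflateInit2(&zs, Z_BEST_SPEED, Z_DEFLATED, 15 + 16 /* gzip wrapper */, 8, Z_DEFAULT_STRATEGY);
  std::vector<unsigned char> compressed;
  unsigned char out[kChunk];
  for (size_t off = 0; off < input.size(); off += kChunk) {
    const size_t len = std::min(kChunk, input.size() - off);
    zs.next_in = reinterpret_cast<Bytef*>(const_cast<char*>(input.data() + off));
    zs.avail_in = static_cast<uInt>(len);
    const int flush = (off + len == input.size()) ? Z_FINISH : Z_NO_FLUSH;
    do { // drain deflate's output through the fixed-size buffer
      zs.next_out = out;
      zs.avail_out = static_cast<uInt>(sizeof(out));
      deflate(&zs, flush); // error handling omitted for brevity
      compressed.insert(compressed.end(), out, out + (sizeof(out) - zs.avail_out));
    } while (zs.avail_out == 0);
  }
  deflateEnd(&zs);
  return compressed;
}

int main() {
  std::string blob(160003, '{'); // trivial stand-in for the ~156K JSON blob
  printf("compressed %zu -> %zu bytes\n", blob.size(), gzip_in_chunks(blob).size());
  return 0;
}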
So I updated the benchmark added in #10464, and when using the same input the compression times are pretty much the same across the different chunk sizes (for the corresponding parameters, of course):
------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
....
compressFull/0/manual_time 14.1 ms 14.3 ms 48
compressFull/1/manual_time 7.06 ms 7.22 ms 104
compressFull/2/manual_time 5.17 ms 5.33 ms 123
compressFull/3/manual_time 15.4 ms 15.5 ms 45
compressFull/4/manual_time 10.1 ms 10.3 ms 69
compressFull/5/manual_time 15.8 ms 16.0 ms 40
compressFull/6/manual_time 15.3 ms 15.5 ms 42
compressFull/7/manual_time 9.91 ms 10.1 ms 71
compressFull/8/manual_time 15.8 ms 16.0 ms 45
compressChunks16384/0/manual_time 13.4 ms 13.5 ms 52
compressChunks16384/1/manual_time 6.33 ms 6.48 ms 111
compressChunks16384/2/manual_time 5.09 ms 5.27 ms 147
compressChunks16384/3/manual_time 15.1 ms 15.3 ms 46
compressChunks16384/4/manual_time 9.61 ms 9.78 ms 71
compressChunks16384/5/manual_time 14.5 ms 14.6 ms 47
compressChunks16384/6/manual_time 14.0 ms 14.1 ms 48
compressChunks16384/7/manual_time 9.20 ms 9.36 ms 76
compressChunks16384/8/manual_time 14.5 ms 14.6 ms 48
compressChunks8192/0/manual_time 14.3 ms 14.5 ms 50
compressChunks8192/1/manual_time 6.80 ms 6.96 ms 100
compressChunks8192/2/manual_time 5.21 ms 5.36 ms 135
compressChunks8192/3/manual_time 14.9 ms 15.0 ms 47
compressChunks8192/4/manual_time 9.71 ms 9.87 ms 68
compressChunks8192/5/manual_time 15.9 ms 16.1 ms 45
I think the benchmark does not tweak the chunk size input to ZlibCompressorImpl, but changing that doesn't seem to make a difference.
Some possible things to look at:
- performance when ZlibCompressorImpl::compress is called on a single buffer with multiple slices in it.
- Consider annotating benchmarks with the compressed size produced, and the compression_level, strategy and mem_level used for the benchmark run.
I'm trying to wrap my head around the typical input to ZlibCompressorImpl::compress. I think it will be a buffer with slices of around 16KB most of the time when handling HTTP1 bodies with content-length.
I think the benchmark does not tweak the chunk size input to ZlibCompressorImpl, but changing that doesn't seem to make a difference.
Some possible things to look at:
- performance when ZlibCompressorImpl::compress is called on a single buffer with multiple slices in it.
- Consider annotating benchmarks with the compressed size produced, and the compression_level, strategy and mem_level used for the benchmark run.
I'm trying to wrap my head around the typical input to ZlibCompressorImpl::compress. I think it will be a buffer with slices of around 16KB most of the time when handling HTTP1 bodies with content-length.
Yup -- what I am seeing in my experiments is mostly slices of 16kb, given that we have this hard-coded:
https://github.com/envoyproxy/envoy/blob/master/source/common/network/raw_buffer_socket.cc#L21
I played with making 32kb reads... no difference.
I also chatted with @jmarantz about this, and he suggested using cachegrind on the benchmark to see if we are being heavily penalized by cache misses (presumably, nginx has more years of perf optimization, so their code might be more cache friendly). I'll get to cachegrind tomorrow.
Ah! There is some (not super hard) evidence that we are more memory-bound than nginx: looking at perf data, nginx gets more instructions per cycle (> 1.0) than Envoy (< 0.8)... But that could be something else too. Thanks for looking at this @antoniovicente!
It looks like GzipFilter is creating ZlibCompressorImpl using the default constructor with a 4KB chunk size. That might be the source of the issue:
https://github.com/envoyproxy/envoy/blob/master/source/common/compressor/zlib_compressor_impl.cc#L14
EDIT: Nevermind, it seems that the "worse compression ratio" part was fixed already.
What PR fixed the "worse compression ratio" issue?
What PR fixed the "worse compression ratio" issue?
Finally I've got some time to look into it. Below is the output of the Intel profiler:

It seems Envoy's code is hardly the problem. All hotspots are in zlib itself.
I tried to replace zlib with zlib-ng with all optimizations switched on, and for the majority of cases performance roughly doubled.
Zlib:
----------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------
compressFull/0/manual_time 12.6 ms 12.8 ms 56
compressFull/1/manual_time 6.17 ms 6.42 ms 112
compressFull/2/manual_time 4.62 ms 4.87 ms 149
compressFull/3/manual_time 13.5 ms 13.7 ms 52
compressFull/4/manual_time 8.75 ms 9.00 ms 80
compressFull/5/manual_time 14.5 ms 14.7 ms 49
compressFull/6/manual_time 13.5 ms 13.7 ms 51
compressFull/7/manual_time 8.81 ms 9.05 ms 78
compressFull/8/manual_time 14.4 ms 14.6 ms 49
compressChunks16384/0/manual_time 11.8 ms 12.0 ms 60
compressChunks16384/1/manual_time 5.86 ms 6.11 ms 121
compressChunks16384/2/manual_time 4.48 ms 4.73 ms 156
compressChunks16384/3/manual_time 12.7 ms 12.9 ms 56
compressChunks16384/4/manual_time 8.29 ms 8.54 ms 82
compressChunks16384/5/manual_time 13.6 ms 13.8 ms 52
compressChunks16384/6/manual_time 12.6 ms 12.8 ms 56
compressChunks16384/7/manual_time 8.37 ms 8.61 ms 84
compressChunks16384/8/manual_time 13.5 ms 13.7 ms 51
compressChunks8192/0/manual_time 12.8 ms 13.0 ms 52
compressChunks8192/1/manual_time 6.21 ms 6.47 ms 113
compressChunks8192/2/manual_time 4.88 ms 5.14 ms 144
compressChunks8192/3/manual_time 13.6 ms 13.9 ms 52
compressChunks8192/4/manual_time 8.87 ms 9.13 ms 77
compressChunks8192/5/manual_time 14.6 ms 14.9 ms 47
compressChunks8192/6/manual_time 13.5 ms 13.8 ms 51
compressChunks8192/7/manual_time 9.07 ms 9.33 ms 77
compressChunks8192/8/manual_time 14.5 ms 14.8 ms 48
compressChunks4096/0/manual_time 12.9 ms 13.2 ms 55
compressChunks4096/1/manual_time 6.23 ms 6.51 ms 114
compressChunks4096/2/manual_time 5.15 ms 5.44 ms 140
compressChunks4096/3/manual_time 13.6 ms 13.9 ms 51
compressChunks4096/4/manual_time 8.97 ms 9.25 ms 78
compressChunks4096/5/manual_time 14.9 ms 15.2 ms 47
compressChunks4096/6/manual_time 13.7 ms 14.0 ms 52
compressChunks4096/7/manual_time 9.11 ms 9.39 ms 78
compressChunks4096/8/manual_time 14.9 ms 15.2 ms 46
compressChunks1024/0/manual_time 13.6 ms 14.0 ms 52
compressChunks1024/1/manual_time 7.34 ms 7.74 ms 97
compressChunks1024/2/manual_time 6.19 ms 6.60 ms 115
compressChunks1024/3/manual_time 14.5 ms 14.9 ms 48
compressChunks1024/4/manual_time 10.4 ms 10.8 ms 70
compressChunks1024/5/manual_time 16.3 ms 16.7 ms 43
compressChunks1024/6/manual_time 14.4 ms 14.8 ms 48
compressChunks1024/7/manual_time 10.3 ms 10.7 ms 68
compressChunks1024/8/manual_time 16.2 ms 16.6 ms 43
Zlib-ng
----------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------
compressFull/0/manual_time 3.47 ms 3.72 ms 203
compressFull/1/manual_time 3.44 ms 3.69 ms 202
compressFull/2/manual_time 3.43 ms 3.68 ms 203
compressFull/3/manual_time 13.2 ms 13.5 ms 53
compressFull/4/manual_time 6.28 ms 6.53 ms 112
compressFull/5/manual_time 5.77 ms 6.02 ms 117
compressFull/6/manual_time 12.4 ms 12.6 ms 57
compressFull/7/manual_time 5.92 ms 6.16 ms 119
compressFull/8/manual_time 6.18 ms 6.43 ms 113
compressChunks16384/0/manual_time 3.28 ms 3.53 ms 214
compressChunks16384/1/manual_time 3.26 ms 3.51 ms 218
compressChunks16384/2/manual_time 3.21 ms 3.46 ms 216
compressChunks16384/3/manual_time 12.5 ms 12.7 ms 56
compressChunks16384/4/manual_time 5.88 ms 6.13 ms 121
compressChunks16384/5/manual_time 5.49 ms 5.74 ms 125
compressChunks16384/6/manual_time 11.7 ms 12.0 ms 61
compressChunks16384/7/manual_time 5.51 ms 5.76 ms 127
compressChunks16384/8/manual_time 5.85 ms 6.11 ms 118
compressChunks8192/0/manual_time 3.48 ms 3.74 ms 202
compressChunks8192/1/manual_time 3.42 ms 3.68 ms 197
compressChunks8192/2/manual_time 3.45 ms 3.72 ms 203
compressChunks8192/3/manual_time 13.3 ms 13.6 ms 51
compressChunks8192/4/manual_time 6.26 ms 6.52 ms 109
compressChunks8192/5/manual_time 5.97 ms 6.23 ms 119
compressChunks8192/6/manual_time 12.5 ms 12.7 ms 56
compressChunks8192/7/manual_time 5.92 ms 6.17 ms 118
compressChunks8192/8/manual_time 6.44 ms 6.70 ms 110
compressChunks4096/0/manual_time 3.55 ms 3.83 ms 199
compressChunks4096/1/manual_time 3.49 ms 3.77 ms 199
compressChunks4096/2/manual_time 3.50 ms 3.78 ms 196
compressChunks4096/3/manual_time 13.9 ms 14.2 ms 50
compressChunks4096/4/manual_time 6.44 ms 6.72 ms 107
compressChunks4096/5/manual_time 6.18 ms 6.45 ms 116
compressChunks4096/6/manual_time 13.0 ms 13.3 ms 54
compressChunks4096/7/manual_time 6.03 ms 6.31 ms 115
compressChunks4096/8/manual_time 6.60 ms 6.88 ms 106
compressChunks1024/0/manual_time 3.82 ms 4.23 ms 186
compressChunks1024/1/manual_time 4.01 ms 4.44 ms 186
compressChunks1024/2/manual_time 3.80 ms 4.21 ms 184
compressChunks1024/3/manual_time 14.0 ms 14.4 ms 50
compressChunks1024/4/manual_time 7.24 ms 7.64 ms 96
compressChunks1024/5/manual_time 7.28 ms 7.69 ms 95
compressChunks1024/6/manual_time 12.9 ms 13.2 ms 54
compressChunks1024/7/manual_time 6.91 ms 7.32 ms 103
compressChunks1024/8/manual_time 7.78 ms 8.18 ms 91
Do we want to migrate to zlib-ng?
@rojkov What do you think of adding a build flag for choosing zlib-ng (similar to https://github.com/envoyproxy/envoy/blob/68f7288413df640415e5a29e4e4480bc860aa49c/source/extensions/filters/common/lua/BUILD#L35)?
That could be a good starting point.
I'm on holidays at the moment, will submit a patch later.
Awesome. Enjoy your holiday!
https://github.com/dio/envoy/pull/new/zlib-ng builds, but I'm not sure how to switch the madler_zlib native.binding for gRPC to use zlib-ng via a build option. (Since we don't want two different zlibs in one binary, right?)
@dio nice -- go ahead and create the PR so I can test it locally 👍
Nice! I think we can likely switch to zlib-ng, but maybe initially it should be a compile option, as we discussed. Per @dio, we may also need patches to other code that uses zlib.
@rgs1 where are we at with this issue? Is Envoy now comparable to other proxies or do we have more work to do here?
@rgs1 where are we at with this issue? Is Envoy now comparable to other proxies or do we have more work to do here?
Alas, the switch to zlib-ng did not move the needle for us in one of our main workloads (that is, in the only workload where it truly matters -- didn't check if it made things any better in the other ones).
Alas, the switch to zlib-ng did not move the needle for us in one of our main workloads (that is, in the only workload where it truly matters -- didn't check if it made things any better in the other ones).
OK, that's sad. I would love to figure this out. I think @rojkov might have some cycles to help here. @rojkov, is this a perf-related investigation you could help drive to closure?
Yes, I'll take a look. Perhaps there's a way to avoid double buffering.
I configured Envoy and Nginx to use similar settings for zlib: compression level 1, default strategy, one thread, memory level 8, and the same sliding window size (window_bits 15). chunk_size is 4096, since Nginx also uses buffers the size of one memory page.
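For reference, here is a minimal sketch of how those settings could be expressed as an http_filters entry using the compressor filter with the gzip compressor library. The exact proto type URLs and the filter's placement in the chain are assumptions for illustration, not a copy of the config used in this test:

- name: envoy.filters.http.compressor
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.compressor.v3.Compressor
    compressor_library:
      name: gzip
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.compression.gzip.compressor.v3.Gzip
        compression_level: COMPRESSION_LEVEL_1   # zlib level 1
        compression_strategy: DEFAULT_STRATEGY
        memory_level: 8
        window_bits: 15
        chunk_size: 4096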
Here are the test results for downloading a 157K JSON file with
$ k6 run --vus 1 --duration 30s
Envoy:
data_received..............: 175 MB 5.8 MB/s
data_sent..................: 230 kB 7.7 kB/s
http_req_blocked...........: avg=4.55µs min=2.38µs med=4.29µs max=216.02µs p(90)=4.64µs p(95)=4.94µs
http_req_connecting........: avg=81ns min=0s med=0s max=166.75µs p(90)=0s p(95)=0s
http_req_duration..........: avg=14.57ms min=9.83ms med=14.6ms max=20.29ms p(90)=14.92ms p(95)=15.03ms
http_req_receiving.........: avg=6.76ms min=3.31ms med=6.72ms max=14.14ms p(90)=6.91ms p(95)=6.98ms
http_req_sending...........: avg=25.27µs min=12.65µs med=23.24µs max=115.94µs p(90)=31.99µs p(95)=34.24µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=7.77ms min=648.03µs med=7.89ms max=13.94ms p(90)=8.06ms p(95)=8.17ms
http_reqs..................: 2038 67.821657/s
iteration_duration.........: avg=14.7ms min=9.97ms med=14.73ms max=20.51ms p(90)=15.06ms p(95)=15.17ms
iterations.................: 2038 67.821657/s
vus........................: 1 min=1 max=1
vus_max....................: 1 min=1 max=1
Nginx:
data_received..............: 232 MB 7.7 MB/s
data_sent..................: 306 kB 10 kB/s
http_req_blocked...........: avg=8.07µs min=2.69µs med=4.33µs max=464.84µs p(90)=4.87µs p(95)=5.37µs
http_req_connecting........: avg=2.32µs min=0s med=0s max=322.05µs p(90)=0s p(95)=0s
http_req_duration..........: avg=10.92ms min=6.7ms med=10.14ms max=18.29ms p(90)=13.22ms p(95)=13.38ms
http_req_receiving.........: avg=8.19ms min=4.82ms med=7.68ms max=13.41ms p(90)=9.92ms p(95)=10.05ms
http_req_sending...........: avg=25.65µs min=13.07µs med=23.44µs max=126.91µs p(90)=31.53µs p(95)=34.1µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=2.69ms min=1.65ms med=2.49ms max=9.53ms p(90)=3.31ms p(95)=3.39ms
http_reqs..................: 2709 90.18219/s
iteration_duration.........: avg=11.06ms min=6.81ms med=10.28ms max=18.58ms p(90)=13.35ms p(95)=13.51ms
iterations.................: 2709 90.18219/s
vus........................: 1 min=1 max=1
vus_max....................: 1 min=1 max=1
Nginx is about 30% faster overall with TTFB almost 3 times shorter.
I tried to do three things; the first was around memcpy() when moving bytes from zlib's output to the filter's output buffer (https://github.com/rojkov/envoy/commit/d084d1e369e772871d308b0efcb54f8793dff75a). The first thing improved TTFB a bit (by about 1-2%), but TTLB didn't change at all. The impact of the other two approaches was not noticeable at all.
Flamegraphs show that most of the time is spent in zlib for both Envoy and Nginx, though in Envoy's case the call stacks are much deeper.
Callgrind doesn't reveal excessive cache thrashing either, though I might have interpreted the numbers the wrong way (sorted by last-level cache misses for data writes, limited to 0.25%):
I1 cache: 32768 B, 64 B, 8-way associative
D1 cache: 32768 B, 64 B, 8-way associative
LL cache: 17825792 B, 64 B, 17-way associative
Timerange: Basic block 0 - 320949664
Trigger: Program termination
Profiled target: bazel-bin/source/exe/envoy-static --concurrency 1 -c rojkov-data/envoy-config-compression-compress.yaml (PID 1972876, part 1)
Events recorded: Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
Events shown: Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
Event sort order: DLmw DLmr
Thresholds: 99 0
Include dirs:
User annotated:
Auto-annotation: on
--------------------------------------------------------------------------------
Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
--------------------------------------------------------------------------------
2,589,412,689 (100.0%) 811,817,761 (100.0%) 434,425,819 (100.0%) 1,873,102 (100.0%) 20,526,076 (100.0%) 3,269,251 (100.0%) 52,204 (100.0%) 88,113 (100.0%) 233,862 (100.0%) PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw file:function
--------------------------------------------------------------------------------
16,919,264 ( 0.65%) 59,215 ( 0.01%) 16,152,582 ( 3.72%) 6,525 ( 0.35%) 153 ( 0.00%) 274,275 ( 8.39%) 6 ( 0.01%) 0 102,325 (43.75%) ???:__memset_avx2_unaligned_erms [/usr/lib64/libc-2.30.so]
733,120 ( 0.03%) 24,734 ( 0.00%) 107,338 ( 0.02%) 3,908 ( 0.21%) 2,463 ( 0.01%) 59,212 ( 1.81%) 10 ( 0.02%) 0 48,775 (20.86%) ???:tcmalloc::CentralFreeList::Populate() [/home/rojkov/.cache/bazel/_bazel_rojkov/c0022cd08dcf561b0cf3d59ef5423900/execroot/envoy/bazel-out/k8-opt/bin/source/exe/envoy-static]
2,403,624 ( 0.09%) 557,483 ( 0.07%) 187,115 ( 0.04%) 53 ( 0.00%) 68,098 ( 0.33%) 26,036 ( 0.80%) 53 ( 0.10%) 67,868 (77.02%) 25,986 (11.11%) ???:_dl_relocate_object [/usr/lib64/ld-2.30.so]
78,525,198 ( 3.03%) 76,292,910 ( 9.40%) 76,127,463 (17.52%) 9,260 ( 0.49%) 1,085,206 ( 5.29%) 1,286,418 (39.35%) 12 ( 0.02%) 9,379 (10.64%) 13,523 ( 5.78%) ???:__memcpy_avx_unaligned_erms [/usr/lib64/libc-2.30.so]
81,467 ( 0.00%) 61 ( 0.00%) 46,248 ( 0.01%) 36 ( 0.00%) 12 ( 0.00%) 11,535 ( 0.35%) 10 ( 0.02%) 0 11,526 ( 4.93%) ???:memset [/usr/lib64/libc-2.30.so]
293,583 ( 0.01%) 56,806 ( 0.01%) 107,474 ( 0.02%) 5,004 ( 0.27%) 306 ( 0.00%) 6,025 ( 0.18%) 15 ( 0.03%) 2 ( 0.00%) 4,294 ( 1.84%) /proc/self/cwd/external/com_google_protobuf/src/google/protobuf/descriptor.cc:google::protobuf::DescriptorBuilder::BuildFieldOrExtension(google::protobuf::FieldDescriptorProto const&, google::protobuf::Descriptor const*, google::protobuf::FieldDescriptor*, bool) [/home/rojkov/.cache/bazel/_bazel_rojkov/c0022cd08dcf561b0cf3d59ef5423900/execroot/envoy/bazel-out/k8-opt/bin/source/exe/envoy-static]
16,383 ( 0.00%) 0 16,383 ( 0.00%) 0 0 2,036 ( 0.06%) 0 0 2,010 ( 0.86%) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/tuple:void std::vector<std::unique_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::default_delete<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::unique_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::default_delete<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >::_M_realloc_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*&>(__gnu_cxx::__normal_iterator<std::unique_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::default_delete<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::vector<std::unique_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::default_delete<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::unique_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::default_delete<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*&)
241,350 ( 0.01%) 58,582 ( 0.01%) 114,040 ( 0.03%) 819 ( 0.04%) 5 ( 0.00%) 9,413 ( 0.29%) 2 ( 0.00%) 0 1,347 ( 0.58%) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_vector.h:void std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::_M_realloc_insert<>(__gnu_cxx::__normal_iterator<std::vector<int, std::allocator<int> >*, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >)
10,594 ( 0.00%) 0 10,594 ( 0.00%) 0 0 1,494 ( 0.05%) 0 0 1,337 ( 0.57%) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:google::protobuf::DescriptorPool::Tables::AllocateString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
21,532 ( 0.00%) 6,152 ( 0.00%) 12,304 ( 0.00%) 1 ( 0.00%) 0 1,153 ( 0.04%) 1 ( 0.00%) 0 1,152 ( 0.49%) ???:ares__init_list_head [/home/rojkov/.cache/bazel/_bazel_rojkov/c0022cd08dcf561b0cf3d59ef5423900/execroot/envoy/bazel-out/k8-opt/bin/source/exe/envoy-static]
808,853,382 (31.24%) 318,609,688 (39.25%) 172,031,372 (39.60%) 4,595 ( 0.25%) 9,794,728 (47.72%) 1,130,816 (34.59%) 20 ( 0.04%) 6 ( 0.01%) 1,133 ( 0.48%) ???:deflate_fast [/home/rojkov/.cache/bazel/_bazel_rojkov/c0022cd08dcf561b0cf3d59ef5423900/execroot/envoy/bazel-out/k8-opt/bin/source/exe/envoy-static]
1,023,828 ( 0.04%) 132,908 ( 0.02%) 165,792 ( 0.04%) 2,320 ( 0.12%) 0 7,705 ( 0.24%) 5 ( 0.01%) 0 1,130 ( 0.48%) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_vector.h:re2::Prog::MarkSuccessors(re2::SparseArray<int>*, re2::SparseArray<int>*, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >*, re2::SparseSetT<void>*, std::vector<int, std::allocator<int> >*)
3,600 ( 0.00%) 0 2,400 ( 0.00%) 0 0 866 ( 0.03%) 0 0 838 ( 0.36%) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:re2::RE2::RE2(char const*)
27,140 ( 0.00%) 6,796 ( 0.00%) 10,216 ( 0.00%) 1,638 ( 0.09%) 572 ( 0.00%) 759 ( 0.02%) 4 ( 0.01%) 0 610 ( 0.26%) ???:tcmalloc::NewSpan(unsigned long, unsigned long) [/home/rojkov/.cache/bazel/_bazel_rojkov/c0022cd08dcf561b0cf3d59ef5423900/execroot/envoy/bazel-out/k8-opt/bin/source/exe/envoy-static]
13,864 ( 0.00%) 6,932 ( 0.00%) 5,199 ( 0.00%) 0 0 720 ( 0.02%) 0 0 589 ( 0.25%) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/regex_automaton.h:std::__detail::_State<char>* std::__uninitialized_copy<false>::__uninit_copy<std::__detail::_State<char> const*, std::__detail::_State<char
I1 cache: 32768 B, 64 B, 8-way associative
D1 cache: 32768 B, 64 B, 8-way associative
LL cache: 17825792 B, 64 B, 17-way associative
Timerange: Basic block 0 - 246335785
Trigger: Program termination
Profiled target: ./objs/nginx -p /home/rojkov/work/nginx -c nginx.conf (PID 1167, part 1)
Events recorded: Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
Events shown: Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
Event sort order: DLmw DLmr
Thresholds: 99 0
Include dirs:
User annotated:
Auto-annotation: on
--------------------------------------------------------------------------------
Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
--------------------------------------------------------------------------------
1,762,195,422 (100.0%) 493,892,267 (100.0%) 259,362,338 (100.0%) 234,414 (100.0%) 20,945,147 (100.0%) 1,916,199 (100.0%) 4,309 (100.0%) 4,705 (100.0%) 16,219 (100.0%) PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw file:function
--------------------------------------------------------------------------------
804,345,960 (45.64%) 206,282,040 (41.77%) 136,108,800 (52.48%) 5,530 ( 2.36%) 9,517,993 (45.44%) 893,536 (46.63%) 32 ( 0.74%) 10 ( 0.21%) 3,585 (22.10%) ./deflate.c:deflate_fast [/lib/x86_64-linux-gnu/libz.so.1.2.11]
27,862 ( 0.00%) 4,169 ( 0.00%) 7,203 ( 0.00%) 22 ( 0.01%) 18 ( 0.00%) 3,464 ( 0.18%) 21 ( 0.49%) 0 3,455 (21.30%) /home/rojkov/work/nginx/src/event/ngx_event.c:ngx_event_process_init [/home/rojkov/work/nginx/objs/nginx]
15,072,968 ( 0.86%) 12,770,738 ( 2.59%) 12,744,964 ( 4.91%) 1,763 ( 0.75%) 629,332 ( 3.00%) 650,770 (33.96%) 8 ( 0.19%) 518 (11.01%) 2,738 (16.88%) /build/glibc-2ORdQG/glibc-2.27/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/lib/x86_64-linux-gnu/libc-2.27.so]
7,893,689 ( 0.45%) 127 ( 0.00%) 7,892,800 ( 3.04%) 121 ( 0.05%) 120 ( 0.00%) 123,336 ( 6.44%) 1 ( 0.02%) 0 2,492 (15.36%) /build/glibc-2ORdQG/glibc-2.27/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_erms [/lib/x86_64-linux-gnu/libc-2.27.so]
51,195 ( 0.00%) 12,202 ( 0.00%) 6,215 ( 0.00%) 30 ( 0.01%) 1,612 ( 0.01%) 1,153 ( 0.06%) 30 ( 0.70%) 1,546 (32.86%) 1,126 ( 6.94%) /build/glibc-2ORdQG/glibc-2.27/elf/../sysdeps/x86_64/dl-machine.h:_dl_relocate_object
440,540,640 (25.00%) 103,362,480 (20.93%) 54,240,360 (20.91%) 4,694 ( 2.00%) 306,295 ( 1.46%) 159,945 ( 8.35%) 16 ( 0.37%) 4 ( 0.09%) 510 ( 3.14%) ./trees.c:compress_block [/lib/x86_64-linux-gnu/libz.so.1.2.11]
124,780 ( 0.01%) 5,427 ( 0.00%) 37,950 ( 0.01%) 1,107 ( 0.47%) 0 10,525 ( 0.55%) 5 ( 0.12%) 0 462 ( 2.85%) /build/glibc-2ORdQG/glibc-2.27/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms [/lib/x86_64-linux-gnu/libc-2.27.so]
232,352 ( 0.01%) 59,896 ( 0.01%) 14,149 ( 0.01%) 22 ( 0.01%) 65 ( 0.00%) 229 ( 0.01%) 21 ( 0.49%) 0 226 ( 1.39%) /home/rojkov/work/nginx/src/core/ngx_hash.c:ngx_hash_init [/home/rojkov/work/nginx/objs/nginx]
1,481 ( 0.00%) 28 ( 0.00%) 680 ( 0.00%) 7 ( 0.00%) 0 158 ( 0.01%) 4 ( 0.09%) 0 158 ( 0.97%) /build/glibc-2ORdQG/glibc-2.27/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:memset [/lib/x86_64-linux-gnu/ld-2.27.so]
541,741 ( 0.03%) 72,321 ( 0.01%) 88,277 ( 0.03%) 6,981 ( 2.98%) 3,400 ( 0.02%) 3,187 ( 0.17%) 44 ( 1.02%) 1 ( 0.02%) 141 ( 0.87%) /build/glibc-2ORdQG/glibc-2.27/malloc/malloc.c:_int_malloc [/lib/x86_64-linux-gnu/libc-2.27.so]
115,560 ( 0.01%) 14,040 ( 0.00%) 69,240 ( 0.03%) 638 ( 0.27%) 5,631 ( 0.03%) 7,931 ( 0.41%) 5 ( 0.12%) 0 102 ( 0.63%) /usr/include/x86_64-linux-gnu/bits/string3.h:deflate_fast
50,302 ( 0.00%) 4,844 ( 0.00%) 4,418 ( 0.00%) 3 ( 0.00%) 104 ( 0.00%) 96 ( 0.01%) 1 ( 0.02%) 20 ( 0.43%) 95 ( 0.59%) /home/rojkov/work/nginx/src/core/ngx_string.c:ngx_strlow [/home/rojkov/work/nginx/objs/nginx]
48,873 ( 0.00%) 21,014 ( 0.00%) 6,523 ( 0.00%) 13 ( 0.01%) 3 ( 0.00%) 81 ( 0.00%) 13 ( 0.30%) 0 81 ( 0.50%) /home/rojkov/work/nginx/src/http/ngx_http_variables.c:ngx_http_add_variable [/home/rojkov/work/nginx/objs/nginx]
36,028 ( 0.00%) 7,698 ( 0.00%) 3,162 ( 0.00%) 8 ( 0.00%) 24 ( 0.00%) 79 ( 0.00%) 8 ( 0.19%) 3 ( 0.06%) 77 ( 0.47%) /home/rojkov/work/nginx/src/core/ngx_hash.c:ngx_hash_add_key [/home/rojkov/work/nginx/objs/nginx]
71,462 ( 0.00%) 14,731 ( 0.00%) 13,620 ( 0.01%) 7 ( 0.00%) 0 73 ( 0.00%) 7 ( 0.16%) 0 73 ( 0.45%) /home/rojkov/work/nginx/src/http/ngx_http_core_module.c:ngx_http_core_type [/home/rojkov/work/nginx/objs/nginx]
288,195 ( 0.02%) 93,095 ( 0.02%) 37,549 ( 0.01%) 16 ( 0.01%) 1,135 ( 0.01%) 89 ( 0.00%) 7 ( 0.16%) 91 ( 1.93%) 68 ( 0.42%) /home/rojkov/work/nginx/src/core/ngx_conf_file.c:ngx_conf_parse'2 [/home/rojkov/work/nginx/objs/nginx]
1,529 ( 0.00%) 297 ( 0.00%) 298 ( 0.00%) 15 ( 0.01%) 8 ( 0.00%) 55 ( 0.00%) 15 ( 0.35%) 8 ( 0.17%) 55 ( 0.34%) /build/glibc-2ORdQG/glibc-2.27/elf/dl-object.c:_dl_new_object [/lib/x86_64-linux-gnu/ld-2.27.so]
9,369 ( 0.00%) 3,263 ( 0.00%) 866 ( 0.00%) 20 ( 0.01%) 67 ( 0.00%) 48 ( 0.00%) 20 ( 0.46%) 57 ( 1.21%) 44 ( 0.27%) /build/glibc-2ORdQG/glibc-2.27/elf/dl-version.c:_dl_check_map_versions [/lib/x86_64-linux-gnu/ld-2.27.so]
2,206 ( 0.00%) 314 ( 0.00%) 245 ( 0.00%) 10 ( 0.00%) 56 ( 0.00%) 44 ( 0.00%) 10 ( 0.23%) 56 ( 1.19%) 44 ( 0.27%) /build/glibc-2ORdQG/glibc-2.27/elf/get-dynamic-info.h:_dl_map_object_from_fd
Looks like Nginx moves data more effectively though.
Nginx uses adaptive settings for zlib if Content-Length is known: it tries to narrow the sliding window for small data blobs. We could do the same. It should improve memory consumption in some cases, but won't improve request duration.
Now I've run out of ideas. Any help is appreciated :)
So, I made one more attempt to solve this mystery. It turned out I was measuring the wrong numbers: flamegraphs and profilers show where CPU cycles are spent, not where wall time is spent. I used PerfAnnotationContext to measure exactly how long it takes to deflate() 157k of JSON in a running Envoy instance. On my machine the average is 6.6ms (out of a >14ms connection life span). The same measurement for nginx shows 6.2ms (out of a ~10ms connection life span) - still better, but not dramatically so. When the same JSON is compressed in compressor_filter_speed_test it takes less than 3ms on average. The explanation is that when running compressor_filter_speed_test the CPU "turboboosts" to 3.8GHz, whereas when deflate()'ing in a running Envoy instance the CPU core runs at 1.6GHz on average - a clue that the load is not CPU bound.
As a result, in Envoy's case writing data to a socket takes as long as calling deflate(): after writing ~85k of compressed data to the socket in one chunk, the next "write ready" event comes in ~5ms, and only then is the connection finally closed. With compression switched off and 157k written in one chunk, the "write ready" event comes in <1ms. Looks like these "ready" events are scheduled with a delay if writes are not done often enough. To make them more frequent I set per_connection_buffer_limit_bytes to 8192 for clusters and listeners. Here is the result:
data_received..............: 295 MB 9.8 MB/s
data_sent..................: 454 kB 15 kB/s
http_req_blocked...........: avg=309.09µs min=149.04µs med=310.8µs max=989.34µs p(90)=376.11µs p(95)=389.95µs
http_req_connecting........: avg=209.28µs min=92.84µs med=208.41µs max=801.8µs p(90)=263.33µs p(95)=275.31µs
http_req_duration..........: avg=8.3ms min=4.84ms med=8.19ms max=14.56ms p(90)=9.17ms p(95)=9.31ms
http_req_receiving.........: avg=7.02ms min=4ms med=6.97ms max=11.35ms p(90)=7.61ms p(95)=7.73ms
http_req_sending...........: avg=93.04µs min=41.76µs med=94.11µs max=624.63µs p(90)=111.93µs p(95)=117.71µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=1.18ms min=424.74µs med=1.17ms max=7.49ms p(90)=1.5ms p(95)=1.57ms
http_reqs..................: 3439 114.499003/s
iteration_duration.........: avg=8.71ms min=5.11ms med=8.61ms max=15.06ms p(90)=9.56ms p(95)=9.73ms
iterations.................: 3439 114.499003/s
vus........................: 1 min=1 max=1
vus_max....................: 1 min=1 max=1
Now envoy is ~25% faster than nginx.
@rgs1 What were the zlib settings in your setup? Could you check if you set them equal to nginx's?
In my setup they are
memory_level: 8
compression_level: compression_level_1
window_bits: 15
chunk_size: 4096
compression_strategy: default_strategy
Nginx doesn't document how to specify memlevel and wbits (they are calculated dynamically when Content-Length is known), but for 157k they are 8 and 15 respectively.
Wow. So awesome!
Great work!
The only change you made was to set per_connection_buffer_limit_bytes to 8192? What was it previously?
The only change you made was to set per_connection_buffer_limit_bytes to 8192? What was it previously?
Yes, only this change. Initially there was nothing in my config and the default is 1M.
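For anyone reproducing this, the change boils down to setting the limit on both the listener and the cluster. A minimal sketch assuming a typical static_resources layout (the listener/cluster names, addresses, and the omitted filter chain are illustrative, not taken from my setup):

static_resources:
  listeners:
  - name: ingress
    per_connection_buffer_limit_bytes: 8192   # default is 1MiB
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    # filter_chains with the HTTP connection manager and the compressor filter omitted
  clusters:
  - name: backend
    per_connection_buffer_limit_bytes: 8192   # default is 1MiB
    connect_timeout: 1s
    type: STRICT_DNS
    load_assignment:
      cluster_name: backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: backend, port_value: 8080 }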
@rojkov hmm, interesting. Could it be a problem that we have different limits for listeners and clusters:
clusters.json: "per_connection_buffer_limit_bytes": 16384,
listeners.json: "per_connection_buffer_limit_bytes": 8192
?
This is all using zlib-ng right?
Our settings:
"name": "envoy.filters.http.gzip",
"typed_config": {
"@type": "type.googleapis.com/envoy.config.filter.http.gzip.v2.Gzip",
"compression_level": "SPEED",
"compressor": {
"content_length": 1100,
"runtime_enabled": {
"default_value": true,
"runtime_key": "gzip.filter_enabled"
}
},
"memory_level": 9,
"window_bits": 15
}
So mostly the same, I don't expect that memory_level:9 vs memory_level:8 would make such a big change...
This is all using zlib-ng right?
No, I compiled Envoy with the same old zlib used by Nginx.
I don't think the different limits matter much, nor does memlevel 9. So I don't understand why your results are so different :(
@rojkov let me dig a bit more on our side, regardless of our case this is a great finding -- thanks! Shall we document this somewhere?
Yeah, adding some notes to compressor_filter.rst won't hurt. I'll think about how best to formulate it.
By the way, at some point I experimented with CPU frequency a bit: while benchmarking compression with k6 I artificially loaded all the cores to 100% (with a full envoy rebuild). This made Envoy serve twice as fast, and Nginx showed similar dynamics. That can be counterintuitive, yet it's explainable. Just a side note on how important it is to run experiments under the same environmental conditions.
Thanks @rojkov, this is amazing work and a really great finding around how buffer sizes affect the overall throughput.
Looks like these "ready" events are scheduled with a delay if writes are not done often enough. To make them more frequent I set per_connection_buffer_limit_bytes to 8192 for clusters and listeners.
After writing ~85k of compressed data to the socket in one chunk, the next "write ready" event comes in ~5ms, and only then is the connection finally closed. With compression switched off and 157k written in one chunk, the "write ready" event comes in <1ms. Looks like these "ready" events are scheduled with a delay if writes are not done often enough.
I'm a little confused and have a few questions that we can follow up on:
1) In the compressed case, are we basically doing a single writev with a single slice which is giant?
2) In the uncompressed case, are we doing a single writev with multiple slices? Or is the data spread over more writes because it is streamed?
3) If we know all data has been written, what does connection closure have to do with the overall latency? I would assume TTLB on the downstream client would be the same regardless of closure as long as content-length is set? Or is the issue we are chunk encoding and the last chunk is not written until 5ms later?
4) Is there a way we can split up the writes in the compressed case such that max buffer bytes doesn't come into play? I'm a little confused about how decreasing the max connection buffer bytes is helping. Is it causing back pressure so that we are reading in smaller chunks, which are then compressed in smaller blocks?
cc @antoniovicente for event and buffer expertise. cc @ggreenway as this very likely relates to the perf work you have been doing.
I used PerfAnnotationContext to measure
As an aside, this is really neat. We should document this approach, and/or would it be worthwhile to add these hooks into the code base under some macro guard so that we can more easily use these annotations for perf investigations?
In the compressed case, are we basically doing a single writev with a single slice which is giant?
In the uncompressed case, are we doing a single writev with multiple slices? Or is the data spread over more writes because it is streamed?
My understanding is that there is a single write with a single slice in both cases. This is how it looks in the logs:
[2020-10-29 17:56:19.346][1539332][trace][http] [source/common/http/conn_manager_impl.cc:1486] [C2][S34169261113250019] encoding data via codec (size=85429 end_stream=false)
[2020-10-29 17:56:19.346][1539332][trace][connection] [source/common/network/connection_impl.cc:428] [C2] writing 85438 bytes, end_stream false
[2020-10-29 17:56:19.346][1539332][trace][http] [source/common/http/http1/codec_impl.cc:1262] [C1] message complete
[2020-10-29 17:56:19.346][1539332][debug][client] [source/common/http/codec_client.cc:109] [C1] response complete
[2020-10-29 17:56:19.346][1539332][trace][http] [source/common/http/conn_manager_impl.cc:1486] [C2][S34169261113250019] encoding data via codec (size=10 end_stream=true)
[2020-10-29 17:56:19.346][1539332][trace][connection] [source/common/network/connection_impl.cc:316] [C2] readDisable: disable=false disable_count=1 state=0 buffer_length=0
[2020-10-29 17:56:19.347][1539332][trace][http] [source/common/http/http1/codec_impl.cc:599] [C1] parsed 160253 bytes
[2020-10-29 17:56:19.347][1539332][trace][connection] [source/common/network/connection_impl.cc:510] [C1] socket event: 2, ts: 19705
[2020-10-29 17:56:19.347][1539332][trace][connection] [source/common/network/connection_impl.cc:611] [C1] write ready
[2020-10-29 17:56:19.347][1539332][trace][connection] [source/common/network/connection_impl.cc:510] [C2] socket event: 2, ts: 19750
[2020-10-29 17:56:19.347][1539332][trace][connection] [source/common/network/connection_impl.cc:611] [C2] write ready
[2020-10-29 17:56:19.347][1539332][trace][connection] [source/common/network/raw_buffer_socket.cc:77] [C2] write returns: 85738 ts: 19887us <- single write
[2020-10-29 17:56:19.351][1539332][trace][connection] [source/common/network/connection_impl.cc:510] [C2] socket event: 3, ts: 24264us
[2020-10-29 17:56:19.351][1539332][trace][connection] [source/common/network/connection_impl.cc:611] [C2] write ready
[2020-10-29 17:58:31.455][1539478][trace][http] [source/common/http/conn_manager_impl.cc:1486] [C2][S13951305510064282760] encoding data via codec (size=160003 end_stream=false)
[2020-10-29 17:58:31.455][1539478][trace][connection] [source/common/network/connection_impl.cc:428] [C2] writing 160003 bytes, end_stream false
[2020-10-29 17:58:31.455][1539478][trace][http] [source/common/http/http1/codec_impl.cc:1262] [C1] message complete
[2020-10-29 17:58:31.455][1539478][debug][client] [source/common/http/codec_client.cc:109] [C1] response complete
[2020-10-29 17:58:31.455][1539478][trace][http] [source/common/http/conn_manager_impl.cc:1486] [C2][S13951305510064282760] encoding data via codec (size=0 end_stream=true)
[2020-10-29 17:58:31.455][1539478][trace][connection] [source/common/network/connection_impl.cc:316] [C2] readDisable: disable=false disable_count=1 state=0 buffer_length=0
[2020-10-29 17:58:31.455][1539478][trace][http] [source/common/http/http1/codec_impl.cc:599] [C1] parsed 160253 bytes
[2020-10-29 17:58:31.456][1539478][trace][connection] [source/common/network/connection_impl.cc:510] [C1] socket event: 2, ts: 5064
[2020-10-29 17:58:31.456][1539478][trace][connection] [source/common/network/connection_impl.cc:611] [C1] write ready
[2020-10-29 17:58:31.456][1539478][trace][connection] [source/common/network/connection_impl.cc:510] [C2] socket event: 2, ts: 5125
[2020-10-29 17:58:31.456][1539478][trace][connection] [source/common/network/connection_impl.cc:611] [C2] write ready
[2020-10-29 17:58:31.456][1539478][trace][connection] [source/common/network/raw_buffer_socket.cc:77] [C2] write returns: 160256 ts: 5322us <- single write
[2020-10-29 17:58:31.457][1539478][trace][connection] [source/common/network/connection_impl.cc:510] [C2] socket event: 3, ts: 6376us
[2020-10-29 17:58:31.457][1539478][trace][connection] [source/common/network/connection_impl.cc:611] [C2] write ready
ts is a timestamp in microseconds since the ActiveTcpConnection's creation.
C2 is the downstream connection.
If we know all data has been written, what does connection closure have to do with the overall latency? I would assume TTLB on the downstream client would be the same regardless of closure as long as content-length is set? Or is the issue we are chunk encoding and the last chunk is not written until 5ms later?
A-ha. I should have asked this question myself... So, I did closed-loop testing, that is, a new connection to the listener is not created until the current one is closed. TTLB should be shorter for a real client. I guess I need to use something other than k6 for benchmarking.
I've just tried benchmarking with 10 simultaneous virtual users connecting to a single-threaded Envoy and got
280 reqs/s (default 1M buffer) vs 371 reqs/s (8k per-connection buffer).
With 20 virtual users I got 306 reqs/s vs 364 reqs/s respectively.
With 80 VUs I got 420 reqs/s vs 311 reqs/s.
Nginx, by contrast, consistently gives ~420 reqs/s starting from 10 VUs; increasing VUs further doesn't increase throughput.
I actually do wonder if some of the fixes @ggreenway is working on may impact what we are seeing. Something still doesn't add up to me here.