Ryan pointed out the BREACH vulnerability with SSL and compression. I think it should still be on by default in ES, but off by default if SSL is enabled.
@c-a-m ok, so we should leave this open then, no?
I just realized that the BREACH vulnerability concerns a compression feature of TLS; it has nothing to do with HTTP-level compression. This is totally fine to be left on by default.
Apparently HTTP compression was disabled originally because the LZF(?) library used by Netty had memory leaks. Need to check if this is still the case.
I did some stress tests with es 2.3 and was not able to reproduce the leak. It seems that the http compression was disabled by default because "many clients are buggy when it comes to supporting it." https://github.com/elastic/elasticsearch/issues/1482
I've tested sending compressed data and receiving compressed data with elasticsearch-py on my local machine. Compression did not help performance and can also degrade some types of queries (scroll queries are 20% slower when compression is enabled). Though my test is not realistic at all: I am using a MacBook Air and all my queries are local to my machine. I guess that compression could help if the network is congested.
The bottom line is that we can re-enable HTTP compression by default, but it will not change anything unless users send the header that activates compression of the response (Accept-Encoding: gzip) in their requests.
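For reference, a minimal sketch of what that looks like from a client, assuming a local node on port 9200 and the `requests` library (not one of our official clients):

```python
import requests

# The server only compresses the response if the client advertises support
# for it via the Accept-Encoding header.
resp = requests.get(
    "http://localhost:9200/_search",
    headers={"Accept-Encoding": "gzip"},
)

# requests transparently decompresses gzip bodies; Content-Encoding tells us
# whether the server actually compressed the response on the wire.
print(resp.headers.get("Content-Encoding"))
```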
@jimferenczi thanks for testing this. Of course, testing on your local machine avoids network latency so you see the downside of compression without it really having the opportunity to shine.
> The bottom line is that we can re-enable HTTP compression by default, but it will not change anything unless users send the header that activates compression of the response (Accept-Encoding: gzip) in their requests.
Agreed. I think most (if not all) of the official clients have compression support, as long as the user enables it. If we decide to enable it by default, then the client authors can make the appropriate changes.
@jpountz what do you think of enabling it by default?
The size of responses seems to be a pretty common source of complaints, so I think we should try to enable it by default. I suspect that our responses have a lot of duplicated strings, so even low compression levels would already reduce the size of the data significantly. We could try to enable it by default, e.g. with a compression level of 3 (currently the default level is 6; 3 is the highest DEFLATE compression level that does not use lazy match evaluation, which tends to make compression slow). This way we would limit the potential bad performance impact of having compression on by default.
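To illustrate the size/level trade-off on a body with many duplicated strings, here is a quick sketch using Python's stdlib zlib; the payload is made up for illustration, not one of our actual responses:

```python
import json
import zlib

# Hypothetical payload: a JSON body with lots of repeated field names,
# similar in spirit to a typical search response.
payload = json.dumps(
    [{"_index": "geonames", "_type": "doc", "_score": 1.0, "_id": str(i)} for i in range(1000)]
).encode("utf-8")

# Compare compressed sizes at the levels discussed above.
for level in (1, 3, 6, 9):
    compressed = zlib.compress(payload, level)
    print(level, len(payload), len(compressed))
```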
I've tested the full compression scheme (send and receive compressed content) with a compression level of 6. I'll check whether the performance impact is still visible with response compression only and a compression level of 3.
I have benchmarked this scenario with Rally against a single-node cluster of a recent master build of Elasticsearch (revision 6921712), with default settings except for the heap size (which I set to -Xms4G -Xmx4G). I used a dedicated bare-metal machine for Rally and a dedicated one for the benchmark candidate. I used compression level 9 to amplify the effect of compression as much as possible and compressed all requests and responses. The data set was the same geonames benchmark that we also use in the nightly benchmarks. Preliminary results show:
Details are in the attached graphics from the Kibana dashboard which are _currently_ also available at https://b7dea5252a72b78502fc91e0462fca7e.us-east-1.aws.found.io/app/kibana#/dashboard/HTTP-Compression-Benchmark-Results (I may remove them at any time; that's why I uploaded the screenshot for reference):

I'll run a few more benchmarks but so far I can confirm Jim's testing.
Thanks @danielmitterdorfer.
> Indexing throughput and CPU utilization during indexing are roughly equivalent.
This is really a big win. Most of the traffic is generated during indexing, so IMO we should really accept compressed requests by default.
> Query latency suffers drastically, especially in the higher percentiles (90th percentile and above). Worst are the scroll query and the term query.
This is the tricky part. In my tests both the request and the response are compressed (and I guess it's the same here). IMO we should never compress a body smaller than 1k.
What do you think of adding a minimum body size to enable compression on both ends (server and client)?
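On the client side that could look something like the following sketch; the 1k threshold is just the number from this discussion, not an agreed-on value:

```python
import gzip

MIN_COMPRESS_BYTES = 1024  # hypothetical threshold discussed above


def maybe_compress(body: bytes):
    """Return (body, content_encoding), skipping compression for tiny bodies."""
    if len(body) < MIN_COMPRESS_BYTES:
        return body, None
    return gzip.compress(body), "gzip"
```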
Is the issue really with small bodies? If the body is small, then it will likely be very fast to compress as well. I was more under the assumption that scrolls and term queries take a performance hit because they are among the cheapest queries that you can send to Elasticsearch. I would be curious to see how different the results are with a compression level of 1.
@danielmitterdorfer you say:
> we should really accept compressed requests by default.
If you were using the Python client, I'm pretty sure the request was not compressed, only the response.
@clintongormley I think he was using a custom connection that enables compression on the client side. In fact, I am pretty sure he did, because the bytes received on the ES side are way lower when compression is "on".
@danielmitterdorfer you say:
> we should really accept compressed requests by default.
@clintongormley This was @jimferenczi. I did not draw any conclusions yet. ;) I'll gather more data points (different compression rates) and also investigate a few issues. Btw, Jim was right: I used a custom connection in the Python client that gzips the request.
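Roughly, that custom connection does something like the following simplified sketch (using `requests`, not the actual Rally/elasticsearch-py code):

```python
import gzip

import requests


def post_gzipped(url: str, body: bytes) -> requests.Response:
    # Gzip the request body and label it so that a server with request
    # decompression enabled can inflate it; also ask for a gzipped response.
    return requests.post(
        url,
        data=gzip.compress(body),
        headers={"Content-Encoding": "gzip", "Accept-Encoding": "gzip"},
    )
```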
I just checked the impact of Python 3.5 stdlib gzip compression for bulk requests (with a bulk size of 5000) and a sample query (result of 100 trials in a microbenchmark):
| Comment | Size (bytes) | Min compression time (ms) | Mean compression time (ms) | Max compression time (ms) |
| --- | --- | --- | --- | --- |
| Bulk request with 5000 items | 1829194 | 103.12 | 105.10 | 114.09 |
| Aggregation Query | 330 | 0.023 | 0.023 | 0.066 |
So the overhead on the client side is negligible.
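For reference, a sketch of how such a measurement can be done with the stdlib; the payload here is a placeholder, not the actual bulk request from the table:

```python
import gzip
import timeit


def bench_gzip(payload: bytes, trials: int = 100):
    # Time gzip compression of a request body over a number of trials and
    # report min/mean/max in milliseconds.
    times = timeit.repeat(lambda: gzip.compress(payload), repeat=trials, number=1)
    ms = [t * 1000 for t in times]
    return min(ms), sum(ms) / len(ms), max(ms)
```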
I ran a couple of further experiments. Again, these are preliminary results, but with larger scroll sizes `org.jboss.netty.util.internal.jzlib.ZStream.deflate(int)` completely dominates the profile, i.e. Elasticsearch spends more than half of its time compressing the result. With a compression level of 1, ZStream uses a different compression approach which, btw, does not show up that high in the profile.
The relevant source in the Netty code base indicates that the same compression approach is used for compression levels 1 to 3 (see https://github.com/netty/netty/blob/netty-3.10.5.Final/src/main/java/org/jboss/netty/util/internal/jzlib/Deflate.java#L79-L81), so I also benchmarked with a compression level of 3. The benchmarked scenario (geonames) indicates that we save only a negligible amount of network traffic compared to level 1. Query latency also increases a little bit.
I will run the benchmark against another data set to add one more data point, but I'd suggest that, in the interest of query latency, we reduce the default compression level to either 1 or 3 if we enable HTTP compression by default.
Interactive results are available at https://elasticsearch-benchmarks.elastic.co/app/kibana#/dashboard/HTTP-Compression-Benchmark-Results
Below is a full-page screenshot of the same page:

We should enable request decompression regardless of whether response compression is enabled, i.e. in https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/http/netty/NettyHttpServerTransport.java#L547 change `ESHttpContentDecompressor` to `HttpContentDecompressor`.
Also, the comment about BREACH in https://github.com/elastic/elasticsearch/issues/7309#issuecomment-56870625 appears to be incorrect (see http://breachattack.com/), so we should default compression to disabled when SSL is enabled.
I also ran a microbenchmark of Netty's ZlibEncoder and JdkZlibEncoder (referred to as jzlib and jdk below) with a smaller JSON document (a few hundred bytes) and a larger JSON document (3.6 MB) at different compression levels, to see whether we should change the encoder implementation for performance reasons. The results indicate we should not change it (especially at smaller compression levels):
Benchmark (compressionLevel) (impl) (smallDocument) Mode Cnt Score Error Units
NettyZlibBenchmark.encode 1 jzlib false thrpt 150 75.960 ± 0.051 ops/s
NettyZlibBenchmark.encode 1 jzlib true thrpt 150 195383.389 ± 690.821 ops/s
NettyZlibBenchmark.encode 1 jdk false thrpt 150 68.254 ± 0.154 ops/s
NettyZlibBenchmark.encode 1 jdk true thrpt 150 159102.287 ± 227.628 ops/s
NettyZlibBenchmark.encode 3 jzlib false thrpt 150 74.859 ± 0.057 ops/s
NettyZlibBenchmark.encode 3 jzlib true thrpt 150 187901.799 ± 612.592 ops/s
NettyZlibBenchmark.encode 3 jdk false thrpt 150 67.480 ± 0.042 ops/s
NettyZlibBenchmark.encode 3 jdk true thrpt 150 159002.153 ± 101.567 ops/s
NettyZlibBenchmark.encode 6 jzlib false thrpt 150 38.250 ± 0.023 ops/s
NettyZlibBenchmark.encode 6 jzlib true thrpt 150 84190.875 ± 303.414 ops/s
NettyZlibBenchmark.encode 6 jdk false thrpt 150 35.101 ± 0.179 ops/s
NettyZlibBenchmark.encode 6 jdk true thrpt 150 86632.628 ± 77.181 ops/s
NettyZlibBenchmark.encode 9 jzlib false thrpt 150 11.812 ± 0.017 ops/s
NettyZlibBenchmark.encode 9 jzlib true thrpt 150 54201.944 ± 89.032 ops/s
NettyZlibBenchmark.encode 9 jdk false thrpt 150 11.894 ± 0.021 ops/s
NettyZlibBenchmark.encode 9 jdk true thrpt 150 60536.066 ± 101.270 ops/s
The benchmark was run on a silent server-class machine (Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, Linux kernel 4.2.0-34). It was pinned to core 0 with `taskset -c 0 java -jar netty-zlib-0.1.0-all.jar -f 5 -wi 30 -i 30`. I verified (in a separate trial run) with JMH's perf profiler that we had no CPU migrations. All cores ran with the performance CPU governor at 3.4GHz.
We lost a lot of time with really bad Painless performance issues before fixing them by enabling TCP compression ... we were consuming all of our VMware bandwidth with complex aggregations on a 5-node cluster.
We had applied all the performance guidelines, but there is no hint about compression there:
https://www.elastic.co/guide/en/elasticsearch/reference/current/system-config.html
