Ryan pointed out the BREACH vulnerability with SSL and compression. I think it should still be on by default in ES, but off by default if SSL is enabled.
@c-a-m ok, so we should leave this open then, no?
I just realized that the BREACH vulnerability concerns a compression feature of TLS; it has nothing to do with HTTP-level compression. This is totally fine to be left on by default.
Apparently HTTP compression was disabled originally because the LZF(?) library used by Netty had memory leaks. Need to check if this is still the case.
I did some stress tests with es 2.3 and was not able to reproduce the leak. It seems that the http compression was disabled by default because "many clients are buggy when it comes to supporting it." https://github.com/elastic/elasticsearch/issues/1482
I've tested sending compressed data and receiving compressed data with elasticsearch-py on my local machine. Compression did not help performance and can also degrade some types of queries (scroll queries are 20% slower when compression is enabled). Though my test is not realistic at all: I am using a MacBook Air and all my queries are local to my machine. I guess that compression could help if the network is congested.
The bottom line is that we can re-enable HTTP compression by default, but it will not change anything unless users send the header that activates compression of the response (Accept-Encoding: gzip) in their requests.
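For reference, a minimal sketch of what that looks like from a client, assuming a local node on port 9200 and the `requests` library (not one of our official clients):

```python
import requests

# The server only compresses the response if the client advertises support
# for it via the Accept-Encoding header.
resp = requests.get(
    "http://localhost:9200/_search",
    headers={"Accept-Encoding": "gzip"},
)

# requests transparently decompresses gzip bodies; Content-Encoding tells us
# whether the server actually compressed the response on the wire.
print(resp.headers.get("Content-Encoding"))
```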
@jimferenczi thanks for testing this. Of course, testing on your local machine avoids network latency so you see the downside of compression without it really having the opportunity to shine.
> The bottom line is that we can re-enable HTTP compression by default, but it will not change anything unless users send the header that activates compression of the response (Accept-Encoding: gzip) in their requests.
Agreed. I think most (if not all) of the official clients have compression support, as long as the user enables it. If we decide to enable it by default, then the client authors can make the appropriate changes.
@jpountz what do you think of enabling it by default?
The size of responses seems to be a pretty common source of complaints, so I think we should try to enable it by default. I suspect that our responses have a lot of duplicated strings, so even low compression levels would already reduce the size of the data significantly. We could try to enable it by default, e.g. with a compression level of 3 (currently the default level is 6; 3 is the highest DEFLATE compression level that does not use lazy match evaluation, which tends to make compression slow). This way we would limit the potential bad performance impact of having compression on by default.
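To illustrate the size/level trade-off on a body with many duplicated strings, here is a quick sketch using Python's stdlib zlib; the payload is made up for illustration, not one of our actual responses:

```python
import json
import zlib

# Hypothetical payload: a JSON body with lots of repeated field names,
# similar in spirit to a typical search response.
payload = json.dumps(
    [{"_index": "geonames", "_type": "doc", "_score": 1.0, "_id": str(i)} for i in range(1000)]
).encode("utf-8")

# Compare compressed sizes at the levels discussed above.
for level in (1, 3, 6, 9):
    compressed = zlib.compress(payload, level)
    print(level, len(payload), len(compressed))
```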
I've tested the full compression scheme (send and receive compressed content) with a compression level of 6. I'll check whether the performance impact is still visible with response compression only and a compression level of 3.
I have benchmarked this scenario with Rally against a single-node cluster of a recent master build of Elasticsearch (revision 6921712), with default settings except for the heap size (which I set to -Xms4G -Xmx4G). I used a dedicated bare-metal machine for Rally and a dedicated one for the benchmark candidate. I used compression level 9 to amplify the effect of compression as much as possible and compressed all requests and responses. The data set was the same geonames benchmark that we also use in the nightly benchmarks. Preliminary results show:
Details are in the attached graphics from the Kibana dashboard which are _currently_ also available at https://b7dea5252a72b78502fc91e0462fca7e.us-east-1.aws.found.io/app/kibana#/dashboard/HTTP-Compression-Benchmark-Results (I may remove them at any time; that's why I uploaded the screenshot for reference):

I'll run a few more benchmarks but so far I can confirm Jim's testing.
Thanks @danielmitterdorfer.
> Indexing throughput and CPU utilization during indexing are roughly equivalent.
This is really a big win. Most of the traffic is generated during indexing, so IMO we should really accept compressed requests by default.
> Query latency suffers drastically, especially in the higher percentiles (90th percentile and above). Worst are the scroll query and the term query.
This is the tricky part. In my tests both the request and the response are compressed (and I guess it's the same here). IMO we should never compress a body smaller than 1k.
What do you think of adding a minimum body size to enable compression on both ends (server and client)?
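On the client side that could look something like the following sketch; the 1k threshold is just the number from this discussion, not an agreed-on value:

```python
import gzip

MIN_COMPRESS_BYTES = 1024  # hypothetical threshold discussed above


def maybe_compress(body: bytes):
    """Return (body, content_encoding), skipping compression for tiny bodies."""
    if len(body) < MIN_COMPRESS_BYTES:
        return body, None
    return gzip.compress(body), "gzip"
```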
Is the issue really with small bodies? If the body is small, then it will likely be very fast to compress as well. I was more under the assumption that scrolls and term queries take a performance hit because they are among the cheapest queries that you can send to Elasticsearch. I would be curious to see how different the results are with a compression level of 1.
@danielmitterdorfer you say:
> we should really accept compressed requests by default.
If you were using the Python client, I'm pretty sure the request was not compressed, only the response.
@clintongormley I think he was using a custom connection that enables compression on the client side. In fact, I am pretty sure he did, because the bytes received on the ES side are way lower when compression is "on".
@danielmitterdorfer you say:
> we should really accept compressed requests by default.
@clintongormley This was @jimferenczi. I did not draw any conclusions yet. ;) I'll gather more data points (different compression rates) and also investigate a few issues. Btw, Jim was right: I used a custom connection in the Python client that gzips the request.
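Roughly, that custom connection does something like the following simplified sketch (using `requests`, not the actual Rally/elasticsearch-py code):

```python
import gzip

import requests


def post_gzipped(url: str, body: bytes) -> requests.Response:
    # Gzip the request body and label it so that a server with request
    # decompression enabled can inflate it; also ask for a gzipped response.
    return requests.post(
        url,
        data=gzip.compress(body),
        headers={"Content-Encoding": "gzip", "Accept-Encoding": "gzip"},
    )
```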
I just checked the impact of Python 3.5 stdlib gzip compression for bulk requests (with a bulk size of 5000) and a sample query (result of 100 trials in a microbenchmark):
| Comment | Size (bytes) | Min compression time (ms) | Mean compression time (ms) | Max compression time (ms) |
| --- | --- | --- | --- | --- |
| Bulk request with 5000 items | 1829194 | 103.12 | 105.10 | 114.09 |
| Aggregation Query | 330 | 0.023 | 0.023 | 0.066 |
So the overhead on the client side is negligible.
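For reference, a sketch of how such a measurement can be done with the stdlib; the payload here is a placeholder, not the actual bulk request from the table:

```python
import gzip
import timeit


def bench_gzip(payload: bytes, trials: int = 100):
    # Time gzip compression of a request body over a number of trials and
    # report min/mean/max in milliseconds.
    times = timeit.repeat(lambda: gzip.compress(payload), repeat=trials, number=1)
    ms = [t * 1000 for t in times]
    return min(ms), sum(ms) / len(ms), max(ms)
```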
I ran a couple of further experiments. Again, these are preliminary results, but with larger scroll sizes `org.jboss.netty.util.internal.jzlib.ZStream.deflate(int)` completely dominates the profile, i.e. Elasticsearch spends more than half of its time compressing the result. With a compression level of 1, ZStream uses a different compression approach which, btw, does not show up that high in the profile.
The relevant source in the Netty code base indicates that the same compression approach is used for compression levels 1 to 3 (see https://github.com/netty/netty/blob/netty-3.10.5.Final/src/main/java/org/jboss/netty/util/internal/jzlib/Deflate.java#L79-L81), so I also benchmarked with a compression level of 3. The benchmarked scenario (geonames) indicates that we save only a negligible amount of network traffic compared to level 1. Query latency also increases a little bit.
I will run the benchmark against another data set to add one more data point, but I'd suggest that, in the interest of query latency, we reduce the default compression level to either 1 or 3 if we enable HTTP compression by default.
Interactive results are available at https://elasticsearch-benchmarks.elastic.co/app/kibana#/dashboard/HTTP-Compression-Benchmark-Results
Below is a full-page screenshot of the same page:

We should enable request decompression regardless of whether response compression is enabled, i.e. in https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/http/netty/NettyHttpServerTransport.java#L547 change `ESHttpContentDecompressor` to `HttpContentDecompressor`.
Also, the comment about BREACH in https://github.com/elastic/elasticsearch/issues/7309#issuecomment-56870625 appears to be incorrect (see http://breachattack.com/), so we should default compression to disabled when SSL is enabled.
I also ran a microbenchmark of Netty's ZlibEncoder and JdkZlibEncoder (referred to as jzlib and jdk below) with a smaller JSON document (a few hundred bytes) and a larger JSON document (3.6 MB) at different compression levels, to see whether we should change the encoder implementation for performance reasons. The results indicate we should not change it (especially at smaller compression levels):
Benchmark (compressionLevel) (impl) (smallDocument) Mode Cnt Score Error Units
NettyZlibBenchmark.encode 1 jzlib false thrpt 150 75.960 ± 0.051 ops/s
NettyZlibBenchmark.encode 1 jzlib true thrpt 150 195383.389 ± 690.821 ops/s
NettyZlibBenchmark.encode 1 jdk false thrpt 150 68.254 ± 0.154 ops/s
NettyZlibBenchmark.encode 1 jdk true thrpt 150 159102.287 ± 227.628 ops/s
NettyZlibBenchmark.encode 3 jzlib false thrpt 150 74.859 ± 0.057 ops/s
NettyZlibBenchmark.encode 3 jzlib true thrpt 150 187901.799 ± 612.592 ops/s
NettyZlibBenchmark.encode 3 jdk false thrpt 150 67.480 ± 0.042 ops/s
NettyZlibBenchmark.encode 3 jdk true thrpt 150 159002.153 ± 101.567 ops/s
NettyZlibBenchmark.encode 6 jzlib false thrpt 150 38.250 ± 0.023 ops/s
NettyZlibBenchmark.encode 6 jzlib true thrpt 150 84190.875 ± 303.414 ops/s
NettyZlibBenchmark.encode 6 jdk false thrpt 150 35.101 ± 0.179 ops/s
NettyZlibBenchmark.encode 6 jdk true thrpt 150 86632.628 ± 77.181 ops/s
NettyZlibBenchmark.encode 9 jzlib false thrpt 150 11.812 ± 0.017 ops/s
NettyZlibBenchmark.encode 9 jzlib true thrpt 150 54201.944 ± 89.032 ops/s
NettyZlibBenchmark.encode 9 jdk false thrpt 150 11.894 ± 0.021 ops/s
NettyZlibBenchmark.encode 9 jdk true thrpt 150 60536.066 ± 101.270 ops/s
The benchmark was run on a silent server-class machine (Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, Linux kernel 4.2.0-34). It was pinned to core 0 with `taskset -c 0 java -jar netty-zlib-0.1.0-all.jar -f 5 -wi 30 -i 30`. I verified (in a separate trial run) with JMH's perf profiler that we had no CPU migrations. All cores ran with the performance CPU governor at 3.4GHz.
We lost a lot of time with really bad Painless performance issues before fixing them by enabling TCP compression ... we were consuming all of our VMware bandwidth with complex aggregations on a 5-node cluster.
We had applied all the performance guidelines, but there is no hint about compression there:
https://www.elastic.co/guide/en/elasticsearch/reference/current/system-config.html
