Elasticsearch: RestClient does not support transparent content decompression

Created on 26 Apr 2017 · 11 Comments · Source: elastic/elasticsearch

Currently it is possible to set a default header in the RestClient that indicates that all responses should be compressed:
"Accept-Encoding": "gzip,deflate"

However, the underlying async client that the RestClient uses does not handle compressed content; it returns the compressed entity as-is. This means that users of RestClient must check the response entity and wrap it with a decompressor if they want to handle compressed responses.
Similarly, it is not possible to use compression in reindex from remote, since the response entities are not decompressed by the REST client directly.
I don't know if the RestClient should do that automatically, but we should be able to use compression in reindex, even if this means adding an explicit decompressor in reindex.
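
To illustrate the "check the response and wrap it with a decompressor" step described above, here is a minimal sketch that uses only the JDK. The class `ResponseDecompression` and its methods are hypothetical names made up for this example; they are not part of the RestClient. The sketch assumes the caller has already read the `Content-Encoding` header value and obtained the raw body stream from the response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of the Elasticsearch client.
final class ResponseDecompression {

    // Wraps the body in a decompressor chosen from the Content-Encoding value.
    // Unknown or missing encodings return the stream unchanged.
    static InputStream maybeDecompress(String contentEncoding, InputStream body) throws IOException {
        if (contentEncoding == null) {
            return body;
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(body);
            case "deflate":
                // HTTP "deflate" is zlib-wrapped data, which InflaterInputStream reads by default.
                return new InflaterInputStream(body);
            default:
                return body;
        }
    }

    // Reads a stream to the end and decodes it as UTF-8.
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Test helper that stands in for a server sending a gzip-compressed body.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("{\"took\":1}");
        InputStream body = maybeDecompress("gzip", new ByteArrayInputStream(compressed));
        System.out.println(readUtf8(body));
    }
}
```

With the low-level client, the two arguments would presumably come from `response.getHeader("Content-Encoding")` and `response.getEntity().getContent()`. That wiring is an assumption and is not shown here.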

:Core/Features/Java Low Level REST Client >enhancement

All 11 comments

The lack of transparent content decompression was one of the points brought up back when we decided to use the Apache async HTTP client. See #19301. It is true that you can set the header yourself, but usually HTTP clients set it themselves and automatically decompress content; otherwise, users who specify it manually are on their own. I am not even sure if it's technically possible to add support for this, given that the underlying client doesn't do it for us (see https://issues.apache.org/jira/browse/HTTPCLIENT-1822).

Thanks for the explanation @javanna
So this means that reindex should check for the response header and add a decompressing entity if the content is compressed. @nik9000 WDYT?

> So this means that reindex should check for the response header and adds a decompressing entity if the content is compressed.

If reindex can do it, we could also add something like that to the client, at least for users to reuse when they set the header themselves. Would be nice to benchmark the difference for reindex.

> I am not even sure if it's technically possible to add support for this given that the underlying client doesn't do it for us, see https://issues.apache.org/jira/browse/HTTPCLIENT-1822 .

I think the "obvious reasons" why the client doesn't do it for us ;) come down to the async response consumer that you use. The RestClient is async, but only for the completion of the request, so callers only see the final version of the response (not the chunked one). So we could just wrap the final input stream with a decompressor (if needed) directly in the RestClient. This would only work if you consume the final input stream of the response, though. Users that re-implement the async response consumer (to write the response into a file, for instance) would only see the compressed content, but that shouldn't be an issue.

> Would be nice to benchmark the difference for reindex.

I think the main benefit would be the reduced bandwidth between the two clusters when reindex from remote is used.

The Java REST client should also have an option to compress requests sent to Elasticsearch nodes.
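
As a rough illustration of what compressing a request would involve, here is a JDK-only sketch. `RequestCompression` is a hypothetical name and not a client API. The sketch assumes the caller also sends a `Content-Encoding: gzip` header and that the receiving node accepts gzip-compressed request bodies. The bulk-style payload is made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the Elasticsearch client.
final class RequestCompression {

    // Gzip-compresses a request body held in memory.
    static byte[] gzip(byte[] body) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(body);
        }
        return buffer.toByteArray();
    }

    // Inverse of gzip(), used here only to check the round trip.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up, repetitive bulk-style payload for illustration only.
        StringBuilder bulk = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            bulk.append("{\"index\":{}}\n{\"field\":\"value\"}\n");
        }
        byte[] raw = bulk.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes: " + raw.length + ", gzip bytes: " + compressed.length);
    }
}
```

The whole body is buffered in memory for simplicity; a streaming entity would be the natural choice for very large requests.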

As a side note:

Transparent content decompression can now be implemented using the new async request exec interceptor API. Volunteers welcome.
https://github.com/apache/httpcomponents-client/blob/5.0-beta1/httpclient5/src/examples/org/apache/hc/client5/http/examples/AsyncClientMessageTrailers.java
Oleg

Fetching large data sets sometimes takes a lot of network time. I think this is a high-priority feature that we need.

I could definitely use this. I have some larger search response payloads in the hundreds of MB. I tested on 5.6 and 6.8 clusters with the respective versions of the Java RestHighLevelClient, and neither supported compression. I was setting the "Accept-Encoding" header on my requests. I also tested with a Go client that had compression on and saw a nearly 10x improvement in response times. Is supporting compression in the RestHighLevelClient on the roadmap for a release anytime soon? Thanks!

This is basically a must-have for any decent use of scrolling documents using a Java client. @javanna, based on your comment above, it sounds like this is a regression due to the absence of decompression support in the async client? From https://issues.apache.org/jira/browse/HTTPCLIENT-1822 it sounds like one could implement a workaround using an Interceptor. Do you have an example that could benefit anyone doing scrolls in Java?

Hi guys, I also reached a point where compressed data would be handy because of big requests and responses. Unfortunately, the REST high-level client can't handle a compressed response. I made an attempt to add support for handling compressed responses from Elasticsearch within the REST high-level client; see the pull request: https://github.com/elastic/elasticsearch/pull/53533

Closed by #53533, thanks @Hakky54!
