Elasticsearch: Extend RequestOptions with query params

Created on 20 Mar 2020 · 9 Comments · Source: elastic/elasticsearch

Describe the feature:
_Support for setting URL query parameters within the RequestOptions object_

Recently I came to a point where I wanted a smaller search response. The reasons were network latency, large search responses, and fields in the response that I did not need. Discarding specific fields would make the response smaller in kilobytes.

Unfortunately this is not possible with the RestHighLevelClient, and it cannot be added as a property to a SearchRequest either. Adding the option to the SearchRequest would mean changing the specification of the SearchRequest, and I have the feeling that something like this is not really part of the search request.

Take an example that I used in Kibana:

GET twitter/_search?filter_path=-hits.hits._index
{
  "query": {
    "match_all": {}
  }
}

Let's translate this request to a basic search request with the RestHighLevelClient (without filter_path):

RestClientBuilder clientBuilder = RestClient.builder(new HttpHost("localhost", 9200));
RestHighLevelClient client = new RestHighLevelClient(clientBuilder);

SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
    .query(new MatchAllQueryBuilder());
searchRequest.source(sourceBuilder)
    .indices("twitter");

SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

Now let's translate it to a request executed with the low-level client, including filter_path, while preserving the ability to send a SearchRequest and get back a SearchResponse:

RestClientBuilder clientBuilder = RestClient.builder(new HttpHost("localhost", 9200));
RestHighLevelClient client = new RestHighLevelClient(clientBuilder);

SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
    .query(new MatchAllQueryBuilder());
searchRequest.source(sourceBuilder)
    .indices("twitter");

Request request = new Request(HttpPost.METHOD_NAME, "/" + String.join(",", searchRequest.indices()) + "/_search");
request.setOptions(RequestOptions.DEFAULT);
request.addParameter("filter_path","-hits.hits._index");

HttpEntity httpEntity = new NStringEntity(searchRequest.source().toString(), ContentType.APPLICATION_JSON);
request.setEntity(httpEntity);

Response response = client.getLowLevelClient().performRequest(request);

List<NamedXContentRegistry.Entry> entries = new ArrayList<>();
entries.add(new NamedXContentRegistry.Entry(Aggregation.class, new ParseField(StringTerms.NAME), (parser, content) -> ParsedStringTerms.fromXContent(parser, (String) content)));
entries.add(new NamedXContentRegistry.Entry(Aggregation.class, new ParseField(TopHitsAggregationBuilder.NAME), (parser, content) -> ParsedTopHits.fromXContent(parser, (String) content)));
entries.add(new NamedXContentRegistry.Entry(Suggest.Suggestion.class, new ParseField("term"), (parser, content) -> TermSuggestion.fromXContent(parser, (String) content)));
entries.add(new NamedXContentRegistry.Entry(Suggest.Suggestion.class, new ParseField("phrase"), (parser, content) -> PhraseSuggestion.fromXContent(parser, (String) content)));

NamedXContentRegistry namedXContentRegistry = new NamedXContentRegistry(entries);

String content = EntityUtils.toString(response.getEntity());
XContentParser parser = JsonXContent.jsonXContent.createParser(namedXContentRegistry, null, content);
SearchResponse searchResponse = SearchResponse.fromXContent(parser);

This returns all documents from the twitter index without the index name within the hits.
As you can see, a lot of code is needed just to use filter_path while keeping SearchRequest and SearchResponse at the high level of our code base.

It would be great if RequestOptions also supported query parameters. Right now RequestOptions only supports headers. Request options with query parameters could look like this:

RequestOptions.DEFAULT.toBuilder()
    .addParameter("filter_path","-hits.hits._index");
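
To make the proposal more concrete, here is a minimal, self-contained sketch of an options object that carries query parameters and appends them to an endpoint. The class `SketchRequestOptions` and its methods `applyTo` and `build` are hypothetical names used for illustration only; this is not the real RequestOptions class and says nothing about how the client would implement the feature.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the proposed feature. This is NOT the real
// org.elasticsearch.client.RequestOptions class.
final class SketchRequestOptions {
    private final Map<String, String> parameters;

    private SketchRequestOptions(Map<String, String> parameters) {
        this.parameters = new LinkedHashMap<>(parameters);
    }

    static Builder builder() {
        return new Builder();
    }

    // Appends the stored query parameters to an endpoint that has no query string yet.
    String applyTo(String endpoint) {
        if (parameters.isEmpty()) {
            return endpoint;
        }
        StringJoiner query = new StringJoiner("&", endpoint + "?", "");
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            query.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
        }
        return query.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    static final class Builder {
        private final Map<String, String> parameters = new LinkedHashMap<>();

        // Mirrors the addParameter call suggested in the issue text.
        Builder addParameter(String name, String value) {
            parameters.put(name, value);
            return this;
        }

        SketchRequestOptions build() {
            return new SketchRequestOptions(parameters);
        }
    }

    public static void main(String[] args) {
        SketchRequestOptions options = SketchRequestOptions.builder()
            .addParameter("filter_path", "-hits.hits._index")
            .build();
        System.out.println(options.applyTo("/twitter/_search"));
    }
}
```

The sketch assumes the endpoint does not already contain a query string; a real implementation would also have to decide how option-level parameters interact with parameters set on the request itself.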

All 9 comments

We have discussed this in the past, and the conclusion at the time was that we can't support filter_path in the high level REST client, as it means supporting response bodies where arbitrary sections are filtered out. That would change most of the assumptions we make in our parsing code, and so would break most of it. I would recommend using the low-level REST client if you want smaller responses obtained through filter_path.

Pinging @elastic/es-core-features (:Core/Features/Java High Level REST Client)

Based on a discussion on SOF, we do recommend using filter_path in the Bulk REST API documentation. That's the only way to reduce network traffic for the Bulk API, whose responses can consume a lot of bandwidth.

To return only information about failed operations, use the filter_path query parameter with an argument of items.*.error.

If we can't support filter_path in the HLClient, could we think of adding an option in the Java Client Bulk API to actually ignore the successful items? I can open another issue for that.

@javanna @dadoonet @rjernst

Hi guys,

I was wondering what the status of this issue is. In my use case we applied a different kind of solution, which also reduces the response size and therefore the response latency. We used SMILE encoding instead of JSON and also applied HTTP compression. This solution works well for us, so filter_path is not needed anymore. It is a nice-to-have feature but no longer a must-have. Should we close this issue or keep it open?

ping @swallez ;)

Thanks for the ping @javanna :-)

@Hakky54
Explicit support for filter_path is indeed not something we want to do, as the consequence is that every response field would become optional, which would impact both data structures (e.g. all int fields have to become boxed Integers) and higher level logic built on top of these structures that could then fail with NullPointerException.
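
The boxing concern above can be illustrated with a small, self-contained sketch. `ShardInfo`, `parse`, and `hasShards` are hypothetical names and not Elasticsearch classes; the point is only that once a field may be filtered out, it has to be modelled as a nullable `Integer`, and logic written for the always-present case fails on unboxing.

```java
import java.util.HashMap;
import java.util.Map;

final class FilteredFieldSketch {
    // Hypothetical response holder, not an Elasticsearch class.
    static final class ShardInfo {
        // Boxed so that "field was filtered out" can be represented as null.
        final Integer total;

        ShardInfo(Integer total) {
            this.total = total;
        }
    }

    // Reads the field from an already-parsed response body; null if it is absent.
    static ShardInfo parse(Map<String, Object> body) {
        return new ShardInfo((Integer) body.get("total"));
    }

    // Higher-level logic written under the assumption that 'total' is always present.
    static boolean hasShards(ShardInfo info) {
        // Auto-unboxing: throws NullPointerException when total is null.
        return info.total > 0;
    }

    public static void main(String[] args) {
        Map<String, Object> full = new HashMap<>();
        full.put("total", 5);
        System.out.println("full response: " + hasShards(parse(full)));

        // Simulates a body where the field was removed by a response filter.
        Map<String, Object> filtered = new HashMap<>();
        try {
            hasShards(parse(filtered));
        } catch (NullPointerException e) {
            System.out.println("filtered response: NullPointerException on unboxing");
        }
    }
}
```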

That being said, adding request params in RequestOptions sounds reasonable as it's meant for more low-level advanced use cases, i.e. "danger zone, enter at your own risk".

BTW your feedback on the use of the SMILE encoding is interesting. Have you measured the gains brought by this encoding before enabling compression?

@dadoonet

If we can't support filter_path in the HLClient, could we think of adding an option in the Java Client Bulk API to actually ignore the successful items? I can open another issue for that.

Please do. Note though that ignoring successful items with filter_path=items.*.error would strip other fields of error items that may be of interest, but we can't do better unless the bulk API has something like an errors_only query parameter.
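
As a rough illustration of the trade-off described above, the following sketch imitates the effect of filter_path=items.*.error on bulk items that have already been parsed into maps. It is a simplified imitation written for this thread, not Elasticsearch's filtering code; the class and method names are hypothetical, and dropping items without an error is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BulkErrorFilterSketch {
    // Keeps only the "error" field under each action (index, create, ...).
    // Items without an error are omitted in this sketch.
    static List<Map<String, Map<String, Object>>> keepOnlyErrors(
            List<Map<String, Map<String, Object>>> items) {
        List<Map<String, Map<String, Object>>> result = new ArrayList<>();
        for (Map<String, Map<String, Object>> item : items) {
            Map<String, Map<String, Object>> filtered = new LinkedHashMap<>();
            for (Map.Entry<String, Map<String, Object>> action : item.entrySet()) {
                Object error = action.getValue().get("error");
                if (error != null) {
                    Map<String, Object> onlyError = new LinkedHashMap<>();
                    onlyError.put("error", error);
                    filtered.put(action.getKey(), onlyError);
                }
            }
            if (!filtered.isEmpty()) {
                result.add(filtered);
            }
        }
        return result;
    }

    // Builds a made-up bulk item; errorReason == null means the operation succeeded.
    static Map<String, Map<String, Object>> item(String action, String id, int status, String errorReason) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("_id", id);
        fields.put("status", status);
        if (errorReason != null) {
            fields.put("error", errorReason);
        }
        Map<String, Map<String, Object>> item = new LinkedHashMap<>();
        item.put(action, fields);
        return item;
    }

    public static void main(String[] args) {
        List<Map<String, Map<String, Object>>> items = new ArrayList<>();
        items.add(item("index", "1", 201, null));
        items.add(item("index", "2", 400, "mapper_parsing_exception"));
        // The surviving item no longer carries _id or status.
        System.out.println(keepOnlyErrors(items));
    }
}
```

In this imitation the failed item keeps its error but loses `_id` and `status`, which is the kind of information loss the comment warns about.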

BTW your feedback on the use of the SMILE encoding is interesting. Have you measured the gains brought by this encoding before enabling compression?

@swallez Yes, we have measured it. For our test we used a dataset of 100.000 documents and a max result window of 10.000. The query also included highlights. See below the test results with the different configurations:
[Screenshot 2020-11-18 at 16 49 13: test results, image not available]

@Hakky54 very interesting, thanks! So basically after gzipping Smile loses some of its size benefits, but still uses half of the memory compared to JSON.

@swallez Yes, that's correct! gzip versus (smile + gzip) saves only 30 kb in our case, but it reduced the memory usage by half.

The downside of getting a big response in JSON format is that the size is also big: as you can see, 15MB. We initially wanted to make it smaller by removing fields which we didn't need, such as hits.hits._index, and that shaved off 5MB, resulting in a response of 10MB. Compressing this with gzip was the next step and that saved a lot, but the memory usage was still high.

We discovered that Elasticsearch also supports SMILE encoding, and after some research we wanted to try it out. During our tests we noticed that SMILE encoding alone already reduced the response size by half compared to JSON, and the memory usage was smaller as well. Combining this with gzip compression looked like the ideal setup without breaking anything. Using filter_path is dangerous indeed, and we wanted to avoid it as much as possible, so we decided to go with Smile + gzip.

By the way, we used compression level 3. While testing compression levels from 1 to 10, we also discovered that level 3 was on average the best.
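
For readers who want to run a similar comparison, here is a minimal sketch that compresses a made-up JSON-like payload at several levels and prints the resulting sizes. It uses `java.util.zip.Deflater`, which implements the DEFLATE algorithm that gzip is based on and accepts levels 1 to 9; the payload, class name, and method names are assumptions for illustration and do not reproduce the setup or numbers from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CompressionLevelSketch {
    // Compresses the input with the given DEFLATE level (1 = fastest, 9 = smallest).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes; used to check that nothing was lost.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int read = inflater.inflate(buffer);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                out.write(buffer, 0, read);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Builds a repetitive, made-up search-response-like payload.
    static byte[] samplePayload(int hits) {
        StringBuilder json = new StringBuilder("{\"hits\":{\"hits\":[");
        for (int i = 0; i < hits; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"_index\":\"twitter\",\"_id\":\"").append(i)
                .append("\",\"_source\":{\"message\":\"hello world ").append(i).append("\"}}");
        }
        json.append("]}}");
        return json.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = samplePayload(10_000);
        System.out.println("uncompressed bytes: " + payload.length);
        for (int level = 1; level <= 9; level++) {
            long start = System.nanoTime();
            byte[] compressed = compress(payload, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("level " + level + ": " + compressed.length + " bytes, " + micros + " us");
        }
    }
}
```

Which level comes out best depends on the payload and on how CPU time is weighed against size, so the result for a different dataset may differ from the level 3 reported above.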
