Elasticsearch: Reported hit counts are inconsistent between _search and _search/template

Created on 26 Feb 2020 · 7 Comments · Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version): 7.5.0

Plugins installed: []

JVM version (java -version): Elastic Cloud

OS version (uname -a if on a Unix-like system): Elastic Cloud

Description of the problem including expected versus actual behavior:

Since ES 7, one must use rest_total_hits_as_int=true to revert to the old behavior of getting an exact number of total hits in the search response. I feel there is a discrepancy in how the _search and _search/template endpoints behave regarding the reported number of hits.

In my tests below, I'm querying an index with more than 10000 documents with the exact same JSON query (as a normal query and as a template query depending on which endpoint I'm targeting).

{
  "query": {
    "match_all": {}
  }
}

A. When using the _search endpoint, I get this:

"total" : {
  "value" : 10000,
  "relation" : "gte"
},

B. When using the _search?rest_total_hits_as_int=true endpoint, I get this:

"total" : 173175,

C. When using the _search/template endpoint, I get this:

"total" : {
  "value" : 10000,
  "relation" : "gte"
},

So far, so good, everything is consistent.

D. But when I hit the _search/template?rest_total_hits_as_int=true endpoint, I get this:

"total" : 10000,

The only way I found to get the exact total with the _search/template endpoint is by adding the "track_total_hits": true parameter to the template query.

E. When doing so, I get this when hitting the _search/template endpoint:

"total" : {
  "value" : 173175,
  "relation" : "eq"
},

F. And this when hitting the _search/template?rest_total_hits_as_int=true endpoint:

"total" : 173175,
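The workaround used for E and F can be sketched as a small request-body builder. This is only an illustration: the helper name `template_search_body` is hypothetical, and it assumes the inline form of the search template API, where the template goes under `source` and its variables under `params`.

```python
import json
from typing import Any, Dict, Optional


def template_search_body(query: Dict[str, Any],
                         params: Optional[Dict[str, Any]] = None,
                         track_total_hits: bool = True) -> Dict[str, Any]:
    """Build a _search/template request body with an inline template.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    source: Dict[str, Any] = {"query": query}
    if track_total_hits:
        # Workaround from cases E and F: request exact counting
        # inside the template source itself.
        source["track_total_hits"] = True
    return {"source": source, "params": params or {}}


if __name__ == "__main__":
    # Prints the JSON body to send to <index>/_search/template.
    print(json.dumps(template_search_body({"match_all": {}}), indent=2))
```

The resulting body can be sent to either _search/template or _search/template?rest_total_hits_as_int=true; per the results above, both then report the exact total.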

There are two take-aways here:

  1. Since A and C are consistent, I feel that B and D should also be consistent.
  2. I also think that B is wrong and should require "track_total_hits": true in the query in order to spit out the exact number of hits (like in cases E and F).
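Cases A through F return hits.total in two different shapes, which is part of why D is hard to spot from the client side. Here is a minimal sketch of a client-side reader that handles both shapes; the function names are hypothetical, and it assumes only the two formats shown above (an object with `value` and `relation`, or a bare integer).

```python
from typing import Any, Optional, Tuple


def read_total(total: Any) -> Tuple[int, Optional[str]]:
    """Return (value, relation) for either hits.total format.

    relation is None for the integer format, which carries no
    "eq"/"gte" marker at all.
    """
    if isinstance(total, dict):
        return total["value"], total["relation"]
    return int(total), None


def is_known_exact(total: Any) -> bool:
    """True only when the response itself says the count is exact."""
    _, relation = read_total(total)
    return relation == "eq"


if __name__ == "__main__":
    print(read_total({"value": 10000, "relation": "gte"}))  # cases A and C
    print(read_total(10000))                                # case D
```

With the integer format there is nothing in the response to distinguish the lower bound in D from an exact count, so a client has to trust that the server counted accurately.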

Steps to reproduce:

It's easy to reproduce this on any index that has more than 10K documents by creating a simple match_all template query.

:Search/Search >bug

All 7 comments

Pinging @elastic/es-search (:Search/Search)

From the source code I found that when rest_total_hits_as_int is set to true in the _search API (as in B), trackTotalHitsUpTo is set to Integer.MAX_VALUE, so we always get the accurate hit count. But in the _search/template API (as in D), the value of trackTotalHitsUpTo is lost, so we get 10000. So I think the result of D is incorrect.
https://github.com/elastic/elasticsearch/blob/1e0ba70fa7776c094e75d5ac99afee201fa1840c/server/src/main/java/org/elasticsearch/rest/action/search/RestSearchAction.java#L303

It's lost because the templated search parses the _source late in the action. We should check if trackTotalHits is set before parsing and throw an error if the template search tries to lower it (set to false or to a number). Since you already started to look, @gaobinlong, would you be interested in providing a pull request?
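The behavior described in the two comments above can be illustrated with a toy model. This is not the Elasticsearch code: every function name is hypothetical, `ACCURATE` merely stands in for Integer.MAX_VALUE, `DEFAULT_LIMIT` is the 10000 threshold observed in cases A and C, and `DISABLED` is an arbitrary sentinel chosen for this sketch.

```python
from typing import Union

ACCURATE = 2**31 - 1    # stands in for Java's Integer.MAX_VALUE
DEFAULT_LIMIT = 10_000  # default threshold observed in cases A and C
DISABLED = -1           # arbitrary sentinel, for this model only

TrackSetting = Union[bool, int, None]


def normalize(setting: TrackSetting) -> int:
    """Map a track_total_hits setting to a counting limit."""
    if setting is None:
        return DEFAULT_LIMIT
    if setting is True:
        return ACCURATE
    if setting is False:
        return DISABLED
    return setting


def lossy_template_limit(rest_total_hits_as_int: bool,
                         rendered: TrackSetting = None) -> int:
    """Model of case D: the rendered template source wins and the
    REST flag has no influence on the limit."""
    return normalize(rendered)


def checked_template_limit(rest_total_hits_as_int: bool,
                           rendered: TrackSetting = None) -> int:
    """Model of the proposed check: with the REST flag set, default
    to accurate counting and reject a template that lowers it."""
    if not rest_total_hits_as_int:
        return normalize(rendered)
    limit = normalize(True if rendered is None else rendered)
    if limit != ACCURATE:
        raise ValueError("template lowers track_total_hits while "
                         "rest_total_hits_as_int is set")
    return ACCURATE
```

In this model, `lossy_template_limit(True)` yields 10000 as in case D, while `checked_template_limit(True)` yields the accurate limit and raises for a template that sets track_total_hits to false or to a number.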

Thanks @gaobinlong and @jimczi for looking into this.
I'm also interested to know which of A-F is supposed to be the correct intended behavior.

> I'm also interested to know which of A-F is supposed to be the correct intended behavior.

Yes, sorry. The expectation when setting rest_total_hits_as_int is that the total number of hits is tracked accurately, since the REST response will return hits.total as a numeric value (as opposed to an object in the new format). So D is a bug: the default for track_total_hits when rest_total_hits_as_int is set should be to track the number of hits accurately. E and F are a correct workaround, but it shouldn't be needed if we fix D.

Thanks @jimczi, so when specifying rest_total_hits_as_int=true one wouldn't have to also specify track_total_hits: true. That makes sense.

@jimczi OK, I'm glad to do that. @consulthys, only D is incorrect: when rest_total_hits_as_int is set to true, the total hit count should be accurate.
