Elasticsearch: Using size parameter - Terms Aggregations on 5.x

Created on 13 Dec 2016  ·  2 Comments  ·  Source: elastic/elasticsearch

On 2.x versions I used to set size: 0 on the terms aggregation to get what the docs describe:

If set to 0, the size will be set to Integer.MAX_VALUE.

And the terms aggregation gave me as many result buckets as I had terms.

But on 5.x I get the error "size must be positive, got 0". If I omit the value, I only get 10 results back. The docs say:

By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client.
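As a rough illustration of the reduce step that quote describes, here is a minimal Python sketch. It is a toy simulation, not Elasticsearch code: the function names and the sample data are made up, and each shard is assumed to return exactly its top size terms.

```python
from collections import Counter

def shard_top_terms(docs, size):
    """Return this shard's top `size` (term, count) pairs."""
    return Counter(docs).most_common(size)

def coordinate(shards, size):
    """Merge the per-shard top lists and keep the overall top `size` buckets."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, size):
            merged[term] += count
    return merged.most_common(size)

# Hypothetical term values held by two shards.
shards = [
    ["a", "a", "a", "b", "b", "c"],
    ["a", "a", "c", "c", "c", "d"],
]

# With a small size, some terms are cut off at the shard level.
top2 = coordinate(shards, size=2)

# With a size larger than the number of distinct terms, every bucket comes back.
all_buckets = coordinate(shards, size=100)

print(top2)
print(all_buckets)
```

In this toy data the term "c" occurs four times overall, but with size=2 the first shard never reports it, so the merged count for "c" is lower than the true one. A size at least as large as the number of distinct terms avoids that cut-off.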

So it's not clear how to get the old behaviour back and receive all buckets instead of only 10.

How do I do that? Thanks.

All 2 comments

You can check here why we decided to remove the size:0 option:
https://github.com/elastic/elasticsearch/issues/18838

Having size: 0 as an option makes it look like there is a shortcut here, as if we could handle the "give me all the buckets" case in a different, very efficient way, which we can't. Internally we rewrite size: 0 to Integer.MAX_VALUE.

So you can just specify a size that is bigger than the cardinality of your field. If the cardinality is large, you should consider other options, since returning millions of terms is going to cause problems in your cluster.
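One way to act on this advice, sketched in Python under some assumptions: first estimate the number of distinct terms with a cardinality aggregation, then request a terms aggregation whose size is a bit larger. The helper names, the aggregation names and the 10% headroom are illustrative choices, not anything prescribed in this thread. The code only builds the request bodies and does not talk to a cluster.

```python
import json

def cardinality_body(field):
    """Request body that asks for an estimate of the distinct terms in `field`."""
    return {
        "size": 0,  # top-level size: return no hits, only aggregations
        "aggs": {
            "distinct_terms": {"cardinality": {"field": field}}
        },
    }

def all_terms_body(field, estimated_cardinality):
    """Request body for a terms aggregation sized above the estimate.

    The cardinality aggregation is approximate, so some headroom
    (here 10% plus one, an arbitrary choice) is added on top.
    """
    size = estimated_cardinality + estimated_cardinality // 10 + 1
    return {
        "size": 0,  # top-level size for hits, unrelated to the bucket count
        "aggs": {
            "all_terms": {"terms": {"field": field, "size": size}}
        },
    }

# "my_field" and the estimate of 1000 are placeholder values.
step1 = cardinality_body("my_field")
step2 = all_terms_body("my_field", 1000)
print(json.dumps(step2, indent=2))
```

The size: 0 at the top level of these bodies only suppresses search hits. It is a different setting from the size inside the terms aggregation, which is the one that must now be positive.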

Also, we reserve GitHub for bugs and feature requests, so the best way to get an answer to questions like this is to use the discuss forum:
https://discuss.elastic.co/c/elasticsearch

Using size parameter for Query and Terms Aggregations on 5.6.x

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "mid": [
              185422,
              13446728
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "group_mids": {
      "terms": {
        "field": "mid",
        "size": 5,
        "shard_size": 10, 
        "order": {
          "max_hot": "desc"
        }
      },
      "aggs": {
        "max_hot": {
          "max": {
            "field": "hotvalue"
          }
        }
      }
    }
  },
  "_source": {
    "includes": [
      "mid",
      "cid",
      "hotvalue"
    ]
  },
  "sort": [
    {
      "lastreptime": "desc"
    }
  ],
  "from": 0,
  "size": 10
}

But the buckets I get back do not match size = 5. Why?
