Elasticsearch: Unclear document description about 'search.max_buckets'

Created on 2 Oct 2018  ·  7 Comments  ·  Source: elastic/elasticsearch

[The actual behavior]

  • When we change the "search.max_buckets" setting from its default to some value (e.g. 1000),
    requests that try to return more buckets than the limit fail with an error.

[Issue]

[Request]

  • Could you please make the documentation easier to understand?


All 7 comments

Hi @dharada,
there is a note on the page that you linked that says:

The maximum number of buckets allowed in a single response is limited by a dynamic cluster setting named search.max_buckets. It is disabled by default (-1) but requests that try to return more than 10,000 buckets (the default value for future versions) will log a deprecation warning.

Do you think that the above note explains the behaviour clearly enough?

Pinging @elastic/es-search-aggs

No further feedback received. @dharada if you have the requested
information please add it in a comment and we can look at re-opening
this issue.

The note does explain the warning. Is there another place that shows how to set search.max_buckets?

@zorze search.max_buckets is a cluster setting, and is set in the same way as all other cluster settings.
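For reference, a minimal sketch of updating it through the cluster settings API, using only the Python standard library. The endpoint (`PUT /_cluster/settings`) and the `persistent` body are the standard cluster-settings mechanics; the host `localhost:9200` and the value 20,000 are assumptions for illustration:

```python
import json
import urllib.request

# Body for the cluster settings API; "persistent" survives restarts,
# "transient" would not. 20000 is just an example value.
body = {"persistent": {"search.max_buckets": 20000}}

# Build the PUT request (assumes a cluster reachable at localhost:9200).
req = urllib.request.Request(
    url="http://localhost:9200/_cluster/settings",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending it requires a running cluster:
# urllib.request.urlopen(req)
```

The same body can of course be sent with curl or Kibana's Dev Tools console; only the transport differs.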


Hi @javanna, does this parameter limit the number of buckets returned, or the number created during the aggregation? For example, if one shard creates 20,000 term buckets during the aggregation but the user limits the returned size to 10, will this parameter trigger?

@saiergong the max_buckets parameter limits the maximum number of buckets at both the shard level and the coordinator level. E.g. assuming max_buckets is set to 10,000:

  • If a shard generates more than 10,000 buckets during the course of processing the shard, the limit is triggered and an exception is thrown
  • If five shards all generate 9,999 buckets, they each individually are under the threshold. The per-shard results are sent to the coordinator. Assuming each shard had unique buckets which didn't exist on other shards, as soon as the coordinator starts merging the results the threshold will be breached and an exception is thrown
  • If five shards all generate 9,999 buckets and send those to the coordinator, but all 9,999 buckets are identical across all shards (meaning the total number of buckets generated globally is still 9,999), the threshold isn't breached and no exception is thrown.

Basically, if at any point Elasticsearch finds that there are more than max_buckets number of buckets sitting around, the exception is thrown.

Each agg handles bucketing a little differently. E.g. the terms agg has a size parameter but also a shard_size parameter; those affect how many buckets a shard tries to collect. histo/date_histo will collect all the buckets in the range, etc. Some aggs like rare_terms can generate many shard-level buckets, but once merged on the coordinator the count drops down to a small handful, etc.
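The three scenarios above can be played out with a toy model of the bucket counter. This is a simplified sketch of the behaviour described in this thread, not Elasticsearch's actual implementation; all the helper names (`check`, `run_shard`, `coordinate`) are hypothetical:

```python
MAX_BUCKETS = 10_000


class TooManyBucketsError(Exception):
    pass


def check(count):
    # The limit triggers whenever more than max_buckets buckets exist at once.
    if count > MAX_BUCKETS:
        raise TooManyBucketsError(f"{count} buckets > limit of {MAX_BUCKETS}")


def run_shard(buckets):
    # Shard-level check while a shard collects its own buckets.
    check(len(buckets))
    return buckets


def coordinate(shard_results):
    # Coordinator-level check while per-shard results are merged;
    # identical buckets from different shards merge into one.
    merged = set()
    for buckets in shard_results:
        merged |= set(buckets)
        check(len(merged))
    return merged


# Scenario 1: a single shard generates 10,001 buckets -> shard-level error.
try:
    run_shard([("term", i) for i in range(10_001)])
except TooManyBucketsError as e:
    print("scenario 1:", e)

# Scenario 2: five shards, 9,999 unique buckets each -> breached during merge.
unique = [[("shard", s, i) for i in range(9_999)] for s in range(5)]
try:
    coordinate([run_shard(b) for b in unique])
except TooManyBucketsError as e:
    print("scenario 2:", e)

# Scenario 3: five shards with the same 9,999 buckets -> merge stays at 9,999.
identical = [[("term", i) for i in range(9_999)] for s in range(5)]
print("scenario 3:", len(coordinate([run_shard(b) for b in identical])), "buckets, no error")
```

Scenarios 1 and 2 raise, while scenario 3 completes with 9,999 merged buckets, matching the three bullets above.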
