5.0 introduces a per-node limit on the rate of inline script compilations, which should help catch the anti-pattern of embedding script parameters in the scripts themselves. I wonder if it is worth adding a master-only limit on the rate of index creations, to catch situations where someone accidentally misconfigures an input system and it ends up creating thousands of indices in quick succession. Such a rate limit would cause indexing to fail with a useful error message, creating back pressure in any queueing system. I think this would be better than just creating thousands of indices as fast as we can.
Is this a good idea or a horrible idea?
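To make the idea concrete, here is a minimal sketch of what such a master-side limiter might look like. Everything below is hypothetical: the class, method, and error message are illustrative only and not part of Elasticsearch.

```java
// Hypothetical sketch only: a fixed-window rate limiter the master could
// consult before applying a create-index cluster state update.
final class IndexCreationRateLimiter {
    private final int maxCreationsPerWindow; // budget of creations per window
    private final long windowMillis;         // window length in milliseconds
    private long windowStart = System.currentTimeMillis();
    private int countInWindow = 0;

    IndexCreationRateLimiter(int maxCreationsPerWindow, long windowMillis) {
        this.maxCreationsPerWindow = maxCreationsPerWindow;
        this.windowMillis = windowMillis;
    }

    /**
     * Called once per create-index request. Throws when the current window's
     * budget is exhausted, failing the request with a useful message and
     * creating back pressure in any upstream queueing system.
     */
    synchronized void checkCanCreateIndex() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // start a fresh window
            countInWindow = 0;
        }
        if (++countInWindow > maxCreationsPerWindow) {
            throw new IllegalStateException(
                "too many index creations: more than [" + maxCreationsPerWindow
                    + "] in [" + windowMillis + "ms]; this usually means a "
                    + "misconfigured input system is creating an index per document");
        }
    }
}
```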
Have we run into any situations where someone has actually hit this issue? I don't recall seeing any github issues about it before.
I've seen it come through Elastic's support organization a few times. I expect this hasn't come up on github because Elasticsearch isn't the root cause of the issue.
I think I'd rather limit the total number of indices/shards in a cluster than the creation rate.
Discussed in FixItFriday. There are two issues here: creating too many indices, and creating indices faster than the master can cope with. We suggest adding two safeguards, both sketched in code after this list:

1. **A `max_shards_per_node` cluster setting.** This setting would be checked on user actions such as create index, restore snapshot, and open index. If the total number of shards in the cluster would be greater than `max_shards_per_node * number_of_nodes`, the user action can be rejected. This implementation allows the maximum to be exceeded if (e.g.) a node fails, since that lowers the cluster-wide shard ceiling. We would default to a high number during 5.x (e.g. 1000), giving sysadmins the ability to set it to whatever makes sense for their cluster, and we can look at lowering this value for 6.0.

2. **A limit on concurrent index creations.** This would be a simple counter of in-flight index-creation requests; new requests that would cause the maximum to be exceeded are rejected. The aim is to avoid queueing up potentially thousands of index creations, which could be caused by erroneously trying to create an index per document. Default e.g. 30.
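As a rough illustration of both safeguards, here is a minimal sketch. The class and method names, the exception type, and how the shard and node counts are obtained are all assumptions, not the eventual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the two proposed safeguards; none of these names
// exist in Elasticsearch.
final class CreateIndexSafeguards {
    private final int maxShardsPerNode;        // e.g. 1000 during 5.x
    private final int maxConcurrentCreations;  // e.g. 30
    private final AtomicInteger inFlightCreations = new AtomicInteger();

    CreateIndexSafeguards(int maxShardsPerNode, int maxConcurrentCreations) {
        this.maxShardsPerNode = maxShardsPerNode;
        this.maxConcurrentCreations = maxConcurrentCreations;
    }

    // Safeguard 1: reject user actions (create index, restore snapshot, open
    // index) that would push the cluster past max_shards_per_node * nodes.
    void checkShardLimit(int currentTotalShards, int newShards, int numberOfNodes) {
        int maxShardsInCluster = maxShardsPerNode * numberOfNodes;
        if (currentTotalShards + newShards > maxShardsInCluster) {
            throw new IllegalStateException("this action would add [" + newShards
                + "] shards, but the cluster is limited to [" + maxShardsInCluster
                + "] shards in total");
        }
    }

    // Safeguard 2: a simple counter of in-flight create-index requests, so a
    // runaway client cannot queue up thousands of pending creations.
    void acquireCreationSlot() {
        if (inFlightCreations.incrementAndGet() > maxConcurrentCreations) {
            inFlightCreations.decrementAndGet();
            throw new IllegalStateException("too many concurrent index creations "
                + "(max [" + maxConcurrentCreations + "])");
        }
    }

    void releaseCreationSlot() {
        inFlightCreations.decrementAndGet();
    }
}
```

Checking the shard limit only on user actions keeps the safeguard out of the shard-allocation path, which is why the limit can be exceeded when a node leaves the cluster.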
The `max_shards_per_node` change will be handled in https://github.com/elastic/elasticsearch/issues/20705
We now have a limit on the number of shards per node in a cluster, thanks to #34892.
I've marked this as team-discuss because I would like to revisit the discussion about limiting the number of concurrent index creations, or applying another rate limit. I question how easy it would be to set such a limit correctly: with time-based indices we sometimes legitimately want to create many indices at the same time. Conversely, even if you could only create a single index at a time, the time it would take to hit the shards-per-node limit is comparable with the time it would take to react to a rogue client that's creating too many indices (at the default of 1000 shards per node, for example, even a client creating one single-shard index per second would need on the order of fifteen minutes per node of headroom to hit the limit). So I don't think the concurrency limit helps much.
In short, I think we can close this.
We discussed this today and agreed to close this for the reasons I described above.