We're seeing hundreds of cases where having too many shards causes problems, versus problems caused by having too few.
It would be great to have a default hard limit, even if it can be increased (through the cluster settings API). Hopefully it will raise awareness of this issue in an "I can bump this now, but I need to fix it" way.
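For illustration, bumping such a limit through the cluster settings API could look like the sketch below. This is an assumption-laden example, not official documentation: the setting name cluster.max_shards_per_node follows the max_shards_per_node idea discussed later in this thread, and the host and value are placeholders.

```python
# Rough sketch: raise an assumed cluster-wide shard limit via the cluster
# settings API. The setting name (cluster.max_shards_per_node), the value
# 2000, and the localhost:9200 endpoint are illustrative assumptions.
import requests

resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"cluster.max_shards_per_node": 2000}},
)
resp.raise_for_status()
print(resp.json())
```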
@alexbrasetvik I think this was done yesterday in #20682 ;)
@javanna That's different: that one is per index, whereas the request here is per cluster.
Just synced up with @s1monw who asked me to create this issue while we talked about the per-index limit. This one is indeed per cluster, as a total number of shards - whether it's a few indices with a lot of shards, or many single-shard indices.
Sounds good, thanks for clarifying.
This setting would be checked on user actions like create index, restore snapshot, and open index. If the total number of shards in the cluster is greater than max_shards_per_node * number_of_nodes, then the user action can be rejected. This implementation allows the max value to be exceeded if (e.g.) a node fails, resulting in a lower total maximum number of shards for the cluster.
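A minimal sketch of that validation, assuming the check runs at request time rather than in the allocator; the function and parameter names here are illustrative, not the actual Elasticsearch implementation.

```python
# Illustrative check run on shard-creating actions (create index, restore
# snapshot, open index); not the real Elasticsearch code.
def can_add_shards(current_shards: int, shards_to_add: int,
                   number_of_nodes: int, max_shards_per_node: int = 1000) -> bool:
    cluster_limit = max_shards_per_node * number_of_nodes
    return current_shards + shards_to_add <= cluster_limit

# Example: a 3-node cluster with the default limit allows up to 3000 shards,
# so adding 10 shards when 2995 already exist would be rejected.
assert can_add_shards(2990, 10, number_of_nodes=3) is True
assert can_add_shards(2995, 10, number_of_nodes=3) is False
```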
We would default to a high number during 5.x (e.g. 1000), giving sysadmins the ability to set it to whatever makes sense for their cluster, and we can look at lowering this value for 6.0.
I would say that ~500/600 shards per node is a good limit.
@s1monw Raising this one to you.
Should the limit of shards per node not be linked to the amount of heap space a node has, e.g. 20 shard limit per GB of heap a node has allocated?
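As a rough sketch of what that would mean (the 20-shards-per-GB factor is just the suggestion above, not an agreed default):

```python
# Hypothetical heap-based limit: 20 shards per GB of heap, per node.
SHARDS_PER_GB_HEAP = 20

def max_shards_per_node(heap_gb: float) -> int:
    # e.g. a 2 GB heap would allow 40 shards, a 30 GB heap 600 shards
    return int(heap_gb * SHARDS_PER_GB_HEAP)
```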
@cdahlqvist i like that idea!
Could someone explain the motivation for the shard limit per node? Is it related to the node type? The amount of memory it has? Disk space? Anything else?
We have 40K shards (using per-day indices) and we're hitting issues with large cluster states that we don't know how to resolve...
@ron-totango that's a question that's better suited to the support forums over at https://discuss.elastic.co - 40k shards sounds like too many, and the forums should be able to help you reduce it to something more reasonable.
Thanks @DaveCTurner . Already tried to ask at https://discuss.elastic.co/t/configuring-a-cluster-for-a-large-number-of-indexes/115731 but didn't get any meaningful reply :-(
When we are talking about the limit of shards per node (averaged across the cluster), are we counting only primary shards, or do replicas count as well?
Pinging @elastic/es-distributed
This is currently labelled :Distributed/Allocation, but I think it's not a great idea to solve this in the allocator by refusing to allocate more than a certain number of shards per node. It seems like a better idea to check this on actions that create the shards-to-be-allocated:

> This setting would be checked on user actions like create index, restore snapshot, open index.

I think, given the above comment, that this'd be better labelled :Core/Index APIs, so I'm doing so.
Pinging @elastic/es-core-infra
We discussed this during the core/infra sync, we agreed that a limit is good, and that doing it at the validation layer is a good idea (rather than doing it at the allocation decider level). We agreed on Clint's proposal of making the limit a factor of the number of nodes. Marking this as adoptme and removing the discussion label now.
@dakrone What is the reason this will be based on the number of nodes rather than the available heap size? I would expect a 3-node 2GB Elastic Cloud cluster to need a much lower limit than a 3-node 64GB Elastic Cloud cluster.
@cdahlqvist That's a concern about what the default per node should be, not whether or not it should be based on the number of nodes. We will likely start simple with a blanket per node default and can consider over time making the default ergonomic to the heap size.
Will this include the number of replicas?
@cdekker The implementation merged in #34021 counts replicas towards the limit, as replicas consume resources in much the same way as primary shards.
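As a quick illustration of how shards are counted toward the limit (a sketch of the counting rule described above, not the code from #34021):

```python
# Replicas count toward the limit just like primaries.
def shards_counted(primary_shards: int, replicas: int) -> int:
    # e.g. an index with 5 primaries and 1 replica consumes
    # 5 * (1 + 1) = 10 shards of the cluster-wide budget.
    return primary_shards * (1 + replicas)
```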
A high overall shard count in the cluster also loads up master node operations. Are there plans for a high overall cluster limit irrespective of the number of nodes? Or a limit on the number of nodes in the cluster?
Should the master log a warning or prevent index creation and mapping additions if the heap on the master is too low to support the cluster limit? IMO the limit should also factor in the heap on the data nodes, as pointed out by @cdahlqvist.