We're seeing hundreds of cases where having too many shards causes problems, versus problems caused by having too few.
It would be great to have a default hard limit, even if it can be increased (through the cluster settings API). Hopefully it will raise awareness of this issue in an "I can bump this now, but I need to fix it" way.
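For illustration, bumping such a limit through the cluster settings API could look like the sketch below. This is an assumption-laden example, not official documentation: the setting name cluster.max_shards_per_node follows the max_shards_per_node idea discussed later in this thread, and the host and value are placeholders.

```python
# Rough sketch: raise an assumed cluster-wide shard limit via the cluster
# settings API. The setting name (cluster.max_shards_per_node), the value
# 2000, and the localhost:9200 endpoint are illustrative assumptions.
import requests

resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"cluster.max_shards_per_node": 2000}},
)
resp.raise_for_status()
print(resp.json())
```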
@alexbrasetvik I think this was done yesterday in #20682 ;)
@javanna That's different: that one is per index, whereas the request here is per cluster.
Just synced up with @s1monw who asked me to create this issue while we talked about the per-index limit. This one is indeed per cluster, as a total number of shards - whether it's a few indices with a lot of shards, or many single-shard indices.
Sounds good, thanks for clarifying.
This setting would be checked on user actions like create index, restore snapshot, and open index. If the total number of shards in the cluster is greater than max_shards_per_node * number_of_nodes, then the user action can be rejected. This implementation allows the max value to be exceeded if (e.g.) a node fails, resulting in a lower total maximum number of shards for the cluster.
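A minimal sketch of that validation, assuming the check runs at request time rather than in the allocator; the function and parameter names here are illustrative, not the actual Elasticsearch implementation.

```python
# Illustrative check run on shard-creating actions (create index, restore
# snapshot, open index); not the real Elasticsearch code.
def can_add_shards(current_shards: int, shards_to_add: int,
                   number_of_nodes: int, max_shards_per_node: int = 1000) -> bool:
    cluster_limit = max_shards_per_node * number_of_nodes
    return current_shards + shards_to_add <= cluster_limit

# Example: a 3-node cluster with the default limit allows up to 3000 shards,
# so adding 10 shards when 2995 already exist would be rejected.
assert can_add_shards(2990, 10, number_of_nodes=3) is True
assert can_add_shards(2995, 10, number_of_nodes=3) is False
```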
We would default to a high number during 5.x (e.g. 1000), giving sysadmins the ability to set it to whatever makes sense for their cluster, and we can look at lowering this value for 6.0.
I would say that ~500/600 shards per node is a good limit.
@s1monw Raising this one to you.
Should the limit of shards per node not be linked to the amount of heap space a node has, e.g. 20 shard limit per GB of heap a node has allocated?
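As a rough sketch of what that would mean (the 20-shards-per-GB factor is just the suggestion above, not an agreed default):

```python
# Hypothetical heap-based limit: 20 shards per GB of heap, per node.
SHARDS_PER_GB_HEAP = 20

def max_shards_per_node(heap_gb: float) -> int:
    # e.g. a 2 GB heap would allow 40 shards, a 30 GB heap 600 shards
    return int(heap_gb * SHARDS_PER_GB_HEAP)
```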
@cdahlqvist i like that idea!
Could someone explain the motivation for the shard limit per node? Is it related to the node type? The amount of memory it has? Disk space? Anything else?
We have 40K shards (using per-day indices) and we're hitting issues with large cluster states that we don't know how to resolve...
@ron-totango that's a question that's better suited to the support forums over at https://discuss.elastic.co - 40k shards sounds like too many, and the forums should be able to help you reduce it to something more reasonable.
Thanks @DaveCTurner . Already tried to ask at https://discuss.elastic.co/t/configuring-a-cluster-for-a-large-number-of-indexes/115731 but didn't get any meaningful reply :-(
When we are talking about the limit of shards per node (averaged across the cluster), are we counting only primary shards, or do replicas count as well?
Pinging @elastic/es-distributed
This is currently labelled :Distributed/Allocation, but I think it's not a great idea to solve this in the allocator by refusing to allocate more than a certain number of shards per node. It seems like a better idea to check this on actions that create the shards-to-be-allocated:

> This setting would be checked on user actions like create index, restore snapshot, open index.

I think, given the above comment, that this'd be better labelled :Core/Index APIs, so I'm doing so.
Pinging @elastic/es-core-infra
We discussed this during the core/infra sync, we agreed that a limit is good, and that doing it at the validation layer is a good idea (rather than doing it at the allocation decider level). We agreed on Clint's proposal of making the limit a factor of the number of nodes. Marking this as adoptme and removing the discussion label now.
@dakrone What is the reason this will be based on the number of nodes rather than the available heap size? I would expect a 3-node 2GB Elastic Cloud cluster to need a much lower limit than a 3-node 64GB Elastic Cloud cluster.
@cdahlqvist That's a concern about what the default per node should be, not whether or not it should be based on the number of nodes. We will likely start simple with a blanket per node default and can consider over time making the default ergonomic to the heap size.
Will this include the number of replicas?
@cdekker The implementation merged in #34021 counts replicas towards the limit, as replicas consume resources in much the same way as primary shards.
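As a quick illustration of how shards are counted toward the limit (a sketch of the counting rule described above, not the code from #34021):

```python
# Replicas count toward the limit just like primaries.
def shards_counted(primary_shards: int, replicas: int) -> int:
    # e.g. an index with 5 primaries and 1 replica consumes
    # 5 * (1 + 1) = 10 shards of the cluster-wide budget.
    return primary_shards * (1 + replicas)
```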
A high overall shard count in the cluster also loads up master node operations. Are there plans for a high overall cluster limit irrespective of the number of nodes? Or a limit on the number of nodes in the cluster?
Should the master log a warning or prevent index creation and mapping additions if the heap on the master is too low to support the cluster limit? IMO the limit should also factor in the heap on the data nodes, as pointed out by @cdahlqvist.