With an index configured with auto_expand_replicas="0-all", the cluster will try to allocate one primary and ({number of nodes} - 1) replicas for each shard, i.e. a copy of each shard on every node.
However, with "index.routing.allocation.include.zone"="zone1" the cluster is blocked from allocating a shard (either primary or replica) to any node that is not configured with a zone attribute set to "zone1", i.e. "node.zone: zone1" in the elasticsearch.yml config file. So if a cluster has 3 nodes in "zone1" and 3 nodes in "zone2", it will allocate 3 copies of each shard (the primary plus 2 replicas) to the "zone1" nodes and leave the remaining 3 replicas unassigned. I observed this using the elasticsearch-head utility.
So the allocation behaviour is as desired, i.e. replicas auto-expand to all nodes with the specified zone attribute, but the unassigned replicas leave the cluster in yellow state.
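For reference, a minimal reproduction of the settings involved looks roughly like this (the index name my_index and the 2×3-node layout are just the example from above):

    # elasticsearch.yml on each of the three "zone1" nodes
    node.zone: zone1

    # elasticsearch.yml on each of the three "zone2" nodes
    node.zone: zone2

    # settings on the affected index
    PUT /my_index/_settings
    {
      "index.auto_expand_replicas": "0-all",
      "index.routing.allocation.include.zone": "zone1"
    }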
See https://groups.google.com/forum/#!msg/elasticsearch/95hC-wGu7GE/BPPSWsfj8UkJ
I should add that this was observed using the latest master.
I'm wondering if we should deprecate auto-expand... It just seems like the wrong solution
IMO no, auto-expand is a very useful feature and a way for clusters to automatically grow to accommodate peaks in read-heavy workloads. It's just somewhat broken when shard-allocation rules aren't trivial.
I admit, though, that auto-expand does require multicast to be enabled (or predefining the IPs of machines dynamically provisioned during peaks), so at this point it does seem more like an exotic feature.
We used it in CirrusSearch so people with only a single node or two nodes don't end up with a yellow cluster. We document the auto-expand behavior and the implications for redundancy. I still don't particularly trust it, BUT it is a nice way to make installation smooth even in small environments. We use most of the CirrusSearch defaults both in development and on the production cluster and it helps with that.
Attached an "adoptme" tag, as this needs some further looking into to establish whether this is an issue.
The auto-expand feature is used internally to keep a copy of the .scripts index (stored scripts) on each node so this does need to work.
Having reread this issue, it seems that if we were able to limit the maximum number of auto-expand replicas to the number of nodes that would allow allocation (e.g. under awareness rules), then we'd avoid the yellow status.
Not sure how feasible this is.
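In the meantime, the effect can be approximated by hand by bounding the upper end of the range to the number of nodes that match the filter rather than using all, e.g. for the 3-node "zone1" example from the original report (my_index is again a placeholder):

    PUT /my_index/_settings
    {
      "index.auto_expand_replicas": "0-2"
    }

The obvious downside is that this has to be kept in sync manually whenever the set of eligible nodes changes, which is exactly what the automatic approach above would avoid.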
@dakrone Do you know whether @clintongormley's suggestion above is feasible?
@colings86 @clintongormley I looked at where the setting is updated. It's currently updated in MetaDataUpdateSettingsService, which submits a new cluster state update to change the number of replicas if it detects that the setting must be changed.
I think it would be possible to try and use the AllocationDeciders to adjust the number of replicas to the number of nodes where it can be allocated, but I don't think it's trivial.
Maybe it should be a separate setting? Instead of 0-all it can be 0-available or something?
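To illustrate, such a setting would be used in place of the existing range values (0-available does not exist; it is only the hypothetical value being proposed here):

    PUT /my_index/_settings
    {
      "index.auto_expand_replicas": "0-available"
    }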
I noticed that in ES 1.7 the .scripts index gets the 0-all setting automatically. My cluster is yellow because it is unable to expand replicas to nodes that are already full in data capacity, bounded by:

    "cluster.routing.allocation.disk.watermark.low": "80%",
    "cluster.routing.allocation.disk.watermark.high": "85%"

Is there a solution for this?
We have run into this issue (or a closely related one) twice in the last month on Cloud, both times on 2.x clusters (#1656, details here and here in the comments)
In particular it _appears_ (it's happening on production clusters, so unfortunately there's a limit to how much debugging we can do before it is necessary to reset the cluster state) that an unallocated "auto expand" index is preventing other indexes from migrating given a cluster.routing.allocation.exclude._name directive - is that even remotely possible? (here are my notes on what I saw the most recent time) ...
We are about to deploy a workaround that will temporarily disable "auto expand" for .scripts and .security while updating a cluster (and perhaps user indexes in the future), but .security settings cannot be changed in 2.x so this workaround is incomplete.
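For reference, the settings change that workaround boils down to is roughly the following (only a sketch of the per-index calls; the exact orchestration is Cloud-internal, and as noted the .security settings cannot be changed this way on 2.x):

    # temporarily disable auto-expand on .scripts before updating the cluster...
    PUT /.scripts/_settings
    {
      "index.auto_expand_replicas": "false"
    }

    # ...and restore it once the update has finished
    PUT /.scripts/_settings
    {
      "index.auto_expand_replicas": "0-all"
    }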
cc @alexbrasetvik
We spoke about this in fixit friday to keep this issue going. We came up with a possible different way of looking at the problem: for an index that has n-all set, we could just consider it fully allocated as long as at least max(1, n) replicas are allocated, since this is all the allocator can do at this point. We haven't fully fleshed out the consequences and how hard it would be to do that, but it might be a simpler solution, independent of the allocation deciders. /cc @ywelsch
Pinging @elastic/es-distributed
This issue can also manifest when decommissioning a data node where, for example, the node is holding a .security-6 copy and cluster-level allocation filtering has been applied by _ip. Applying shard allocation filtering does not remove shard copies of indices that have auto-expand configured. While this is not a detrimental problem, it can cause confusion, and it should be verified that any remaining shards on the node are there because auto-expand replicas are configured. It also requires the user to know which indices have auto-expand configured.
Perhaps shard allocation settings + shard allocation behavior with auto-expand replicas should be documented until a technical resolution is reached given how long this issue has been open.
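To see which indices have auto-expand configured, and to confirm why a copy is still sitting on the node being drained, something like the following can be used (the allocation explain request assumes 5.0+, and .security-6 shard 0 is only an example):

    GET /_all/_settings/index.auto_expand_replicas

    GET /_cluster/allocation/explain
    {
      "index": ".security-6",
      "shard": 0,
      "primary": false
    }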
@inqueue I've opened #30531 to document this.