Elasticsearch version (bin/elasticsearch --version): 7.1, 7.4 and likely other versions
Description of the problem including expected versus actual behavior:
Allocation filtering is applied only to the settings delta, i.e. previously set filter attributes of the same family (include/exclude/require) are no longer honored.
This bug occurs because FilterAllocationDecider recreates its include/exclude/require filters from the settings update consumer, and the AffixMapUpdateConsumer passes only the setting deltas.
So after two independent exclusion updates, we end up with a cluster where /_cluster/settings lists multiple attributes for shard exclusion, but the decider acts only on those that were part of the last exclude delta.
This does not look like expected behavior.
Until there is a fix, I would also like to get your thoughts on updating https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-filtering.html to document such cases.
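To make the delta behavior described above concrete, here is a minimal sketch in plain Java. It is a simplified model and not Elasticsearch source code: `DeltaFilterSketch`, `BuggyDecider`, `MergingDecider` and `matchesAny` are hypothetical names invented for illustration, and filters are reduced to exact key/value matches. The merging variant only shows one way the earlier attribute could be kept; it is not the actual fix and does not handle removed settings.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the reported behavior; this is not Elasticsearch source code.
class DeltaFilterSketch {

    // Hypothetical decider that rebuilds its exclude filter from the map it is handed.
    static class BuggyDecider {
        private Map<String, String> excludeFilters = new HashMap<>();

        void onSettingsUpdate(Map<String, String> delta) {
            // The filter is replaced by the delta, so earlier attributes are dropped.
            excludeFilters = new HashMap<>(delta);
        }

        boolean isExcluded(Map<String, String> nodeAttributes) {
            return matchesAny(excludeFilters, nodeAttributes);
        }
    }

    // Hypothetical variant that keeps previously applied attributes.
    static class MergingDecider {
        private final Map<String, String> excludeFilters = new HashMap<>();

        void onSettingsUpdate(Map<String, String> delta) {
            // The delta is merged into the existing filter instead of replacing it.
            excludeFilters.putAll(delta);
        }

        boolean isExcluded(Map<String, String> nodeAttributes) {
            return matchesAny(excludeFilters, nodeAttributes);
        }
    }

    // A node is excluded if any filter attribute equals the node's value for that attribute.
    static boolean matchesAny(Map<String, String> filters, Map<String, String> nodeAttributes) {
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            if (filter.getValue().equals(nodeAttributes.get(filter.getKey()))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A node in the excluded zone, on a rack that is not excluded.
        Map<String, String> node = Map.of("zone", "us-east-1a", "rack", "other-rack");

        BuggyDecider buggy = new BuggyDecider();
        MergingDecider merging = new MergingDecider();

        // First update excludes the zone, second update excludes only the rack.
        buggy.onSettingsUpdate(Map.of("zone", "us-east-1a"));
        merging.onSettingsUpdate(Map.of("zone", "us-east-1a"));
        buggy.onSettingsUpdate(Map.of("rack", "rack-id"));
        merging.onSettingsUpdate(Map.of("rack", "rack-id"));

        System.out.println("buggy excludes node: " + buggy.isExcluded(node));
        System.out.println("merging excludes node: " + merging.isExcluded(node));
    }
}
```

In this model, the node in us-east-1a stops being excluded by the delta-only decider once the rack update arrives, which mirrors the shard relocation reported in this issue.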
Steps to reproduce:
curl -H "Content-Type: application/json" -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.exclude.zone" : "us-east-1a"
  }
}'
If this is the only excluded attribute, everything works as expected. Next, exclude a second attribute in a separate update:
curl -H "Content-Type: application/json" -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.exclude.rack" : "rack-id"
  }
}'
After the second update, /_cluster/settings contains both exclusions:
"cluster.routing.allocation.exclude.rack" : "rack-id"
"cluster.routing.allocation.exclude.zone" : "us-east-1a"
But because the delta in the second update contained only rack, shards will start relocating into the excluded us-east-1a zone.
Pinging @elastic/es-core-infra (:Core/Infra/Settings)
I confirmed this affects versions all the way back to 6.1.0 (6.0.0 is unaffected) so possibly https://github.com/elastic/elasticsearch/pull/26819 is related?
I think this is a duplicate of the issue that @sohami was experiencing at https://github.com/elastic/elasticsearch/issues/55764, see https://discuss.elastic.co/t/filterallocationdecider-routing-exclude-setting-behavior/229800 for further discussion on that subject. Thanks to both @sohami and @malpani for reporting this.