Cloud-on-k8s: Cannot downscale from 3 to 1 mdi nodes in v7.0.0

Created on 24 Apr 2019 · 7 Comments · Source: elastic/cloud-on-k8s

How to reproduce:

  1. deploy a 3-node 7.0.0 cluster:
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: sample
spec:
  version: 7.0.0
  nodes:
    - config:
        node.master: true
        node.data: true
        node.ingest: true
        xpack.license.self_generated.type: trial
      resources:
        limits:
          memory: 1Gi
          cpu: 1
      nodeCount: 3
  2. wait for the deployment to complete
  3. downscale to 1 node
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: sample
spec:
  version: 7.0.0
  nodes:
    - config:
        node.master: true
        node.data: true
        node.ingest: true
        xpack.license.self_generated.type: trial
      resources:
        limits:
          memory: 1Gi
          cpu: 1
      nodeCount: 1
  4. the operator logs the following and never deletes any pod:
2019-04-24T20:10:32.773Z        ERROR   kubebuilder.controller  Reconciler error        {"controller": "elasticsearch-controller", "request": "default/sample", "error": "unable to add to voting_config_exclusions: 400 Bad Request: add voting config exclusions request for [sample-es-n2f7njwjml,sample-es-mlmkvkchmx] matched no master-eligible nodes", "errorCauses": [{"error": "unable to add to voting_config_exclusions: 400 Bad Request: unknown", "errorVerbose": "400 Bad Request: unknown\nunable to add to voting_config_exclusions\ngithub.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/client.(*clientV7).AddVotingConfigExclusions\n\t/go/src/github.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/client/v7.go:41\ngithub.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/version/version7.UpdateZen2Settings\n\t/go/src/github.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/version/version7/zen2.go:47\ngithub.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/driver.(*defaultDriver).Reconcile\n\t/go/src/github.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/driver/default.go:360\ngithub.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch.(*ReconcileElasticsearch).internalReconcile\n\t/go/src/github.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/elasticsearch_controller.go:279\ngithub.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch.(*ReconcileElasticsearch).Reconcile\n\t/go/src/github.com/elastic/k8s-operators/operators/pkg/controller/elasticsearch/elasticsearch_controller.go:228\ngithub.com/elastic/k8s-operators/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/elastic/k8s-operators/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/elastic/k8s-operators/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/elastic/k8s-operators/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/elastic/k8s-operators/operators/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/elastic/k8s-operators/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/elastic/k8s-operators/operators/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/elastic/k8s-operators/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/elastic/k8s-operators/operators/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/elastic/k8s-operators/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"}]}

Downscaling from 3 to 2 nodes seems to work fine.

>bug

All 7 comments

If nobody is already looking into this, I will investigate it.

Still investigating the issue, just a quick update:

The problem seems to occur only when we add several masters to the exclusion list in a single request.
If I add them one by one, everything is fine.

In the request, the node names are joined with commas, which should be supported according to the documentation:

Node filters are written as a comma-separated list of individual filters, each of which adds or removes nodes from the chosen subset. Each filter can be one of the following:
[...]

  • a node id or name, to add this node to the subset.

(https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster.html#cluster-nodes)

Unfortunately, Elasticsearch does not resolve the nodes:
add voting config exclusions request for [sample-es-n2f7njwjml,sample-es-mlmkvkchmx] matched no master-eligible nodes
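
To make the symptom concrete, here is a toy model in Python. It is only an illustration consistent with the error message above and is not Elasticsearch's actual node resolution code. The function names and the third node name are made up for the example. If the comma-joined string is compared as a single name, nothing matches, while splitting it on commas first matches both nodes.

```python
def match_unsplit(description, master_names):
    """Compare the whole description against each node name.

    Models the observed symptom: a comma-joined list never equals
    any single node name, so the result is empty.
    """
    return [name for name in master_names if name == description]


def match_split(description, master_names):
    """Split the description on commas, then match each part by name."""
    wanted = {part.strip() for part in description.split(",") if part.strip()}
    return [name for name in master_names if name in wanted]


# The first two names come from the log; the third is a placeholder.
masters = ["sample-es-n2f7njwjml", "sample-es-mlmkvkchmx", "sample-es-placeholder"]
request = "sample-es-n2f7njwjml,sample-es-mlmkvkchmx"
```

With these inputs, `match_unsplit(request, masters)` is empty, whereas `match_split(request, masters)` returns the two nodes to exclude. A single name works with either function, which fits the observation that adding nodes one by one succeeds.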

I have created an issue in the Elasticsearch project: https://github.com/elastic/elasticsearch/issues/41587

And submitted a PR: https://github.com/elastic/elasticsearch/pull/41588

The PR has been merged, but we have to decide what to do until a version with the fix is released.
A workaround would be to add the nodes to the exclusion list one by one, and to revert that change once the fix is available so that we fall back to the current behavior.
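
The workaround could be sketched as follows. This is a minimal Python illustration, not the operator's Go code. It only builds request paths and sends nothing. The helper names are hypothetical. The path layout `/_cluster/voting_config_exclusions/<node_name>` is the one the 7.0 API accepts with a `POST`.

```python
from urllib.parse import quote

EXCLUSIONS_ENDPOINT = "/_cluster/voting_config_exclusions"


def batched_exclusion_path(node_names):
    """Current behavior: one request naming all nodes, joined with commas."""
    joined = ",".join(quote(name, safe="") for name in node_names)
    return f"{EXCLUSIONS_ENDPOINT}/{joined}"


def one_by_one_exclusion_paths(node_names):
    """Workaround: one request path per node, to be POSTed in sequence."""
    return [f"{EXCLUSIONS_ENDPOINT}/{quote(name, safe='')}" for name in node_names]


leaving = ["sample-es-n2f7njwjml", "sample-es-mlmkvkchmx"]
for path in one_by_one_exclusion_paths(leaving):
    print("POST", path)
```

Keeping both helpers side by side makes the eventual revert a one-line change at the call site.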

Do we have a test for this scenario?

The fix will be in 7.0.1 (due to a respin, see https://github.com/elastic/dev/issues/1189), to be released shortly, so just waiting a few days would be sufficient. I'm not sure it's necessary to create a workaround for 7.0.0 only.
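
For illustration, a check on the 7.0.1 cut-off mentioned above might look like this Python sketch. The function names are hypothetical and the operator itself is written in Go. The sketch assumes plain `major.minor.patch` version strings from the 7.x line or later, since voting config exclusions do not exist before 7.0.

```python
FIRST_FIXED_VERSION = (7, 0, 1)  # per the comment above: the fix ships in 7.0.1


def parse_version(version):
    """Parse 'major.minor.patch' into a tuple of ints, e.g. '7.0.1' -> (7, 0, 1)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def batched_exclusions_fixed(version):
    """Return True if this version includes the fix for multi-node exclusions."""
    return parse_version(version) >= FIRST_FIXED_VERSION
```

Tuples compare element by element, so `batched_exclusions_fixed("7.0.0")` is false while `"7.0.1"` and `"7.1.0"` are true.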

Closing this one as we will only support versions that include the fix.
@ywelsch thanks a lot for your quick feedback and your help!
