Cloud-on-k8s: Do not try to exclude a master node that never existed

Created on 14 Oct 2019 · 9Comments · Source: elastic/cloud-on-k8s

I ran out of resources on a K8S cluster while doing an upscale of a set of MDI nodes.

I'm now in a situation where the nodeSet can't be downscaled because the operator is trying to exclude a master node which has never existed:

pod/es-apm-sample-es-1-0                    1/1     Running   0          33m    10.56.3.18    gke-michael-dev-cluster-default-pool-8a982915-3msd   <none>           <none>
pod/es-apm-sample-es-1-1                    0/1     Pending   0          26m    <none>        <none>

2019-10-14T07:30:50.277Z    ERROR   controller-runtime.controller   Reconciler error
{
    "ver": "1.0.0-beta1-bc11-c8bb5e5b",
    "controller": "elasticsearch-controller",
    "request": "default/es-apm-sample",
    "error": "unable to add to voting_config_exclusions: 400 Bad Request: add voting config exclusions request for [es-apm-sample-es-1-1] matched no master-eligible nodes",
    "errorCauses": [{
        "error": "unable to add to voting_config_exclusions: 400 Bad Request: unknown",
        "errorVerbose": "400 Bad Request: unknown
unable to add to voting_config_exclusions
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/client.(*clientV7).AddVotingConfigExclusions\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/client/v7.go:41\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/version/zen2.AddToVotingConfigExclusions\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/version/zen2/voting_exclusions.go:34\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.updateZenSettingsForDownscale\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:237\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.doDownscale\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:198\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.attemptDownscale\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:129\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.HandleDownscale\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:54\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.(*defaultDriver).reconcileNodeSpecs\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/nodes.go:112\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.(*defaultDriver).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/driver.go:234\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch.(*ReconcileElasticsearch).internalReconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:284\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch.(*ReconcileElasticsearch).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:219\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
    }]
}

>bug

barkbay

Most helpful comment

I opened https://github.com/elastic/elasticsearch/issues/47990

DaveCTurner on 14 Oct 2019

❤3

All 9 comments

This is an interesting one.

Some ideas:

1. Ignore the error when it happens and move on with removing the master node.
2. Check if the master node is part of the cluster before we add it to voting_config_exclusions. Don't make the voting_config_exclusions call if the node is not part of the cluster.

In both cases, there's a race condition:

- we don't add the master node to voting_config_exclusions (either from 1. or 2.)
- the node joins the cluster <-- the operator does not notice yet
- we remove the node, but it was not excluded from voting

The current code retries over and over again, hitting the same error until the node finally joins the cluster. But that may never happen if the node stays Pending or keeps boot-looping forever.
We can detect a Pending or boot-looping node, but we would still end up with the same race condition as above.

Note we do remove only one master node at a time, which mitigates the risks introduced with the above race condition.
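As a rough illustration of idea 2, here is a minimal Go sketch that splits the nodes scheduled for removal into those currently known to the cluster and those that never joined. The function name `filterKnownMasters` and the `members` map are hypothetical and are not taken from the operator's code; how the membership set is obtained is left out. The sketch does not address the race condition described above: a node can still join between the membership check and the removal.

```go
package main

import "fmt"

// filterKnownMasters splits candidate node names into those that are
// currently cluster members (safe to pass to voting_config_exclusions)
// and those that are not (the exclusion call would fail for them).
// Hypothetical helper, not part of cloud-on-k8s.
func filterKnownMasters(candidates []string, members map[string]bool) (known, unknown []string) {
	for _, name := range candidates {
		if members[name] {
			known = append(known, name)
		} else {
			unknown = append(unknown, name)
		}
	}
	return known, unknown
}

func main() {
	// Membership as seen by the operator at some point in time (assumed input).
	members := map[string]bool{"es-apm-sample-es-1-0": true}

	known, unknown := filterKnownMasters([]string{"es-apm-sample-es-1-1"}, members)
	fmt.Println("exclude from voting:", known)
	fmt.Println("skip, never joined:", unknown)
}
```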

I think we have the same sort of problem when setting allocation excludes in cluster settings, to migrate shards away from a data node before removing it. We have an easy way out though: it is possible to exclude a node that is not part of the cluster. The corresponding HTTP call does not fail.
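For comparison, a sketch of what such an allocation exclusion request body could look like, built in Go. It assumes the exclusion is set through the `cluster.routing.allocation.exclude._name` cluster setting sent with `PUT /_cluster/settings`, and that it is written as a transient setting; the helper name `allocationExcludeBody` is hypothetical and the operator's actual code may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// allocationExcludeBody builds the JSON body for PUT /_cluster/settings
// that asks Elasticsearch to move shards away from the named nodes.
// The use of "transient" here is an assumption made for illustration.
func allocationExcludeBody(nodeNames []string) (string, error) {
	body := map[string]map[string]string{
		"transient": {
			"cluster.routing.allocation.exclude._name": strings.Join(nodeNames, ","),
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := allocationExcludeBody([]string{"es-apm-sample-es-1-1"})
	if err != nil {
		panic(err)
	}
	// The node named here does not need to be a cluster member for the
	// settings update to be accepted.
	fmt.Println(body)
}
```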

@ywelsch @DaveCTurner I would appreciate your thoughts on this.

sebgl on 14 Oct 2019

Ugh yes this is tricky.

Unfortunately it's necessary to know the node ID (not just its name) before we can exclude it from the voting configuration. If it's not in the cluster we don't know its node ID so we cannot exclude it, hence the exception.

Naively, if a node is not running then you don't need to play with the voting configuration to get rid of it safely. If the cluster is alive then the node in question wasn't needed for its votes, and if the cluster is dead then it's already too late. The main thing that worries me is that this node is still showing as Pending which suggests to me that it might come to life at some point in the future. If we knew it would certainly never start then life would be easier. Is that possible?

Unfortunately, "will not run in future" isn't quite enough. Nodes that are not running cannot join a cluster, but they could remain in a cluster for a short while after their deaths. I think that after stopping the node from running we need to ensure it is certainly out of the cluster. I don't think we provide an API to do this today.

I wonder if we should strengthen the voting config exclusions API to accept an unknown node name.

DaveCTurner on 14 Oct 2019

I opened https://github.com/elastic/elasticsearch/issues/47990

DaveCTurner on 14 Oct 2019

❤3

Ok the change to Elasticsearch is now merged to master and 7.x: We have replaced POST /_cluster/voting_config_exclusions/... with POST /_cluster/voting_config_exclusions?node_names=.... The existing API will be supported throughout the rest of 7.x but will result in deprecation warnings when used in ≥7.8.0.

It will shortly be removed in master but I will hold off on doing that for at least a week from now to give you some time to adapt to the new API without breaking your master builds.

DaveCTurner on 16 Apr 2020

Thanks for the heads up @DaveCTurner!

I suggest we keep this issue open for pre-8.0 clusters (we may decide to do nothing about it though).
And create a new one to track the necessary changes for 8.0.0: https://github.com/elastic/cloud-on-k8s/issues/2951.

sebgl on 24 Apr 2020

I just realized that thanks to https://github.com/elastic/elasticsearch/pull/50836 we could already fix this for Elasticsearch 7.8+, by changing our call from /_cluster/voting_config_exclusions/node1,node2 to /_cluster/voting_config_exclusions?node_names=node1,node2, which should properly ignore non-existing nodes. @DaveCTurner pointed this out already in his comment above, not sure how we missed it 😞.
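A minimal Go sketch of that version switch, assuming the Elasticsearch version is already available as major and minor numbers. The function name `votingExclusionsPath` is hypothetical, and node names are assumed to be URL-safe (as Pod names are), so no escaping is applied.

```go
package main

import (
	"fmt"
	"strings"
)

// votingExclusionsPath returns the request path for
// POST /_cluster/voting_config_exclusions, using the node_names query
// parameter on 7.8+ and the older path-based form before that.
// Hypothetical helper; node names are assumed to need no URL escaping.
func votingExclusionsPath(major, minor int, nodeNames []string) string {
	joined := strings.Join(nodeNames, ",")
	if major > 7 || (major == 7 && minor >= 8) {
		return "/_cluster/voting_config_exclusions?node_names=" + joined
	}
	return "/_cluster/voting_config_exclusions/" + joined
}

func main() {
	nodes := []string{"node1", "node2"}
	fmt.Println(votingExclusionsPath(7, 7, nodes))
	fmt.Println(votingExclusionsPath(7, 8, nodes))
}
```

Clusters older than 7.8 would still hit the original error with the path-based form, which is why the issue stays relevant for them.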

Raising priority on this issue.

sebgl on 16 Nov 2020

We have https://github.com/elastic/cloud-on-k8s/issues/2951 for the more focused fix of using the new query parameter

pebrc on 17 Nov 2020

To work around this situation when running Elasticsearch < 7.8, it's possible to edit the StatefulSet and manually scale down the number of replicas:

Find the StatefulSet:

> kubectl get sts -l elasticsearch.k8s.elastic.co/cluster-name=<cluster-name>
NAME                       READY   AGE
<cluster-name>-es-<nodeset>   m/n     44h

Adjust the number of replicas:

> kubectl scale --replicas=m sts/<cluster-name>-es-<nodeset>

barkbay on 19 Nov 2020

So are we going to add this workaround to our troubleshooting docs for <7.8 and close this issue?

pebrc on 30 Nov 2020
