etcd warning with Kubernetes

Created on 23 Apr 2020 · 9 comments · Source: etcd-io/etcd

Version

Kubernetes

Kubernetes v1.18.2

  • Kubeadm:
kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:54:15Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cluster:
    1x master
    2x worker-nodes

  • kubectl:

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

OS

  • CentOS 7
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

etcd

  • etcd 3.4.3

What happened?

I have been struggling with an etcd warning triggered when I try to reset my Kubernetes (version 1.18) cluster, as shown in the following output:

[root@master ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "master" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
{"level":"warn","ts":"2020-04-23T16:37:29.913+0200","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-d62d7670-0929-417d-9fa7-9e72f59ad0e0/192.168.10.100:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}

This warning is printed 10-15 times, and only then does the cluster finally reset. I did not see this warning when working with version 1.16.
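The repeated warning comes from the etcd client's retry interceptor: each failed attempt of the same unary call logs one warning line before retrying. The behavior can be sketched with a minimal stdlib-only retry loop; `retryUnary` and `errNotEnoughStartedMembers` are hypothetical stand-ins, not the real clientv3 code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotEnoughStartedMembers stands in for etcd's ErrNotEnoughStartedMembers.
var errNotEnoughStartedMembers = errors.New("etcdserver: re-configuration failed due to not enough started members")

// retryUnary retries op up to maxAttempts times, logging one warning per
// failed attempt, roughly like clientv3's retry_interceptor does.
// It returns how many calls were made and the last error (nil on success).
func retryUnary(maxAttempts int, op func() error) (calls int, err error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return attempt + 1, nil
		}
		fmt.Printf("level=warn msg=%q attempt=%d error=%q\n",
			"retrying of unary invoker failed", attempt, err)
	}
	return maxAttempts, err
}

func main() {
	n := 0
	// Fail the first 3 calls, then succeed: the warning line is printed
	// 3 times, then the call goes through -- the pattern seen during reset.
	calls, err := retryUnary(10, func() error {
		n++
		if n <= 3 {
			return errNotEnoughStartedMembers
		}
		return nil
	})
	fmt.Println("succeeded after", calls, "calls, err =", err)
}
```

Since the member removal keeps failing for the same underlying reason (see the quorum check discussed below in the thread), every retry produces another identical warning line.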

What did you expect to happen?

The cluster resets without etcd warnings.

How to reproduce it (as minimally and precisely as possible)?

kubeadm init
kubeadm reset


All 9 comments

thanks for logging the issue @UgurTheG

to add some detail: we started seeing these warnings (printed even at default verbosity) when kubeadm performs operations using the etcd client we bundle with k8s / kubeadm 1.18, for example:

> I0423 18:34:18.934385    3883 etcd.go:327] Failed to remove etcd member: etcdserver: re-configuration failed due to not enough started members
{"level":"warn","ts":"2020-04-23T18:34:25.534+0300","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-335d5851-7f90-4dce-8eed-77acc5a459ae/192.168.0.102:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}

https://github.com/kubernetes/kubernetes/blob/413c81a7933c9aba1435fec5fb650b4fd71f2d97/cmd/kubeadm/app/util/etcd/etcd.go#L303

> etcd 3.3.11

i think this is the actual etcd client vendor for 1.18:

go.etcd.io/etcd => go.etcd.io/etcd v0.0.0-20191023171146-3cf2f69b5738 // 3cf2f69b5738 is the SHA for git tag v3.4.3

is there a way to silence these warnings by default?
we had a proposal PR that vendors zap in k/k and passes a nop-zap config to the etcd client, but ideally they should be silenced by default:

https://github.com/kubernetes/kubernetes/pull/90164

cc @jpbetz @SataQiu

Thanks for answering @neolit123

Yes, I read my version wrong. I do have the 3.4.3 version of etcd. Unfortunately I have not found a way to silence these warnings myself.

etcd's config has a StrictReconfigCheck flag, which is true by default; when enabled, the server rejects a member removal that would lead to quorum loss and returns ErrNotEnoughStartedMembers to clients.

func (s *EtcdServer) RemoveMember(ctx context.Context, id uint64) ([]*membership.Member, error) {
    if err := s.checkMembershipOperationPermission(ctx); err != nil {
        return nil, err
    }

    // by default StrictReconfigCheck is enabled; reject removal if leads to quorum loss
    if err := s.mayRemoveMember(types.ID(id)); err != nil {
        return nil, err
    }

    cc := raftpb.ConfChange{
        Type:   raftpb.ConfChangeRemoveNode,
        NodeID: id,
    }
    return s.configure(ctx, cc)
}

    if !s.cluster.IsReadyToRemoveVotingMember(uint64(id)) {
        if lg := s.getLogger(); lg != nil {
            lg.Warn(
                "rejecting member remove request; not enough healthy members",
                zap.String("local-member-id", s.ID().String()),
                zap.String("requested-member-remove-id", id.String()),
                zap.Error(ErrNotEnoughStartedMembers),
            )
        } else {
            plog.Warningf("not enough started members, rejecting remove member %s", id)
        }
        return ErrNotEnoughStartedMembers
    }

https://github.com/etcd-io/etcd/blob/release-3.4/etcdserver/server.go#L1800:22
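The quorum rule behind `mayRemoveMember` / `IsReadyToRemoveVotingMember` can be sketched in a few lines. `isReadyToRemoveVotingMember` below is a hypothetical stand-in for illustration, not etcd's actual implementation: after removing one voting member, the started members that remain must still form a quorum of the shrunken voting set.

```go
package main

import "fmt"

// isReadyToRemoveVotingMember sketches etcd's strict-reconfig quorum check:
// after removing one voting member, the started members that remain must
// still reach quorum (majority) of the new, smaller voting set.
// votingMembers is the current voting-member count, startedMembers how many
// of those are currently started, and removingStarted says whether the
// member being removed is itself started.
func isReadyToRemoveVotingMember(votingMembers, startedMembers int, removingStarted bool) bool {
	remainingVoters := votingMembers - 1
	remainingStarted := startedMembers
	if removingStarted {
		remainingStarted--
	}
	quorum := remainingVoters/2 + 1
	return remainingStarted >= quorum
}

func main() {
	// Single-member cluster (kubeadm's 1x-master case): removing the only
	// started member leaves 0 started vs. a quorum of 1 -- rejected.
	fmt.Println(isReadyToRemoveVotingMember(1, 1, true)) // false
	// 3-member cluster, all started: 2 remain vs. a quorum of 2 -- allowed.
	fmt.Println(isReadyToRemoveVotingMember(3, 3, true)) // true
}
```

With kubeadm's single-master topology the check always fails, which is why `RemoveMember` keeps returning ErrNotEnoughStartedMembers during reset.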

@neolit123 @UgurTheG

@neolit123 The default logging level for etcd client is INFO. I see two options:

  1. set the log level to something higher than WARN when creating etcd client. I did not find a specific flag for this, so it would be something like https://github.com/kubernetes/kubernetes/pull/90164
  2. change the logging level of this particular message to below INFO (i.e. DEBUG).

Option 1 is probably the right way to do it, unless people feel it is wrong to log this particular message at WARN level. If we do not want to vendor zap into k/k, maybe we can add a flag to configure logging level for etcd client.

cc @gyuho

thank you for the explanation, @tangcong and @jingyih
it's unclear if the k/k dependency maintainers will be convinced that zap should now be vendored.
maybe it's going to be included in the future, regardless of the kubeadm case.

making it possible to configure only the etcd client's verbosity level via a field on the client config makes sense to me; but if zap.Config is the preferred way in etcd, then I guess we have to go that route.

@neolit123 @tangcong @jingyih I think I found the answer to my problem. I did not have a daemon.json file for docker in /etc/docker/, so I created one and restarted docker and the kubelet. There are no warnings anymore when resetting the cluster with kubeadm.

for kubeadm 1.19 we added some logic to not try to remove the last member in the cluster, which was the primary trigger for the warnings.

the potential configuration of log level per etcd client instance seems still useful.
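The kubeadm 1.19 mitigation can be sketched as a guard in front of the removal RPC: if the member being removed is the last one in the cluster, skip the call entirely instead of letting etcd reject it over and over. `removeMemberIfNotLast` and its `remove` callback are hypothetical names for illustration, not kubeadm's actual code.

```go
package main

import "fmt"

// removeMemberIfNotLast sketches the kubeadm 1.19 logic: when the member
// being removed is the only one left, skip the RemoveMember call entirely
// (etcd would reject it anyway, producing the retried warnings).
// remove is a stand-in for the etcd client's member-removal RPC.
func removeMemberIfNotLast(members []uint64, id uint64, remove func(uint64) error) (skipped bool, err error) {
	if len(members) == 1 && members[0] == id {
		return true, nil // last member: nothing to do, avoid the warning storm
	}
	return false, remove(id)
}

func main() {
	var removed []uint64
	remove := func(id uint64) error { removed = append(removed, id); return nil }

	skipped, _ := removeMemberIfNotLast([]uint64{42}, 42, remove)
	fmt.Println("single-member cluster, skipped:", skipped) // true

	skipped, _ = removeMemberIfNotLast([]uint64{42, 43, 44}, 43, remove)
	fmt.Println("three-member cluster, skipped:", skipped, "removed:", removed)
}
```

In multi-member clusters the removal still goes through as before; only the single-master case, the primary trigger for the warnings, is short-circuited.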

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

