Etcd: rafthttp: request cluster ID mismatch (got a want b) after usage of ETCD_FORCE_NEW_CLUSTER

Created on 23 Jun 2017 · 24 comments · Source: etcd-io/etcd

Bug reporting

Versions tested: 3.1.x and 3.2.x
Problem: rafthttp: request cluster ID mismatch (got a want b) when a new member joins

Steps:

Running etcd instance with no other members:

etcdctl member list
8e9e05c52164694d: name=default peerURLs=http://10.x.y.z:2380 clientURLs=http://10.x.y.z:2379 isLeader=true

Preparing to add a new member:

etcdctl member add instance-2 https://10.x.y.zz:2380
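With the v2 `etcdctl` (the default API in etcdctl 3.1/3.2), `member add` prints `ETCD_NAME`, `ETCD_INITIAL_CLUSTER` and `ETCD_INITIAL_CLUSTER_STATE` lines that the new member should be started with. A minimal sketch, using a hypothetical helper name, that keeps only those lines so they can be saved for the new member; the env file path in the usage comment is also just an example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the ETCD_* assignments printed by
# `ETCDCTL_API=2 etcdctl member add <name> <peerURL>` (read on stdin).
member_add_env() {
    grep -E '^ETCD_[A-Z_]+=' || true
}

# Example (not executed here; the env file path is hypothetical):
#   ETCDCTL_API=2 etcdctl member add instance-2 https://10.x.y.zz:2380 \
#     | member_add_env > /etc/etcd/instance-2.env
```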

Starting etcd with new data-dir:

docker logs 8e631793e4bd8c7e52edf12efd834f24b033b5e1f79f2754dcd426ad113aa745
2017-06-23 18:06:11.603930 I | etcdmain: etcd Version: 3.2.1
2017-06-23 18:06:11.604509 I | etcdmain: Git SHA: 61fc123
2017-06-23 18:06:11.604516 I | etcdmain: Go Version: go1.8.3
2017-06-23 18:06:11.604520 I | etcdmain: Go OS/Arch: linux/amd64
2017-06-23 18:06:11.604523 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-06-23 18:06:11.604617 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = true
2017-06-23 18:06:11.606327 I | embed: listening for peers on https://10.x.y.zz:2380
2017-06-23 18:06:11.606357 W | embed: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
2017-06-23 18:06:11.606396 I | embed: listening for client requests on 127.0.0.1:2379
2017-06-23 18:06:11.606427 I | embed: listening for client requests on 10.x.y.zz:2379
2017-06-23 18:06:11.611117 I | etcdserver: name = instance-2
2017-06-23 18:06:11.611143 I | etcdserver: data dir = /var/lib/etcd
2017-06-23 18:06:11.611147 I | etcdserver: member dir = /var/lib/etcd/member
2017-06-23 18:06:11.611151 I | etcdserver: heartbeat = 100ms
2017-06-23 18:06:11.611154 I | etcdserver: election = 1000ms
2017-06-23 18:06:11.611157 I | etcdserver: snapshot count = 100000
2017-06-23 18:06:11.611164 I | etcdserver: advertise client URLs = https://10.x.y.zz:2379
2017-06-23 18:06:11.640699 I | etcdserver: starting member 222f88b64e95262a in cluster cdf818194e3a8c32
2017-06-23 18:06:11.640727 I | raft: 222f88b64e95262a became follower at term 0
2017-06-23 18:06:11.640739 I | raft: newRaft 222f88b64e95262a [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2017-06-23 18:06:11.640743 I | raft: 222f88b64e95262a became follower at term 1
2017-06-23 18:06:11.658846 W | auth: simple token is not cryptographically signed
2017-06-23 18:06:11.667148 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.667204 I | rafthttp: starting peer 8e9e05c52164694d...
2017-06-23 18:06:11.667224 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.693402 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.693470 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.710553 I | rafthttp: started peer 8e9e05c52164694d
2017-06-23 18:06:11.710579 I | rafthttp: added peer 8e9e05c52164694d
2017-06-23 18:06:11.710608 I | etcdserver: starting server... [version: 3.2.1, cluster version: to_be_decided]
2017-06-23 18:06:11.710623 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = false
2017-06-23 18:06:11.711758 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.711920 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.713513 I | rafthttp: peer 8e9e05c52164694d became active
2017-06-23 18:06:11.713523 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-06-23 18:06:11.713880 I | raft: 222f88b64e95262a [term: 1] received a MsgVote message with higher term from 8e9e05c52164694d [term: 2535]
2017-06-23 18:06:11.713900 I | raft: 222f88b64e95262a became follower at term 2535
2017-06-23 18:06:11.713909 I | raft: 222f88b64e95262a [logterm: 0, index: 0, vote: 0] cast MsgVote for 8e9e05c52164694d [logterm: 2473, index: 707372] at term 2535
2017-06-23 18:06:11.722444 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.722493 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.724351 I | raft: raft.node: 222f88b64e95262a elected leader 8e9e05c52164694d at term 2535
2017-06-23 18:06:11.730153 I | rafthttp: receiving database snapshot [index:707372, from 8e9e05c52164694d] ...
2017-06-23 18:06:11.734416 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-06-23 18:06:11.738337 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.753937 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.754094 I | snap: saved database snapshot to disk [total bytes: 3424256]
2017-06-23 18:06:11.754111 I | rafthttp: received and saved database snapshot [index: 707372, from: 8e9e05c52164694d] successfully
2017-06-23 18:06:11.754193 I | raft: 222f88b64e95262a [commit: 0, lastindex: 0, lastterm: 0] starts to restore snapshot [index: 707372, term: 2473]
2017-06-23 18:06:11.754211 I | raft: log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 707372, term: 2473]
2017-06-23 18:06:11.754236 I | raft: 222f88b64e95262a restored progress of 222f88b64e95262a [next = 707373, match = 707372, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2017-06-23 18:06:11.754251 I | raft: 222f88b64e95262a restored progress of 8e9e05c52164694d [next = 707373, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2017-06-23 18:06:11.754259 I | raft: 222f88b64e95262a [commit: 707372] restored snapshot [index: 707372, term: 2473]
2017-06-23 18:06:11.755157 I | etcdserver: applying snapshot at index 0...
2017-06-23 18:06:11.755190 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.756328 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.757356 I | etcdserver: raft applied incoming snapshot at index 707372
2017-06-23 18:06:11.757918 I | etcdserver: recovering lessor...
2017-06-23 18:06:11.762479 I | etcdserver: finished recovering lessor
2017-06-23 18:06:11.762497 I | etcdserver: restoring mvcc store...
2017-06-23 18:06:11.762522 I | mvcc: restore compact to 251962
2017-06-23 18:06:11.768976 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.771470 I | etcdserver: finished restoring mvcc store
2017-06-23 18:06:11.771491 I | etcdserver: recovering alarms...
2017-06-23 18:06:11.771708 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.773793 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.774824 I | etcdserver: finished recovering alarms
2017-06-23 18:06:11.775077 I | etcdserver: recovering auth store...
2017-06-23 18:06:11.775092 I | etcdserver: finished recovering auth store
2017-06-23 18:06:11.775096 I | etcdserver: recovering store v2...
2017-06-23 18:06:11.776548 I | etcdserver: finished recovering store v2
2017-06-23 18:06:11.776561 I | etcdserver: recovering cluster configuration...
2017-06-23 18:06:11.776627 I | etcdserver/api: enabled capabilities for version 3.1
2017-06-23 18:06:11.776638 I | etcdserver/membership: added member 222f88b64e95262a [https://10.x.y.zz:2380] to cluster cdf818194e3a8c32 from store
2017-06-23 18:06:11.776643 I | etcdserver/membership: added member 8e9e05c52164694d [http://10.x.y.z:2380] to cluster cdf818194e3a8c32 from store
2017-06-23 18:06:11.776648 I | etcdserver/membership: set the cluster version to 3.1 from store
2017-06-23 18:06:11.776651 I | etcdserver: finished recovering cluster configuration
2017-06-23 18:06:11.776654 I | etcdserver: removing old peers from network...
2017-06-23 18:06:11.776659 I | rafthttp: stopping peer 8e9e05c52164694d...
2017-06-23 18:06:11.776804 I | rafthttp: closed the TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-06-23 18:06:11.776812 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.776931 I | rafthttp: closed the TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-06-23 18:06:11.776936 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.777074 I | etcdserver: closing old backend...
2017-06-23 18:06:11.778124 I | rafthttp: stopped HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.778184 W | rafthttp: lost the TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.778221 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.778263 W | rafthttp: lost the TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.778281 E | rafthttp: failed to read 8e9e05c52164694d on stream Message (context canceled)
2017-06-23 18:06:11.778285 I | rafthttp: peer 8e9e05c52164694d became inactive
2017-06-23 18:06:11.778291 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.778296 I | rafthttp: stopped peer 8e9e05c52164694d
2017-06-23 18:06:11.778303 I | rafthttp: removed peer 8e9e05c52164694d
2017-06-23 18:06:11.778307 I | etcdserver: finished removing old peers from network
2017-06-23 18:06:11.778310 I | etcdserver: adding peers from new cluster configuration into network...
2017-06-23 18:06:11.778468 I | rafthttp: starting peer 8e9e05c52164694d...
2017-06-23 18:06:11.778525 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.779020 I | rafthttp: started peer 8e9e05c52164694d
2017-06-23 18:06:11.779041 I | rafthttp: added peer 8e9e05c52164694d
2017-06-23 18:06:11.779045 I | etcdserver: finished adding peers from new cluster configuration into network...
2017-06-23 18:06:11.779052 I | etcdserver: finished applying incoming snapshot at index 0
2017-06-23 18:06:11.779233 I | etcdserver: published {Name:instance-2 ClientURLs:[https://10.x.y.zz:2379]} to cluster cdf818194e3a8c32
2017-06-23 18:06:11.779309 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.779324 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.779344 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.779481 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.779723 I | embed: ready to serve client requests
2017-06-23 18:06:11.780024 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
2017-06-23 18:06:11.780057 I | embed: ready to serve client requests
2017-06-23 18:06:11.780217 I | embed: serving client requests on 10.x.y.zz:2379
2017-06-23 18:06:11.780651 I | etcdserver: finished closing old backend
2017-06-23 18:06:11.783490 I | rafthttp: peer 8e9e05c52164694d became active
2017-06-23 18:06:11.783511 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.783601 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.789290 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.789449 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.802985 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.804787 I | rafthttp: peer 8e9e05c52164694d became active
2017-06-23 18:06:11.804859 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.817990 I | mvcc: store.index: compact 252117
2017-06-23 18:06:11.818618 I | mvcc: finished scheduled compaction at 252117 (took 395.496µs)
2017-06-23 18:06:11.829268 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.829560 N | etcdserver/membership: updated the cluster version from 3.1 to 3.2
2017-06-23 18:06:11.829605 I | etcdserver/api: enabled capabilities for version 3.2
2017-06-23 18:06:11.838191 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.911400 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.911727 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.922308 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-06-23 18:06:11.924200 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.929602 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.930357 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.934752 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.936455 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.937940 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.939887 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-06-23 18:06:11.952105 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.961230 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.964000 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)

This step can be repeated multiple times with the same result; removing the member does not help.
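To see whether the errors come from a single stale sender or from several, it can help to group the `request cluster ID mismatch` lines by their got/want pair. A minimal sketch with a hypothetical helper name that reads etcd log output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "request cluster ID mismatch" errors per
# (got, want) pair in etcd log output read from stdin, most frequent first.
summarize_cluster_id_mismatch() {
    sed -n 's/.*request cluster ID mismatch (got \([0-9a-f]*\) want \([0-9a-f]*\)).*/got=\1 want=\2/p' \
        | sort | uniq -c | sort -rn
}

# Example (not executed here); etcd logs to stderr, hence 2>&1:
#   docker logs <container> 2>&1 | summarize_cluster_id_mismatch
```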

Special note: ETCD_FORCE_NEW_CLUSTER was used to get the cluster running again. It seems to corrupt the cluster ID in some strange way. Before this flag was used, adding and removing members worked without problems.

default:

    - etcd
    - --advertise-client-urls=http://10.x.y.z:2379
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=http://10.x.y.z:2379,http://127.0.0.1:2379
    - --listen-peer-urls=http://10.x.y.z:2380
    - --trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --peer-client-cert-auth

instance-2:

    - etcd
    - --advertise-client-urls=https://10.x.y.zz:2379
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=https://10.x.y.zz:2379,http://127.0.0.1:2379
    - --initial-cluster=default=http://10.x.y.z:2380,instance-2=https://10.x.y.zz:2380
    - --initial-cluster-state=existing
    - --name=instance-2
    - --listen-peer-urls=https://10.x.y.zz:2380
    - --trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt

To match the deployment and production guidelines, this use case must work. Any help on how to debug and fix it would be great.

Greets
Manuel


Most helpful comment

One of my nodes has an ETCD_INITIAL_CLUSTER with one node fewer, because it was added before the third node. Let me stop on this: when a new member is added, is it necessary to reconfigure ETCD_INITIAL_CLUSTER on all active members?

Anyway, I edited that variable and restarted etcd, with the same result. The difference here is that I did not delete the /var/lib/etcd/member folder, because it contains important cluster data.
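Regarding the question above: according to the etcd configuration documentation, the `--initial*` flags (and their `ETCD_INITIAL_*` environment equivalents) are only used when bootstrapping a new member and are ignored when an existing member restarts. That would explain why editing the variable without deleting `/var/lib/etcd/member` changed nothing. A small sketch with a hypothetical helper name to check which case a member is in; it assumes an initialized member can be recognized by WAL files under `<data-dir>/member/wal`, which approximates etcd's own check:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether etcd will use the --initial-cluster*
# flags for this data dir (exit 0) or ignore them because the member is
# already initialized (exit 1).
# Assumption: an initialized member has *.wal files in <data-dir>/member/wal.
initial_cluster_flags_used() {
    local data_dir="${1:-/var/lib/etcd}" f
    for f in "${data_dir}"/member/wal/*.wal; do
        if [ -e "${f}" ]; then
            echo "ignored: ${data_dir} already contains member data"
            return 1
        fi
    done
    echo "used: ${data_dir} contains no member data yet"
    return 0
}
```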

All 24 comments

Starting etcd with new data-dir:

How did you start the new member?

Hello gyuho

--initial-cluster-state=existing was provided as a flag. For the member join I did everything as described in the documentation. Normally this works, but after the master was started with ETCD_FORCE_NEW_CLUSTER, things started to break.

When using the force-new-cluster flag, the peer URL must be updated to the real one; then the second member can be bootstrapped.

Here is the shell function to update the leader with the correct one:

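# Usage: check_and_update_leader <real-peer-url>
# PEERS must hold the client endpoint(s) of the running member,
# e.g. PEERS=http://127.0.0.1:2379
function check_and_update_leader {
    local peerURL="$1"   # the leader's real peer URL
    local num_of_member leader_member member
    while :
    do
        # Wait until the (single) member reports itself as leader.
        leader_member=$(ETCDCTL_API=2 etcdctl member list | grep "isLeader=true")
        if [ -z "${leader_member}" ]; then
            sleep 1
            continue
        fi
        num_of_member=$(ETCDCTL_API=3 etcdctl member list | grep -v "warning" | wc -l)
        if [ "${num_of_member}" -gt 1 ]; then
            # Other members already exist: nothing to fix.
            break
        fi
        # Only one member left after --force-new-cluster: replace its peer URL.
        member=$(echo "${leader_member}" | awk -F':' '{print $1}')
        if ETCDCTL_API=2 etcdctl -C "${PEERS}" member update "${member}" "${peerURL}"; then
            break
        fi
        sleep 1
    done
}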

Tried it, but got the same problem:

  • Restarted the leader etcd with the force-new-cluster flag
etcdctl member list
8e9e05c52164694d: name=default peerURLs=http://10.x.y.z:2380 clientURLs=http://x.y.z:2379 isLeader=true
etcdctl member update 8e9e05c52164694d  http://10.x.y.z:2380
etcdctl member add instance-2 https://10.x.y.z:2380

Started new member:

2017-06-24 20:53:06.195784 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)

Some more information that may be the cause of this:

I get the same error on the leader as soon as I change HTTP to HTTPS. This seems to change something about the cluster ID. My idea was to replace the HTTP members with HTTPS ones until all members use HTTPS. It seems something gets messed up in the cluster if --listen-peer-urls is changed from HTTP to HTTPS.

@ManuelGysin

Please provide the steps to reproduce this issue.

We have no idea what exactly you did. But we are 100% sure that simply adding a new member to an existing one-member cluster won't cause what you showed.

Initial:

  • 1-member cluster serving Kubernetes
  • Added 2 more members

Next step:

  • Wanted to change from HTTP to HTTPS
  • Removed all members with ETCD_FORCE_NEW_CLUSTER
  • Adjusted instance-2 with HTTPS
  • Added member on the leader
  • Getting error with mismatch cluster id

Removed all members with ETCD_FORCE_NEW_CLUSTER

Log with flag set: https://gist.github.com/ManuelGysin/e3cc38d3cc0b53ad729ea22d0d0af0ef
Log after reboot without flag set: https://gist.github.com/ManuelGysin/68d2890b3eee1fd2637a259e99a29e99

member list
8e9e05c52164694d: name=default peerURLs=http://10.135.14.108:2380,https://10.135.14.108:2380 clientURLs=https://10.135.14.108:2379 isLeader=true

Added member on the leader

member add instance-2 https://10.x.y.109:2380
Added member named instance-2 with ID d26014b33c766d61 to cluster

ETCD_NAME="instance-2"
ETCD_INITIAL_CLUSTER="default=http://10.x.y.108:2380,default=https://10.x.y.108:2380,instance-2=https://10.x.y.109:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Getting error with mismatch cluster id

Log: https://gist.github.com/ManuelGysin/f9a87fcda5f0d9b19b653eae7254e668

Some more testing:

I adjusted the endpoint of instance-2 to HTTP, and after that the cluster ID mismatch was gone. Now I'm not sure what goes wrong. https://coreos.com/etcd/docs/latest/etcd-live-http-to-https-migration.html states that a migration from HTTP to HTTPS can be done. Why is the cluster ID mismatch thrown when the migration is done?

Removed all members with ETCD_FORCE_NEW_CLUSTER

I do not understand this. How? Why not remove members with the etcdctl member remove command?
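For comparison, `etcdctl member remove` is an ordinary runtime reconfiguration that goes through raft, so the remaining members agree on the new membership. A minimal sketch assuming the v2 `member list` output format shown in this thread; the helper names and the `ETCDCTL` override (useful for a dry run or a stub) are hypothetical:

```shell
#!/usr/bin/env bash
# ETCDCTL can be overridden, e.g. ETCDCTL=echo, to avoid touching a cluster.
ETCDCTL="${ETCDCTL:-etcdctl}"

# Print the IDs of all members that are not the current leader,
# reading v2 `member list` output ("<id>: name=... isLeader=...") on stdin.
follower_ids() {
    grep -v 'isLeader=true' | awk -F':' 'NF > 1 {print $1}'
}

# Remove every follower through the normal membership API, one at a time.
remove_followers() {
    local id
    for id in $(ETCDCTL_API=2 ${ETCDCTL} member list | follower_ids); do
        ETCDCTL_API=2 ${ETCDCTL} member remove "${id}" || return 1
    done
}
```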

Adjusted instance-2 with HTTPS

How? Please show the exact command.

Added member on the leader

How?

Getting error with mismatch cluster id

On which node? Both nodes, or only one of them? Please show the logs of all nodes.

Why did you specify two kinds of peer URL for the default member?
default=http://10.x.y.108:2380,default=https://10.x.y.108:2380,instance-2=https://10.x.y.109:2380
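The doubled `default=` entry follows from the member list shown earlier: `default` advertises both an HTTP and an HTTPS peer URL, and `--initial-cluster` takes one `name=URL` pair per peer URL. A minimal sketch with a hypothetical helper name that rebuilds the value from v2 `member list` output, which makes a member still advertising both URLs easy to spot (lines without a `name=` field, such as unstarted members, are skipped):

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild an --initial-cluster value from
# `ETCDCTL_API=2 etcdctl member list` output read on stdin.
# Every peer URL of a member becomes its own name=URL pair.
initial_cluster_from_member_list() {
    awk '
    {
        name = ""; urls = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^name=/)     name = substr($i, 6)    # after "name="
            if ($i ~ /^peerURLs=/) urls = substr($i, 10)   # after "peerURLs="
        }
        if (name == "" || urls == "") next
        n = split(urls, u, ",")
        for (j = 1; j <= n; j++)
            out = out (out == "" ? "" : ",") name "=" u[j]
    }
    END { print out }'
}
```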

Initial cluster with 3 members

etcdctl member list
4f3e0b1262391f8f: name=instance-2 peerURLs=http://10.x.y.110:2380 clientURLs=https://10.x.y.110:2379 isLeader=false
8e9e05c52164694d: name=default peerURLs=http://10.x.y.108:2380 clientURLs=https://10.x.y.108:2379 isLeader=true
a1e3cb0e077aaa08: name=instance-1 peerURLs=http://10.x.y.109:2380 clientURLs=https://10.x.y.109:2379 isLeader=false
etcdctl cluster-health
member 4f3e0b1262391f8f is healthy: got healthy result from https://10.x.y.110:2379
member 8e9e05c52164694d is healthy: got healthy result from https://10.x.y.108:2379
member a1e3cb0e077aaa08 is healthy: got healthy result from https://10.x.y.109:2379
cluster is healthy

Trying to migrate member 3 to https

Old config:

    - etcd
    - --advertise-client-urls=https://10.x.y.110:2379
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=https://10.x.y.110:2379
    - --initial-cluster=instance-2=http://10.x.y.110:2380,default=http://10.x.y.108:2380,instance-1=http://10.x.y.109:2380
    - --initial-cluster-state=existing
    - --name=instance-2
    - --listen-peer-urls=http://10.x.y.110:2380
    - --trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --peer-client-cert-auth

New config:

    - etcd
    - --advertise-client-urls=https://10.x.y.110:2379
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=https://10.x.y.110:2379
    - --initial-cluster=instance-2=https://10.x.y.110:2380,default=http://10.x.y.108:2380,instance-1=http://10.x.y.109:2380
    - --initial-cluster-state=existing
    - --name=instance-2
    - --listen-peer-urls=https://10.x.y.110:2380
    - --trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --peer-client-cert-auth

Update member with https as peerURL

etcdctl member update 4f3e0b1262391f8f https://10.x.y.110:2380
Updated member with ID 4f3e0b1262391f8f in cluster
etcdctl member list
4f3e0b1262391f8f: name=instance-2 peerURLs=https://10.x.y.110:2380 clientURLs=https://10.x.y.110:2379 isLeader=false
8e9e05c52164694d: name=default peerURLs=http://10.x.y.108:2380 clientURLs=https://10.x.y.108:2379 isLeader=true
a1e3cb0e077aaa08: name=instance-1 peerURLs=http://10.x.y.109:2380 clientURLs=https://10.x.y.109:2379 isLeader=false
etcdctl cluster-health
member 4f3e0b1262391f8f is healthy: got healthy result from https://10.x.y.110:2379
member 8e9e05c52164694d is healthy: got healthy result from https://10.x.y.108:2379
member a1e3cb0e077aaa08 is healthy: got healthy result from https://10.x.y.109:2379

Restarting etcd with new config

etcdctl cluster-health
member 4f3e0b1262391f8f is healthy: got healthy result from https://10.x.y.110:2379
member 8e9e05c52164694d is healthy: got healthy result from https://10.x.y.108:2379
member a1e3cb0e077aaa08 is healthy: got healthy result from https://10.x.y.109:2379

Log Default: https://gist.github.com/ManuelGysin/9e8a0a18cea605f0cdd9ec8797b90d81
Log instance-1: https://gist.github.com/ManuelGysin/6330149609012d640985cad6abc95b3b
Log instance-2: https://gist.github.com/ManuelGysin/52af5184598d10756e5b1d1590144da2

@ManuelGysin

After you update (not delete and re-add) the member, you should not set initial-cluster=existing. Also, from the log I can see this is not a clean cluster. You probably already changed a few things that we do not know about.

Can you create a new cluster and try again? You only need to change the listen peer URL and give it key/certs for the updated member.
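Following this description (update the member first, then change only its listen peer URL and give it peer key/certs), a minimal per-member sketch. It assumes the v2 `etcdctl`, peer port 2380 and the certificate paths used in this thread; the function name and the `ETCDCTL` override (`ETCDCTL=echo` turns it into a dry run) are hypothetical:

```shell
#!/usr/bin/env bash
ETCDCTL="${ETCDCTL:-etcdctl}"

# Hypothetical helper: switch one member's peer URL from HTTP to HTTPS.
# Usage: migrate_peer_to_https <member-id> <host>
migrate_peer_to_https() {
    local member_id="$1" host="$2"
    local new_peer_url="https://${host}:2380"   # assumption: peer port 2380

    # Step 1: advertise the new peer URL to the rest of the cluster.
    ETCDCTL_API=2 ${ETCDCTL} member update "${member_id}" "${new_peer_url}" || return 1

    # Step 2: print the peer flags the member must be restarted with
    # (certificate paths as used in this thread).
    cat <<EOF
--listen-peer-urls=${new_peer_url}
--peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
--peer-key-file=/etc/kubernetes/pki/etcd-server.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
--peer-client-cert-auth
EOF
}
```

Printing the flags instead of restarting etcd keeps the sketch independent of how the member is run (docker, kubelet manifest, systemd).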

You mean a completely new cluster with empty data? I can do this, but would prefer some other way. There is a lot of data already stored for the running Kubernetes installation. Maybe I can find a good way to extract the data and import it again.

Btw, it ends with the same result whether update or delete-and-add is used. In fact, the cluster had a hard time with adding/removing members.

Thanks so far! :)

You mean a completely new cluster with empty data?

Just for the purpose of reproducing this issue, not to suggest that you recreate your production cluster.

If you can reproduce it easily on a clean cluster, so can we, and then we can better help you out.

Hello!

Sorry, last week was very busy.
I can reproduce this behavior the following way with a new cluster:

Create a 3-member cluster with HTTP. Update the peer URL of member 2 to HTTPS and restart member 2 with HTTPS. After that you will find the following in the logs of member 2:

rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)

Before change:

    - etcd
    - --advertise-client-urls=http://10.x.y.z:2379
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=http://10.x.y.z:2379
    - --initial-cluster=wbf-u3282.wbf.admin.ch=http://10.x.y.z:2380,default=http://10.x.y.z:2380
    - --initial-cluster-state=existing
    - --name=member-2
    - --listen-peer-urls=http://10.x.y.z:2380

After change:

    - etcd
    - --advertise-client-urls=https://10.x.y.z:2379
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=https://10.x.y.z:2379
    - --initial-cluster=default=http://10.x.y.z:2380,member-2=https://10.x.y.z:2380
    - --initial-cluster-state=existing
    - --name=member-2
    - --listen-peer-urls=https://10.x.y.z:2380
    - --trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd-server.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
    - --peer-client-cert-auth

Greets
Manuel

@ManuelGysin Can you please share the logs of member 2 from before and after the restart?

Tasks done:

  1. Start member 1
  2. Update member 1
  3. Add member 2
  4. Start member 2
  5. Add member 3
  6. Start member 3
  7. Stop member 2
  8. Update member 2
  9. Reconfigure member 2
  10. Start member 2

Member 2 before change:

2017-07-03 08:33:04.121448 I | etcdmain: etcd Version: 3.2.1
2017-07-03 08:33:04.122015 I | etcdmain: Git SHA: 61fc123
2017-07-03 08:33:04.122019 I | etcdmain: Go Version: go1.8.3
2017-07-03 08:33:04.122026 I | etcdmain: Go OS/Arch: linux/amd64
2017-07-03 08:33:04.122030 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-07-03 08:33:04.122117 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = true
2017-07-03 08:33:04.122131 W | embed: The scheme of peer url http://10.x.y.109:2380 is HTTP while peer key/cert files are presented. Ignored peer key/cert files.
2017-07-03 08:33:04.122135 W | embed: The scheme of peer url http://10.x.y.109:2380 is HTTP while client cert auth (--peer-client-cert-auth) is enabled. Ignored client cert auth for this url.
2017-07-03 08:33:04.122337 I | embed: listening for peers on http://10.x.y.109:2380
2017-07-03 08:33:04.122363 W | embed: The scheme of client url http://10.x.y.109:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
2017-07-03 08:33:04.122403 I | embed: listening for client requests on 10.x.y.109:2379
2017-07-03 08:33:04.127615 I | etcdserver: name = member-2
2017-07-03 08:33:04.127628 I | etcdserver: data dir = /var/lib/etcd
2017-07-03 08:33:04.127633 I | etcdserver: member dir = /var/lib/etcd/member
2017-07-03 08:33:04.127636 I | etcdserver: heartbeat = 100ms
2017-07-03 08:33:04.127639 I | etcdserver: election = 1000ms
2017-07-03 08:33:04.127645 I | etcdserver: snapshot count = 100000
2017-07-03 08:33:04.127665 I | etcdserver: advertise client URLs = http://10.x.y.109:2379
2017-07-03 08:33:04.132055 I | etcdserver: starting member 794925a9c6cc8fb8 in cluster cdf818194e3a8c32
2017-07-03 08:33:04.132086 I | raft: 794925a9c6cc8fb8 became follower at term 0
2017-07-03 08:33:04.132100 I | raft: newRaft 794925a9c6cc8fb8 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2017-07-03 08:33:04.132109 I | raft: 794925a9c6cc8fb8 became follower at term 1
2017-07-03 08:33:04.138761 W | auth: simple token is not cryptographically signed
2017-07-03 08:33:04.144091 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-07-03 08:33:04.144112 I | rafthttp: starting peer 8e9e05c52164694d...
2017-07-03 08:33:04.144120 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-07-03 08:33:04.144552 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-07-03 08:33:04.145442 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-07-03 08:33:04.146493 I | rafthttp: started peer 8e9e05c52164694d
2017-07-03 08:33:04.146518 I | rafthttp: added peer 8e9e05c52164694d
2017-07-03 08:33:04.146547 I | etcdserver: starting server... [version: 3.2.1, cluster version: to_be_decided]
2017-07-03 08:33:04.146564 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = false
2017-07-03 08:33:04.146578 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-07-03 08:33:04.146619 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream Message reader)
2017-07-03 08:33:04.147290 I | rafthttp: peer 8e9e05c52164694d became active
2017-07-03 08:33:04.147306 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-07-03 08:33:04.147404 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-07-03 08:33:04.148950 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-07-03 08:33:04.149090 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-07-03 08:33:04.529726 I | raft: 794925a9c6cc8fb8 [term: 1] received a MsgVote message with higher term from 8e9e05c52164694d [term: 56]
2017-07-03 08:33:04.529744 I | raft: 794925a9c6cc8fb8 became follower at term 56
2017-07-03 08:33:04.529753 I | raft: 794925a9c6cc8fb8 [logterm: 0, index: 0, vote: 0] cast MsgVote for 8e9e05c52164694d [logterm: 2, index: 7] at term 56
2017-07-03 08:33:04.531503 I | raft: raft.node: 794925a9c6cc8fb8 elected leader 8e9e05c52164694d at term 56
2017-07-03 08:33:04.532573 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2017-07-03 08:33:04.532675 N | etcdserver/membership: set the initial cluster version to 3.2
2017-07-03 08:33:04.532713 I | etcdserver/api: enabled capabilities for version 3.2
2017-07-03 08:33:04.532782 N | etcdserver/membership: updated member 8e9e05c52164694d [http://10.x.y.108:2380] in cluster cdf818194e3a8c32
2017-07-03 08:33:04.532804 I | rafthttp: updated peer 8e9e05c52164694d
2017-07-03 08:33:04.532883 I | etcdserver/membership: added member 794925a9c6cc8fb8 [http://10.x.y.109:2380] to cluster cdf818194e3a8c32
2017-07-03 08:33:04.533437 I | etcdserver: published {Name:member-2 ClientURLs:[http://10.x.y.109:2379]} to cluster cdf818194e3a8c32
2017-07-03 08:33:04.533472 I | embed: ready to serve client requests
2017-07-03 08:33:04.533888 N | embed: serving insecure client requests on 10.x.y.109:2379, this is strongly discouraged!
2017-07-03 08:33:56.579867 I | etcdserver/membership: added member 9e94a72e2b2bc6ef [http://10.x.y.110:2380] to cluster cdf818194e3a8c32
2017-07-03 08:33:56.580600 I | rafthttp: starting peer 9e94a72e2b2bc6ef...
2017-07-03 08:33:56.580614 I | rafthttp: started HTTP pipelining with peer 9e94a72e2b2bc6ef
2017-07-03 08:33:56.585725 I | rafthttp: started peer 9e94a72e2b2bc6ef
2017-07-03 08:33:56.585757 I | rafthttp: added peer 9e94a72e2b2bc6ef
2017-07-03 08:33:56.586330 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (writer)
2017-07-03 08:33:56.586356 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (writer)
2017-07-03 08:33:56.586383 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (stream MsgApp v2 reader)
2017-07-03 08:33:56.586749 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (stream Message reader)
2017-07-03 08:34:01.586995 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:06.587127 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:11.587244 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:16.587393 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:21.587511 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:26.587633 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:31.587768 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:36.587882 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:41.588009 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:46.588135 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:51.588248 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:34:56.588378 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:35:01.588821 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:35:06.588960 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:35:11.589079 W | rafthttp: health check for peer 9e94a72e2b2bc6ef could not connect: dial tcp 10.x.y.110:2380: getsockopt: connection refused
2017-07-03 08:35:15.000573 I | rafthttp: peer 9e94a72e2b2bc6ef became active
2017-07-03 08:35:15.000594 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream MsgApp v2 writer)
2017-07-03 08:35:15.000624 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream Message writer)
2017-07-03 08:35:15.049669 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream MsgApp v2 reader)
2017-07-03 08:35:15.049871 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream Message reader)

Member 2 after change:

2017-07-03 08:40:47.755215 I | etcdmain: etcd Version: 3.2.1
2017-07-03 08:40:47.755807 I | etcdmain: Git SHA: 61fc123
2017-07-03 08:40:47.755817 I | etcdmain: Go Version: go1.8.3
2017-07-03 08:40:47.755820 I | etcdmain: Go OS/Arch: linux/amd64
2017-07-03 08:40:47.755824 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-07-03 08:40:47.755900 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-07-03 08:40:47.755929 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = true
2017-07-03 08:40:47.757332 I | embed: listening for peers on https://10.x.y.109:2380
2017-07-03 08:40:47.757404 I | embed: listening for client requests on 10.x.y.109:2379
2017-07-03 08:40:47.772312 I | etcdserver: name = member-2
2017-07-03 08:40:47.772328 I | etcdserver: data dir = /var/lib/etcd
2017-07-03 08:40:47.772332 I | etcdserver: member dir = /var/lib/etcd/member
2017-07-03 08:40:47.772336 I | etcdserver: heartbeat = 100ms
2017-07-03 08:40:47.772338 I | etcdserver: election = 1000ms
2017-07-03 08:40:47.772341 I | etcdserver: snapshot count = 100000
2017-07-03 08:40:47.772349 I | etcdserver: advertise client URLs = https://10.x.y.109:2379
2017-07-03 08:40:47.773547 I | etcdserver: restarting member 794925a9c6cc8fb8 in cluster cdf818194e3a8c32 at commit index 11
2017-07-03 08:40:47.773582 I | raft: 794925a9c6cc8fb8 became follower at term 56
2017-07-03 08:40:47.773595 I | raft: newRaft 794925a9c6cc8fb8 [peers: [], term: 56, commit: 11, applied: 0, lastindex: 11, lastterm: 56]
2017-07-03 08:40:47.778232 W | auth: simple token is not cryptographically signed
2017-07-03 08:40:47.782829 I | etcdserver: starting server... [version: 3.2.1, cluster version: to_be_decided]
2017-07-03 08:40:47.782862 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = false
2017-07-03 08:40:47.783470 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2017-07-03 08:40:47.783497 I | rafthttp: starting peer 8e9e05c52164694d...
2017-07-03 08:40:47.783524 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-07-03 08:40:47.783749 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-07-03 08:40:47.784756 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-07-03 08:40:47.786364 I | rafthttp: started peer 8e9e05c52164694d
2017-07-03 08:40:47.786402 I | rafthttp: added peer 8e9e05c52164694d
2017-07-03 08:40:47.786522 N | etcdserver/membership: set the initial cluster version to 3.2
2017-07-03 08:40:47.786565 I | etcdserver/api: enabled capabilities for version 3.2
2017-07-03 08:40:47.786643 N | etcdserver/membership: updated member 8e9e05c52164694d [http://10.x.y.108:2380] in cluster cdf818194e3a8c32
2017-07-03 08:40:47.786664 I | rafthttp: updated peer 8e9e05c52164694d
2017-07-03 08:40:47.786720 I | etcdserver/membership: added member 794925a9c6cc8fb8 [http://10.x.y.109:2380] to cluster cdf818194e3a8c32
2017-07-03 08:40:47.786797 I | etcdserver/membership: added member 9e94a72e2b2bc6ef [http://10.x.y.110:2380] to cluster cdf818194e3a8c32
2017-07-03 08:40:47.786811 I | rafthttp: starting peer 9e94a72e2b2bc6ef...
2017-07-03 08:40:47.786835 I | rafthttp: started HTTP pipelining with peer 9e94a72e2b2bc6ef
2017-07-03 08:40:47.790085 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (writer)
2017-07-03 08:40:47.790110 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-07-03 08:40:47.790234 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream Message reader)
2017-07-03 08:40:47.790516 I | rafthttp: started peer 9e94a72e2b2bc6ef
2017-07-03 08:40:47.790533 I | rafthttp: added peer 9e94a72e2b2bc6ef
2017-07-03 08:40:47.790577 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (writer)
2017-07-03 08:40:47.790608 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (stream MsgApp v2 reader)
2017-07-03 08:40:47.790743 I | rafthttp: started streaming with peer 9e94a72e2b2bc6ef (stream Message reader)
2017-07-03 08:40:47.790935 I | rafthttp: peer 8e9e05c52164694d became active
2017-07-03 08:40:47.790949 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-07-03 08:40:47.791244 I | rafthttp: peer 9e94a72e2b2bc6ef became active
2017-07-03 08:40:47.791256 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream MsgApp v2 reader)
2017-07-03 08:40:47.791289 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-07-03 08:40:47.791304 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream Message reader)
2017-07-03 08:40:47.829134 I | raft: raft.node: 794925a9c6cc8fb8 elected leader 8e9e05c52164694d at term 56
2017-07-03 08:40:47.831652 N | etcdserver/membership: updated member 794925a9c6cc8fb8 [https://10.x.y.109:2380] in cluster cdf818194e3a8c32
2017-07-03 08:40:47.833221 I | etcdserver: published {Name:member-2 ClientURLs:[https://10.x.y.109:2379]} to cluster cdf818194e3a8c32
2017-07-03 08:40:47.834315 I | embed: ready to serve client requests
2017-07-03 08:40:47.834622 I | embed: serving client requests on 10.x.y.109:2379
2017-07-03 08:40:47.854706 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-07-03 08:40:47.867795 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-07-03 08:40:47.952610 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream MsgApp v2 writer)
2017-07-03 08:40:47.956551 I | rafthttp: established a TCP streaming connection with peer 9e94a72e2b2bc6ef (stream Message writer)
2017-07-03 08:40:47.963416 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-07-03 08:40:47.963531 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)

It seems there are other actions you have taken besides what you described. Please provide us the exact steps to reproduce this issue. Thank you.

My --data-dir is /var/etcd/data; removing and recreating it worked for me. It seems that something from a previous etcd cluster I had created was left in this directory, which may affect the etcd settings.

Hi, I used ETCD_FORCE_NEW_CLUSTER with etcd 2.3.8, using backup data from one node of an "almost corrupted" 5-node cluster, and this guide (https://coreos.com/etcd/docs/latest/v2/admin_guide.html#disaster-recovery) was useful. The key step is updating the peer URL with etcdctl member update before adding new members to resize the cluster.

My cluster now has 3 members and is healthy, but on the 2 new members I repeatedly see this in the logs:
request cluster ID mismatch (got 31f9388b9980093e want 5f787c242a02)

Maybe a remnant of the old configuration? I reused those 2 members from the old cluster, wiping the /var/lib/etcd directory, changing the node names and reusing the same ip:port.

In my case I got the error

rafthttp: request cluster ID mismatch (got 1b3a88599e79f82b want b33939d80a381a57)

due to an incorrect config on one node.

Two of my nodes had this in their config:

env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"

and one node had:

env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380"

To resolve the problem I stopped etcd on all nodes, fixed the incorrect config, deleted the /var/lib/etcd/member folder on all nodes, restarted etcd on all nodes, and voilà!

P.S.

/var/lib/etcd is the folder where etcd saves its data in my case.
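Why would one node's shorter ETCD_INITIAL_CLUSTER produce a different cluster ID? As far as we understand etcd's membership code, a bootstrap member's ID is derived by hashing its peer URLs together with the initial cluster token, and the cluster ID by hashing the sorted member IDs. Two nodes bootstrapped from different member lists therefore compute different cluster IDs and reject each other's requests. The Go sketch below models that derivation using the two configs quoted above. The function names are hypothetical, the exact byte layout is our assumption (do not expect it to reproduce etcd's real IDs), the token "etcd-cluster" is assumed, and only members named in the initial cluster at bootstrap time are modelled.

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
	"strings"
)

// memberID (hypothetical) hashes the sorted peer URLs plus the cluster
// token with SHA-1 and keeps the first 8 bytes as a big-endian uint64.
func memberID(peerURLs []string, token string) uint64 {
	urls := append([]string(nil), peerURLs...)
	sort.Strings(urls)
	b := []byte(strings.Join(urls, ""))
	b = append(b, token...)
	sum := sha1.Sum(b)
	return binary.BigEndian.Uint64(sum[:8])
}

// clusterID (hypothetical) hashes the sorted member IDs the same way, so
// the result depends only on which members were in the initial cluster.
func clusterID(memberIDs []uint64) uint64 {
	ids := append([]uint64(nil), memberIDs...)
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	b := make([]byte, 8*len(ids))
	for i, id := range ids {
		binary.BigEndian.PutUint64(b[8*i:], id)
	}
	sum := sha1.Sum(b)
	return binary.BigEndian.Uint64(sum[:8])
}

// clusterIDFromInitialCluster parses an ETCD_INITIAL_CLUSTER-style string
// ("name=url,name=url,..."), assuming one peer URL per member.
func clusterIDFromInitialCluster(initialCluster, token string) uint64 {
	var ids []uint64
	for _, entry := range strings.Split(initialCluster, ",") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			continue // skip malformed entries in this sketch
		}
		ids = append(ids, memberID([]string{parts[1]}, token))
	}
	return clusterID(ids)
}

func main() {
	const token = "etcd-cluster" // assumed; substitute your own token
	full := "etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"
	single := "etcd-01=http://172.16.50.101:2380"
	fmt.Printf("3-member config -> cluster ID %x\n", clusterIDFromInitialCluster(full, token))
	fmt.Printf("1-member config -> cluster ID %x\n", clusterIDFromInitialCluster(single, token))
}
```

The two printed IDs differ. That is the same kind of split as the "got ... want ..." pair above: each node is internally consistent, but they disagree about which cluster they belong to.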

One of my nodes has an ETCD_INITIAL_CLUSTER with one node fewer, because it was added before the third node. That raises a question: when a new member is added, is it necessary to reconfigure ETCD_INITIAL_CLUSTER on all active members?

Anyway, I edited that variable and restarted etcd, with the same result. The difference here is that I did not delete the /var/lib/etcd/member folder, because it holds important cluster data.

As Xiang mentioned, reproducible steps must be provided for us to debug this.

And for other issues, please create a separate issue with reproducible steps.

I was able to reproduce this by accidentally issuing the initial-cluster commands a second time on one node.
