Etcd: Cannot setup 3 node etcd cluster

Created on 8 Jan 2018 · 19 Comments · Source: etcd-io/etcd

cat /etc/os-release

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Machines:

k8s-master-01 - 192.168.232.100
k8s-master-02 - 192.168.232.101
k8s-master-03 - 192.168.232.102

etcd --version

etcd Version: 3.2.9
Git SHA: f1d7dd8
Go Version: go1.8.3
Go OS/Arch: linux/amd64

How to reproduce:

[All machines]
yum install etcd

[k8s-master-01]:

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.100:2380 \
  --listen-peer-urls http://192.168.232.100:2380 \
  --listen-client-urls http://192.168.232.100:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.100:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new
  --auto-tls \
  --peer-auto-tls

[k8s-master-02]:

[root@k8s-master-02 etcd]# etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.101:2380   --listen-peer-urls http://192.168.232.101:2380   --listen-client-urls http://192.168.232.101:2379,http://127.0.0.1:2379   --advertise-client-urls http://192.168.232.101:2379   --initial-cluster-token etcd-cluster-1   --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380   --initial-cluster-state new --auto-tls --peer-auto-tls
2018-01-08 11:54:12.705581 I | etcdmain: etcd Version: 3.2.9
2018-01-08 11:54:12.705639 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 11:54:12.705644 I | etcdmain: Go Version: go1.8.3
2018-01-08 11:54:12.705649 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 11:54:12.705655 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 11:54:12.705665 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 11:54:12.705707 W | etcdmain: found invalid file/dir fixtures under data dir infra0.etcd (Ignore this if you are upgrading etcd)
2018-01-08 11:54:12.705718 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-01-08 11:54:12.705758 I | embed: peerTLS: cert = infra0.etcd/fixtures/peer/cert.pem, key = infra0.etcd/fixtures/peer/key.pem, ca = , trusted-ca = , client-cert-auth = false
2018-01-08 11:54:12.705768 W | embed: The scheme of peer url http://192.168.232.101:2380 is HTTP while peer key/cert files are presented. Ignored peer key/cert files.
2018-01-08 11:54:12.705820 I | embed: listening for peers on http://192.168.232.101:2380
2018-01-08 11:54:12.705851 W | embed: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
2018-01-08 11:54:12.705892 I | embed: listening for client requests on 127.0.0.1:2379
2018-01-08 11:54:12.705904 W | embed: The scheme of client url http://192.168.232.101:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
2018-01-08 11:54:12.705943 I | embed: listening for client requests on 192.168.232.101:2379
2018-01-08 11:54:12.733618 I | etcdmain: --initial-cluster must include infra0=http://192.168.232.101:2380 given --initial-advertise-peer-urls=http://192.168.232.101:2380
[root@k8s-master-02 etcd]# 

[k8s-master-03]:

[root@k8s-master-03 ~]# etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.102:2380   --listen-peer-urls http://192.168.232.102:2380   --listen-client-urls http://192.168.232.102:2379,http://127.0.0.1:2379   --advertise-client-urls http://192.168.232.102:2379   --initial-cluster-token etcd-cluster-1   --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380   --initial-cluster-state new
2018-01-08 11:46:56.713062 I | etcdmain: etcd Version: 3.2.9
2018-01-08 11:46:56.713131 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 11:46:56.713137 I | etcdmain: Go Version: go1.8.3
2018-01-08 11:46:56.713142 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 11:46:56.713147 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 11:46:56.713164 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 11:46:56.713501 I | embed: listening for peers on http://192.168.232.102:2380
2018-01-08 11:46:56.713566 I | embed: listening for client requests on 127.0.0.1:2379
2018-01-08 11:46:56.713600 I | embed: listening for client requests on 192.168.232.102:2379
2018-01-08 11:46:56.753553 I | etcdmain: --initial-cluster must include infra0=http://192.168.232.102:2380 given --initial-advertise-peer-urls=http://192.168.232.102:2380
[root@k8s-master-03 ~]# 

[root@k8s-master-01 etcd]# etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.100:2380   --listen-peer-urls http://192.168.232.100:2380   --listen-client-urls http://192.168.232.100:2379,http://127.0.0.1:2379   --advertise-client-urls http://192.168.232.100:2379   --initial-cluster-token etcd-cluster-1   --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380   --initial-cluster-state new
2018-01-08 11:46:53.877388 I | etcdmain: etcd Version: 3.2.9
2018-01-08 11:46:53.877461 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 11:46:53.877467 I | etcdmain: Go Version: go1.8.3
2018-01-08 11:46:53.877475 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 11:46:53.877480 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 11:46:53.877491 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 11:46:53.877549 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-01-08 11:46:53.877619 I | embed: listening for peers on http://192.168.232.100:2380
2018-01-08 11:46:53.877665 I | embed: listening for client requests on 127.0.0.1:2379
2018-01-08 11:46:53.877705 I | embed: listening for client requests on 192.168.232.100:2379
2018-01-08 11:46:53.880290 I | etcdserver: name = infra0
2018-01-08 11:46:53.880300 I | etcdserver: data dir = infra0.etcd
2018-01-08 11:46:53.880305 I | etcdserver: member dir = infra0.etcd/member
2018-01-08 11:46:53.880310 I | etcdserver: heartbeat = 100ms
2018-01-08 11:46:53.880315 I | etcdserver: election = 1000ms
2018-01-08 11:46:53.880319 I | etcdserver: snapshot count = 100000
2018-01-08 11:46:53.880331 I | etcdserver: advertise client URLs = http://192.168.232.100:2379
2018-01-08 11:46:53.880802 I | etcdserver: restarting member 1a0f423a850b33 in cluster ddec615d236f5865 at commit index 3
2018-01-08 11:46:53.880838 I | raft: 1a0f423a850b33 became follower at term 245
2018-01-08 11:46:53.880854 I | raft: newRaft 1a0f423a850b33 [peers: [], term: 245, commit: 3, applied: 0, lastindex: 3, lastterm: 1]
2018-01-08 11:46:53.883679 W | auth: simple token is not cryptographically signed
2018-01-08 11:46:53.884744 I | etcdserver: starting server... [version: 3.2.9, cluster version: to_be_decided]
2018-01-08 11:46:53.885612 I | etcdserver/membership: added member 1a0f423a850b33 [http://192.168.232.100:2380] to cluster ddec615d236f5865
2018-01-08 11:46:53.885702 I | etcdserver/membership: added member 87f5c922a6a67302 [http://192.168.232.101:2380] to cluster ddec615d236f5865
2018-01-08 11:46:53.885722 I | rafthttp: starting peer 87f5c922a6a67302...
2018-01-08 11:46:53.885753 I | rafthttp: started HTTP pipelining with peer 87f5c922a6a67302
2018-01-08 11:46:53.887630 I | rafthttp: started streaming with peer 87f5c922a6a67302 (writer)
2018-01-08 11:46:53.887700 I | rafthttp: started streaming with peer 87f5c922a6a67302 (writer)
2018-01-08 11:46:53.888769 I | rafthttp: started peer 87f5c922a6a67302
2018-01-08 11:46:53.888808 I | rafthttp: added peer 87f5c922a6a67302
2018-01-08 11:46:53.888835 I | rafthttp: started streaming with peer 87f5c922a6a67302 (stream MsgApp v2 reader)
2018-01-08 11:46:53.888902 I | etcdserver/membership: added member c6b41ba06674f9bd [http://192.168.232.102:2380] to cluster ddec615d236f5865
2018-01-08 11:46:53.888918 I | rafthttp: starting peer c6b41ba06674f9bd...
2018-01-08 11:46:53.888932 I | rafthttp: started HTTP pipelining with peer c6b41ba06674f9bd
2018-01-08 11:46:53.889189 I | rafthttp: started streaming with peer 87f5c922a6a67302 (stream Message reader)
2018-01-08 11:46:53.889476 I | rafthttp: started streaming with peer c6b41ba06674f9bd (writer)
2018-01-08 11:46:53.891582 I | rafthttp: started peer c6b41ba06674f9bd
2018-01-08 11:46:53.891609 I | rafthttp: added peer c6b41ba06674f9bd
2018-01-08 11:46:53.891975 I | rafthttp: started streaming with peer c6b41ba06674f9bd (writer)
2018-01-08 11:46:53.891996 I | rafthttp: started streaming with peer c6b41ba06674f9bd (stream MsgApp v2 reader)
2018-01-08 11:46:53.892013 I | rafthttp: started streaming with peer c6b41ba06674f9bd (stream Message reader)
2018-01-08 11:46:54.381147 I | raft: 1a0f423a850b33 is starting a new election at term 245
2018-01-08 11:46:54.381215 I | raft: 1a0f423a850b33 became candidate at term 246
2018-01-08 11:46:54.381243 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 246
2018-01-08 11:46:54.381255 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 246
2018-01-08 11:46:54.381265 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 246
2018-01-08 11:46:55.981149 I | raft: 1a0f423a850b33 is starting a new election at term 246
2018-01-08 11:46:55.981181 I | raft: 1a0f423a850b33 became candidate at term 247
2018-01-08 11:46:55.981192 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 247
2018-01-08 11:46:55.981203 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 247
2018-01-08 11:46:55.981212 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 247
2018-01-08 11:46:57.281119 I | raft: 1a0f423a850b33 is starting a new election at term 247
2018-01-08 11:46:57.281156 I | raft: 1a0f423a850b33 became candidate at term 248
2018-01-08 11:46:57.281168 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 248
2018-01-08 11:46:57.281178 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 248
2018-01-08 11:46:57.281190 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 248
2018-01-08 11:46:58.889273 W | rafthttp: health check for peer 87f5c922a6a67302 could not connect: dial tcp 192.168.232.101:2380: getsockopt: connection refused
2018-01-08 11:46:58.892059 W | rafthttp: health check for peer c6b41ba06674f9bd could not connect: dial tcp 192.168.232.102:2380: getsockopt: connection refused
2018-01-08 11:46:59.181119 I | raft: 1a0f423a850b33 is starting a new election at term 248
2018-01-08 11:46:59.181150 I | raft: 1a0f423a850b33 became candidate at term 249
2018-01-08 11:46:59.181162 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 249
2018-01-08 11:46:59.181174 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 249
2018-01-08 11:46:59.181184 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 249
2018-01-08 11:47:00.381130 I | raft: 1a0f423a850b33 is starting a new election at term 249
2018-01-08 11:47:00.381174 I | raft: 1a0f423a850b33 became candidate at term 250
2018-01-08 11:47:00.381187 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 250
2018-01-08 11:47:00.381199 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 250
2018-01-08 11:47:00.381208 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 250
2018-01-08 11:47:00.885407 E | etcdserver: publish error: etcdserver: request timed out
2018-01-08 11:47:02.281133 I | raft: 1a0f423a850b33 is starting a new election at term 250
2018-01-08 11:47:02.281195 I | raft: 1a0f423a850b33 became candidate at term 251
2018-01-08 11:47:02.281211 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 251
2018-01-08 11:47:02.281227 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 251
2018-01-08 11:47:02.281238 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 251

etcdctl member gets connection refused.

Or I end up with

etcd.service main process exited code=exited status=203/exec

After I do

systemctl start etcd

Regardless of which tutorial I follow, I can't get a working cluster. I tried following Kelsey Hightower's The Hard Way, which also failed; I followed several other tutorials and debugged things, also with no results. I've been fighting with etcd for 5-6 days and nothing helps at all. So far a bare-metal 9-node HA setup with Kubernetes seems impossible.

Either it hangs on /up/ or I get a failure error.
How do I correctly set up a 3-node etcd cluster?

All 19 comments

2018-01-08 11:54:12.705851 W | embed: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
--listen-peer-urls http://192.168.232.100:2380

Hi @aryadrottning, for starters it looks like you're trying to use TLS with the HTTP scheme instead of HTTPS.

@hexfusion What's the workaround for this? I simply followed the guide.

@aryadrottning please post a link to the guide you are following so we can fix it if it is incorrect. But I would take a quick look at our Documentation and check what might be different from what you are passing to etcd.

@hexfusion Well, it was the documentation you posted; I just changed the IPs/addresses. I tried changing the certificates (from auto to pointing at /etc/etcd/pki etc.) but that didn't succeed either.

Before I followed Kelsey Hightower's THW (https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/07-bootstrapping-etcd.md) with correct amendments due to VM infrastructure (bare-metal -> ESX -> VMs).

@hexfusion Well, it was the documentation you posted; I just changed the IPs/addresses.

@aryadrottning you also changed something else. Please note that your example uses http, whereas the documentation uses https.

Perhaps it might make sense to start with a basic static cluster, as you seem to already know the endpoint IPs (see "setup static cluster" in the documentation). Once you get that going, add layers such as peer TLS.
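A minimal sketch of what such a static, non-TLS bootstrap could look like with the machines from this thread. It only prints one command per member and does not start etcd. The `etcd_cmd` helper is hypothetical; the flags are the ones already used above. Giving each member its own name (infra0, infra1, infra2) is an assumption drawn from the `--initial-cluster must include ...` message in the logs, which suggests that `--name` has to match that member's own entry in `--initial-cluster`.

```shell
#!/bin/sh
# Sketch: print the start command for each member of a static, plain-HTTP cluster.
# Names and IPs are taken from this thread; adjust them for your hosts.

CLUSTER="infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380"

# etcd_cmd NAME IP: echo the command to run on the member with that name and IP.
# Hypothetical helper, not part of etcd.
etcd_cmd() {
  name=$1
  ip=$2
  printf '%s\n' "etcd --name $name --initial-advertise-peer-urls http://$ip:2380 --listen-peer-urls http://$ip:2380 --listen-client-urls http://$ip:2379,http://127.0.0.1:2379 --advertise-client-urls http://$ip:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster $CLUSTER --initial-cluster-state new"
}

etcd_cmd infra0 192.168.232.100
etcd_cmd infra1 192.168.232.101
etcd_cmd infra2 192.168.232.102
```

Each printed line is meant to be run on the matching machine. No TLS flags appear because every URL uses http.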

Let's check

Original from the document you linked before:

$ etcd --name infra0 --initial-advertise-peer-urls https://10.0.1.10:2380 \
  --listen-peer-urls https://10.0.1.10:2380 \
  --listen-client-urls https://10.0.1.10:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.1.10:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=https://10.0.1.10:2380,infra1=https://10.0.1.11:2380,infra2=https://10.0.1.12:2380 \
  --initial-cluster-state new \
  --auto-tls \
  --peer-auto-tls

Mine:

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.100:2380 \
  --listen-peer-urls http://192.168.232.100:2380 \
  --listen-client-urls http://192.168.232.100:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.100:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new
  --auto-tls \
  --peer-auto-tls

Second machine:

Example:

etcd --name infra1 --initial-advertise-peer-urls https://10.0.1.11:2380 \
  --listen-peer-urls https://10.0.1.11:2380 \
  --listen-client-urls https://10.0.1.11:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.1.11:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=https://10.0.1.10:2380,infra1=https://10.0.1.11:2380,infra2=https://10.0.1.12:2380 \
  --initial-cluster-state new \
  --auto-tls \
  --peer-auto-tls

Mine:

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.101:2380 \
  --listen-peer-urls http://192.168.232.101:2380 \
  --listen-client-urls http://192.168.232.101:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.101:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new \ 
  --auto-tls \
  --peer-auto-tls

Third machine:

Example:

etcd --name infra2 --initial-advertise-peer-urls https://10.0.1.12:2380 \
  --listen-peer-urls https://10.0.1.12:2380 \
  --listen-client-urls https://10.0.1.12:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.1.12:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=https://10.0.1.10:2380,infra1=https://10.0.1.11:2380,infra2=https://10.0.1.12:2380 \
  --initial-cluster-state new \
  --auto-tls \
  --peer-auto-tls

Mine:

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.102:2380 \
  --listen-peer-urls http://192.168.232.102:2380 \
  --listen-client-urls http://192.168.232.102:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.102:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new
  --auto-tls \
  --peer-auto-tls

I'm not sure how etcd should behave: should the 1st node run and wait for other members to join, and only AFTER at least one joins will it stop spamming logs? I thought about running etcd without any certs/https just for test purposes.

Regarding your second comment, one thing is not clear to me. The second code block has:

--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new

Mentioned before

ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380" ETCD_INITIAL_CLUSTER_STATE=new

Does that mean it is only there for illustration, or do I have to run this along with the first command that declares the env vars?
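On the question of how the first node should behave while it is alone: Raft needs a majority of the configured members to elect a leader, so with three members a single node keeps starting elections (as in the logs above) until a second member becomes reachable. A small sketch of that arithmetic; the `quorum` helper is hypothetical and not an etcd command.

```shell
#!/bin/sh
# quorum N: smallest majority of an N-member cluster (integer division).
# Hypothetical helper for illustration only.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n")"
done
```

For the 3-member cluster in this thread the quorum is 2, so the election messages should stop once any second member has started and can be reached on port 2380.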

I'm not sure how etcd should behave: should the 1st node run and wait for other members to join, and only AFTER at least one joins will it stop spamming logs? I thought about running etcd without any certs/https just for test purposes.

@aryadrottning then you need to drop these flags if you're not using TLS. You are telling etcd to use TLS and then using the http scheme where it needs to be https.

--auto-tls \
--peer-auto-tls

In each of the examples you are setting the endpoints as http, whereas our docs clearly use https because they pass the auto-tls flags, which means communication with the etcd cluster must happen over HTTPS.

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.102:2380 \
  --listen-peer-urls http://192.168.232.102:2380 \
  --listen-client-urls http://192.168.232.102:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.102:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new
  --auto-tls \
  --peer-auto-tls
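The two problems visible in this thread (http URLs combined with TLS flags, and the `--initial-cluster must include ...` message) can be caught before starting etcd. A rough sketch, assuming a hypothetical `check_member` helper that is not part of etcd; the rules it encodes are taken from the log messages and the advice above, not from the etcd source.

```shell
#!/bin/sh
# check_member NAME PEER_URL INITIAL_CLUSTER TLS
# TLS is "yes" when --auto-tls / --peer-auto-tls are passed, anything else otherwise.
# Hypothetical pre-flight check; prints each problem found and returns non-zero if any.
check_member() {
  name=$1
  peer=$2
  cluster=$3
  tls=$4
  status=0
  # The member's own NAME=PEER_URL pair must appear in --initial-cluster.
  case ",$cluster," in
    *",$name=$peer,"*) ;;
    *) echo "initial-cluster does not include $name=$peer"; status=1 ;;
  esac
  # With the TLS flags set, the URLs are expected to use https.
  case "$tls:$peer" in
    yes:http://*) echo "TLS flags are set but $peer uses http"; status=1 ;;
  esac
  return $status
}

CLUSTER="infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380"

# Values as used on the second machine in this thread: both checks report a problem.
check_member infra0 http://192.168.232.101:2380 "$CLUSTER" yes || true
```

With the name changed to infra1 and the TLS argument set to `no`, the same call prints nothing and returns success.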

@aryadrottning regarding your last question.

Configuration parameters can be passed to etcd by flag or ENV var. For a reference on this, please review https://github.com/coreos/etcd/blob/master/Documentation/op-guide/configuration.md. I am closing this as the issue appears to be configuration; please feel free to ask questions on this ticket or open a new issue if you run into additional trouble. Thanks!
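As a quick illustration of the flag/ENV correspondence: the environment variable for a flag is the flag name in upper case, with dashes replaced by underscores and an `ETCD_` prefix, which is how `--initial-cluster-state` becomes the `ETCD_INITIAL_CLUSTER_STATE` seen earlier. The `flag_to_env` helper below is a hypothetical sketch of that naming rule, not something etcd ships.

```shell
#!/bin/sh
# flag_to_env FLAG: print the environment variable name that corresponds to an etcd flag.
# Hypothetical helper; it only applies the naming rule and sets nothing.
flag_to_env() {
  printf 'ETCD_%s\n' "$(printf '%s' "${1#--}" | tr '-' '_' | tr 'a-z' 'A-Z')"
}

flag_to_env --initial-cluster
flag_to_env --initial-cluster-state
flag_to_env --listen-peer-urls
```

Either form configures the same setting, so the env-var lines in the docs are an alternative to the flags rather than something to run in addition.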

@hexfusion Sorry for the late response - I finished work and had to get home and connect to my workplace.

As suggested, I removed auto-tls and peer-auto-tls, and everything now runs over plain HTTP.

Now this happens:

bootstrap-etcd on machine 1 with ip 192.168.232.100

#!/bin/bash

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.100:2380 \
  --listen-peer-urls http://192.168.232.100:2380 \
  --listen-client-urls http://192.168.232.100:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.100:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new

bootstrap-etcd on machine 2 with ip 192.168.232.101

#!/bin/bash

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.101:2380 \
  --listen-peer-urls http://192.168.232.101:2380 \
  --listen-client-urls http://192.168.232.101:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.101:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new

I got this on machine 1

[root@k8s-master-01 ~]# ./bootstrap-etcd 
2018-01-08 22:08:40.658492 I | etcdmain: etcd Version: 3.2.9
2018-01-08 22:08:40.658550 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 22:08:40.658556 I | etcdmain: Go Version: go1.8.3
2018-01-08 22:08:40.658561 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 22:08:40.658567 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 22:08:40.658578 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 22:08:40.658749 I | embed: listening for peers on http://192.168.232.100:2380
2018-01-08 22:08:40.658820 I | embed: listening for client requests on 127.0.0.1:2379
2018-01-08 22:08:40.658870 I | embed: listening for client requests on 192.168.232.100:2379
2018-01-08 22:08:40.663494 I | etcdserver: name = infra0
2018-01-08 22:08:40.663509 I | etcdserver: data dir = infra0.etcd
2018-01-08 22:08:40.663515 I | etcdserver: member dir = infra0.etcd/member
2018-01-08 22:08:40.663520 I | etcdserver: heartbeat = 100ms
2018-01-08 22:08:40.663525 I | etcdserver: election = 1000ms
2018-01-08 22:08:40.663529 I | etcdserver: snapshot count = 100000
2018-01-08 22:08:40.663541 I | etcdserver: advertise client URLs = http://192.168.232.100:2379
2018-01-08 22:08:40.663547 I | etcdserver: initial advertise peer URLs = http://192.168.232.100:2380
2018-01-08 22:08:40.663560 I | etcdserver: initial cluster = infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380
2018-01-08 22:08:40.667335 I | etcdserver: starting member 1a0f423a850b33 in cluster ddec615d236f5865
2018-01-08 22:08:40.667369 I | raft: 1a0f423a850b33 became follower at term 0
2018-01-08 22:08:40.667385 I | raft: newRaft 1a0f423a850b33 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-01-08 22:08:40.667392 I | raft: 1a0f423a850b33 became follower at term 1
2018-01-08 22:08:40.707612 W | auth: simple token is not cryptographically signed
2018-01-08 22:08:40.708818 I | rafthttp: starting peer 87f5c922a6a67302...
2018-01-08 22:08:40.708857 I | rafthttp: started HTTP pipelining with peer 87f5c922a6a67302
2018-01-08 22:08:40.709131 I | rafthttp: started streaming with peer 87f5c922a6a67302 (writer)
2018-01-08 22:08:40.709288 I | rafthttp: started streaming with peer 87f5c922a6a67302 (writer)
2018-01-08 22:08:40.711731 I | rafthttp: started peer 87f5c922a6a67302
2018-01-08 22:08:40.711756 I | rafthttp: added peer 87f5c922a6a67302
2018-01-08 22:08:40.711768 I | rafthttp: started streaming with peer 87f5c922a6a67302 (stream MsgApp v2 reader)
2018-01-08 22:08:40.711826 I | rafthttp: started streaming with peer 87f5c922a6a67302 (stream Message reader)
2018-01-08 22:08:40.711967 I | rafthttp: starting peer c6b41ba06674f9bd...
2018-01-08 22:08:40.711996 I | rafthttp: started HTTP pipelining with peer c6b41ba06674f9bd
2018-01-08 22:08:40.712271 I | rafthttp: started streaming with peer c6b41ba06674f9bd (writer)
2018-01-08 22:08:40.714403 I | rafthttp: started peer c6b41ba06674f9bd
2018-01-08 22:08:40.714453 I | rafthttp: added peer c6b41ba06674f9bd
2018-01-08 22:08:40.714677 I | etcdserver: starting server... [version: 3.2.9, cluster version: to_be_decided]
2018-01-08 22:08:40.714823 I | rafthttp: started streaming with peer c6b41ba06674f9bd (writer)
2018-01-08 22:08:40.714856 I | rafthttp: started streaming with peer c6b41ba06674f9bd (stream MsgApp v2 reader)
2018-01-08 22:08:40.715050 I | rafthttp: started streaming with peer c6b41ba06674f9bd (stream Message reader)
2018-01-08 22:08:40.716516 I | etcdserver/membership: added member 1a0f423a850b33 [http://192.168.232.100:2380] to cluster ddec615d236f5865
2018-01-08 22:08:40.716628 I | etcdserver/membership: added member 87f5c922a6a67302 [http://192.168.232.101:2380] to cluster ddec615d236f5865
2018-01-08 22:08:40.716718 I | etcdserver/membership: added member c6b41ba06674f9bd [http://192.168.232.102:2380] to cluster ddec615d236f5865
2018-01-08 22:08:41.167620 I | raft: 1a0f423a850b33 is starting a new election at term 1
2018-01-08 22:08:41.167705 I | raft: 1a0f423a850b33 became candidate at term 2
2018-01-08 22:08:41.167737 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 2
2018-01-08 22:08:41.167748 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 2
2018-01-08 22:08:41.167758 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 2
2018-01-08 22:08:42.667607 I | raft: 1a0f423a850b33 is starting a new election at term 2
2018-01-08 22:08:42.667646 I | raft: 1a0f423a850b33 became candidate at term 3
2018-01-08 22:08:42.667658 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 3
2018-01-08 22:08:42.667670 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 3
2018-01-08 22:08:42.667680 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 3
2018-01-08 22:08:44.167616 I | raft: 1a0f423a850b33 is starting a new election at term 3
2018-01-08 22:08:44.167671 I | raft: 1a0f423a850b33 became candidate at term 4
2018-01-08 22:08:44.167685 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 4
2018-01-08 22:08:44.167696 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 4
2018-01-08 22:08:44.167711 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 4
2018-01-08 22:08:45.267651 I | raft: 1a0f423a850b33 is starting a new election at term 4
2018-01-08 22:08:45.267683 I | raft: 1a0f423a850b33 became candidate at term 5
2018-01-08 22:08:45.267705 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 5
2018-01-08 22:08:45.267718 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 5
2018-01-08 22:08:45.267728 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 5
2018-01-08 22:08:45.711918 W | rafthttp: health check for peer 87f5c922a6a67302 could not connect: dial tcp 192.168.232.101:2380: getsockopt: connection refused
2018-01-08 22:08:45.715455 W | rafthttp: health check for peer c6b41ba06674f9bd could not connect: dial tcp 192.168.232.102:2380: getsockopt: connection refused
2018-01-08 22:08:46.667641 I | raft: 1a0f423a850b33 is starting a new election at term 5
2018-01-08 22:08:46.667674 I | raft: 1a0f423a850b33 became candidate at term 6
2018-01-08 22:08:46.667688 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 6
2018-01-08 22:08:46.667715 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 6
2018-01-08 22:08:46.667731 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 6
2018-01-08 22:08:47.716287 E | etcdserver: publish error: etcdserver: request timed out
2018-01-08 22:08:48.367641 I | raft: 1a0f423a850b33 is starting a new election at term 6
2018-01-08 22:08:48.367673 I | raft: 1a0f423a850b33 became candidate at term 7
2018-01-08 22:08:48.367684 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 7
2018-01-08 22:08:48.367706 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 7
2018-01-08 22:08:48.367716 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 7
2018-01-08 22:08:49.967641 I | raft: 1a0f423a850b33 is starting a new election at term 7
2018-01-08 22:08:49.967681 I | raft: 1a0f423a850b33 became candidate at term 8
2018-01-08 22:08:49.967692 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 8
2018-01-08 22:08:49.967703 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 8
2018-01-08 22:08:49.967725 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 8
2018-01-08 22:08:50.712098 W | rafthttp: health check for peer 87f5c922a6a67302 could not connect: dial tcp 192.168.232.101:2380: getsockopt: connection refused
2018-01-08 22:08:50.715586 W | rafthttp: health check for peer c6b41ba06674f9bd could not connect: dial tcp 192.168.232.102:2380: getsockopt: connection refused
2018-01-08 22:08:51.667629 I | raft: 1a0f423a850b33 is starting a new election at term 8
2018-01-08 22:08:51.667668 I | raft: 1a0f423a850b33 became candidate at term 9
2018-01-08 22:08:51.667693 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 9
2018-01-08 22:08:51.667707 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 9
2018-01-08 22:08:51.667717 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 9
2018-01-08 22:08:52.867646 I | raft: 1a0f423a850b33 is starting a new election at term 9
2018-01-08 22:08:52.867679 I | raft: 1a0f423a850b33 became candidate at term 10
2018-01-08 22:08:52.867690 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 10
2018-01-08 22:08:52.867702 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 10
2018-01-08 22:08:52.867713 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 10
2018-01-08 22:08:54.367644 I | raft: 1a0f423a850b33 is starting a new election at term 10
2018-01-08 22:08:54.367679 I | raft: 1a0f423a850b33 became candidate at term 11
2018-01-08 22:08:54.367689 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 11
2018-01-08 22:08:54.367707 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 11
2018-01-08 22:08:54.367718 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 11
2018-01-08 22:08:54.716421 E | etcdserver: publish error: etcdserver: request timed out
2018-01-08 22:08:55.567636 I | raft: 1a0f423a850b33 is starting a new election at term 11
2018-01-08 22:08:55.567670 I | raft: 1a0f423a850b33 became candidate at term 12
2018-01-08 22:08:55.567681 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 12
2018-01-08 22:08:55.567692 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 12
2018-01-08 22:08:55.567702 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 12
2018-01-08 22:08:55.712223 W | rafthttp: health check for peer 87f5c922a6a67302 could not connect: dial tcp 192.168.232.101:2380: getsockopt: connection refused
2018-01-08 22:08:55.715704 W | rafthttp: health check for peer c6b41ba06674f9bd could not connect: dial tcp 192.168.232.102:2380: getsockopt: connection refused
2018-01-08 22:08:57.267621 I | raft: 1a0f423a850b33 is starting a new election at term 12
2018-01-08 22:08:57.267654 I | raft: 1a0f423a850b33 became candidate at term 13
2018-01-08 22:08:57.267665 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 13
2018-01-08 22:08:57.267675 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 13
2018-01-08 22:08:57.267685 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 13
2018-01-08 22:08:58.267636 I | raft: 1a0f423a850b33 is starting a new election at term 13
2018-01-08 22:08:58.267669 I | raft: 1a0f423a850b33 became candidate at term 14
2018-01-08 22:08:58.267680 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 14
2018-01-08 22:08:58.267698 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 14
2018-01-08 22:08:58.267712 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 14

`

And machine 2 hangs at:

[root@k8s-master-02 ~]# ./bootstrap-etcd
2018-01-08 22:08:53.168019 I | etcdmain: etcd Version: 3.2.9
2018-01-08 22:08:53.168119 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 22:08:53.168126 I | etcdmain: Go Version: go1.8.3
2018-01-08 22:08:53.168130 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 22:08:53.168136 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 22:08:53.168151 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 22:08:53.168202 W | etcdmain: found invalid file/dir fixtures under data dir infra0.etcd (Ignore this if you are upgrading etcd)
2018-01-08 22:08:53.168214 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-01-08 22:08:53.168900 I | embed: listening for peers on http://localhost:2380
2018-01-08 22:08:53.169114 I | embed: listening for client requests on localhost:2379
2018-01-08 22:08:53.172431 I | etcdserver: name = infra0
2018-01-08 22:08:53.172443 I | etcdserver: data dir = infra0.etcd
2018-01-08 22:08:53.172449 I | etcdserver: member dir = infra0.etcd/member
2018-01-08 22:08:53.172454 I | etcdserver: heartbeat = 100ms
2018-01-08 22:08:53.172458 I | etcdserver: election = 1000ms
2018-01-08 22:08:53.172462 I | etcdserver: snapshot count = 100000
2018-01-08 22:08:53.172478 I | etcdserver: advertise client URLs = http://localhost:2379
2018-01-08 22:08:53.174081 I | etcdserver: restarting member 2e685fb023450ac0 in cluster ca7d8136dfc19e03 at commit index 34
2018-01-08 22:08:53.174132 I | raft: 2e685fb023450ac0 became follower at term 17
2018-01-08 22:08:53.174147 I | raft: newRaft 2e685fb023450ac0 [peers: [], term: 17, commit: 34, applied: 0, lastindex: 34, lastterm: 17]
2018-01-08 22:08:53.179138 W | auth: simple token is not cryptographically signed
2018-01-08 22:08:53.180209 I | etcdserver: starting server... [version: 3.2.9, cluster version: to_be_decided]
2018-01-08 22:08:53.181711 I | etcdserver/membership: added member 2e685fb023450ac0 [https://192.168.232.101:2380] to cluster ca7d8136dfc19e03
2018-01-08 22:08:53.181848 N | etcdserver/membership: set the initial cluster version to 3.2
2018-01-08 22:08:53.181908 I | etcdserver/api: enabled capabilities for version 3.2
2018-01-08 22:08:53.874445 I | raft: 2e685fb023450ac0 is starting a new election at term 17
2018-01-08 22:08:53.874538 I | raft: 2e685fb023450ac0 became candidate at term 18
2018-01-08 22:08:53.874565 I | raft: 2e685fb023450ac0 received MsgVoteResp from 2e685fb023450ac0 at term 18
2018-01-08 22:08:53.874583 I | raft: 2e685fb023450ac0 became leader at term 18
2018-01-08 22:08:53.874594 I | raft: raft.node: 2e685fb023450ac0 elected leader 2e685fb023450ac0 at term 18
2018-01-08 22:08:53.875401 I | embed: ready to serve client requests
2018-01-08 22:08:53.875722 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-01-08 22:08:53.875789 I | etcdserver: published {Name:infra0 ClientURLs:[http://localhost:2379]} to cluster ca7d8136dfc19e03
2018-01-08 22:08:53.876022 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
^C2018-01-08 22:08:59.222160 N | pkg/osutil: received interrupt signal, shutting down...
2018-01-08 22:08:59.222800 I | etcdserver: skipped leadership transfer for single member cluster
2018-01-08 22:08:59.223213 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::1]:2379: operation was canceled"; Reconnecting to {localhost:2379 <nil>}
2018-01-08 22:08:59.223229 I | etcdserver/api/v3rpc: Failed to dial localhost:2379: grpc: the connection is closing; please retry.

Maybe providing data dir would help?

Well, you can see it is writing to the default data-dir, $name.etcd:

2018-01-08 22:08:40.658578 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd

Let's nuke those on each machine, as I am not confident about what is going on there. If you would like to define a --data-dir, feel free to do so.
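For a cluster that has never held real data, wiping the data dir could be sketched roughly as below. `reset_data_dir` is a hypothetical helper, not part of etcd, and /opt/etcd is only an example path; `--data-dir` is the etcd flag that overrides the `./$name.etcd` default seen in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of etcd): wipe and recreate an etcd data dir.
# Destructive: only for a cluster that is being bootstrapped from scratch.
reset_data_dir() {
  local dir="$1"
  # Refuse obviously dangerous targets.
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "refusing to remove '$dir'" >&2
    return 1
  fi
  rm -rf -- "$dir"
  mkdir -p -- "$dir"
  chmod 700 -- "$dir"
}

# Example usage (the path is an assumption), run on each node before starting etcd:
#   reset_data_dir /opt/etcd
#   etcd --name infra1 --data-dir /opt/etcd ...   # remaining flags as before
```

Passing an explicit `--data-dir` also means the location no longer depends on the working directory etcd happens to be started from.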

@aryadrottning the machine 2 logs show this:

2018-01-08 22:08:53.168900 I | embed: listening for peers on http://localhost:2380
2018-01-08 22:08:53.169114 I | embed: listening for client requests on localhost:2379

These are the default values (http://localhost:2380 for peers, localhost:2379 for clients), meaning the listen URLs were not passed to etcd when it started, unless you explicitly used localhost. Thus etcd fell back to the defaults.
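One quick way to spot this is to pull the listener lines out of a saved log. The sketch below assumes the etcd 3.2 log format shown in this thread; `listen_addrs` is a hypothetical helper name, and etcd.log is a placeholder file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read etcd log text on stdin and print the addresses
# etcd actually bound to, one per line, prefixed with the listener type.
listen_addrs() {
  sed -nE 's/.*embed: listening for (peers|client requests) on (.*)$/\1 \2/p'
}

# Example usage (etcd.log is a placeholder for wherever the output was saved):
#   listen_addrs < etcd.log
```

If the output mentions localhost although a node IP was intended, the `--listen-peer-urls` and `--listen-client-urls` flags most likely never reached the etcd process.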

Let's do this:

  • Nuke all data dirs.
  • Double-check your configs for all nodes carefully.
  • Bootstrap the cluster again. If node N fails again, please post the full logs rather than a snippet.

Thank you.

Also, the name needs to match the name of the node; I see --name infra0 used twice. I have made these kinds of mistakes before; an old German friend calls it copy and waste :). It is important to walk through the configs one by one to make sure they are correct. I think we are close now :)
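A rough pre-flight check for this kind of copy-paste slip, assuming the flag values from the original report: the URL listed for `--name` in `--initial-cluster` should be the node's own `--initial-advertise-peer-urls`. `peer_url_for` and `check_member` are hypothetical helper names, not etcd tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the peer URL that --initial-cluster lists for NAME.
peer_url_for() {
  local name="$1" cluster="$2" entry
  local IFS=','
  for entry in $cluster; do
    if [ "${entry%%=*}" = "$name" ]; then
      printf '%s\n' "${entry#*=}"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: check that NAME maps to this node's advertised peer URL.
check_member() {
  local name="$1" advertise="$2" cluster="$3" listed
  listed=$(peer_url_for "$name" "$cluster") || {
    echo "name $name is not listed in --initial-cluster"
    return 1
  }
  if [ "$listed" != "$advertise" ]; then
    echo "name $name maps to $listed, but this node advertises $advertise"
    return 1
  fi
  echo "ok: $name -> $advertise"
}

CLUSTER='infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380'

# k8s-master-02 as originally started (name copied from the first node):
check_member infra0 http://192.168.232.101:2380 "$CLUSTER" || true
# k8s-master-02 with the name its --initial-cluster entry implies:
check_member infra1 http://192.168.232.101:2380 "$CLUSTER" || true
```

Running the check on each node before bootstrapping flags a reused name without having to start etcd first.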

@hexfusion Yeah, I had to check things carefully. I purged the data dirs and checked the configs. It seemed okay, but then I was getting this:

2018-01-09 15:29:44.303369 E | rafthttp: request cluster ID mismatch (got ddec615d236f5865 want 2ae4f248e48a8e31)

So I had to remove the data dir again. I changed it to /opt/etcd so that I can easily delete everything after a faulty start and try again.

Then I started a 2-member cluster, but couldn't add the 3rd member after deleting it with etcdctl member remove. I changed the config on the 3rd member, provided a data dir, erased all previously generated files, checked the config, and manually added the member with etcdctl on the 1st member. Now it works!

Thanks a lot for your patience and advice. Now my cluster is healthy! 🥇

@aryadrottning very good, glad to help!

I have the same problem. Can someone help, please?

postgres@cmvd0204:/data/pgsql/etcd $ etcd --config-file /data/pgsql/etcd/etcd.conf &
[1] 51354
postgres@cmvd0204:/data/pgsql/etcd $ 2018-12-10 14:14:44.755501 I | etcdmain: Loading server configuration from "/data/pgsql/etcd/etcd.conf"
2018-12-10 14:14:44.756553 I | etcdmain: etcd Version: 3.3.2
2018-12-10 14:14:44.756564 I | etcdmain: Git SHA: c9d46ab37
2018-12-10 14:14:44.756569 I | etcdmain: Go Version: go1.9.4
2018-12-10 14:14:44.756573 I | etcdmain: Go OS/Arch: linux/amd64
2018-12-10 14:14:44.756579 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-12-10 14:14:44.756588 W | etcdmain: no data-dir provided, using default data-dir ./cmvd0204.etcd
2018-12-10 14:14:44.756857 I | embed: listening for peers on http://10.147.146.230:2380
2018-12-10 14:14:44.756909 I | embed: listening for client requests on 10.147.146.230:2379
2018-12-10 14:14:44.756950 I | embed: listening for client requests on 127.0.0.1:2379
2018-12-10 14:14:44.760598 I | etcdserver: name = cmvd0204
2018-12-10 14:14:44.760611 I | etcdserver: data dir = cmvd0204.etcd
2018-12-10 14:14:44.760616 I | etcdserver: member dir = cmvd0204.etcd/member
2018-12-10 14:14:44.760624 I | etcdserver: heartbeat = 100ms
2018-12-10 14:14:44.760628 I | etcdserver: election = 1000ms
2018-12-10 14:14:44.760632 I | etcdserver: snapshot count = 100000
2018-12-10 14:14:44.760661 I | etcdserver: advertise client URLs = http://10.147.146.230:2379,http://127.0.0.1:2379
2018-12-10 14:14:44.760667 I | etcdserver: initial advertise peer URLs = http://10.147.146.230:2380
2018-12-10 14:14:44.760677 I | etcdserver: initial cluster = cmvd0203=http://10.147.146.219:2380, cmvi0202=http://10.147.150.86:2380,cmvd0204=http://10.147.146.230:2380
2018-12-10 14:14:44.762781 I | etcdserver: starting member b2a4d5d31c0e8e24 in cluster 24ff8b6e9ba8f6b1
2018-12-10 14:14:44.762813 I | raft: b2a4d5d31c0e8e24 became follower at term 0
2018-12-10 14:14:44.762826 I | raft: newRaft b2a4d5d31c0e8e24 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-12-10 14:14:44.762831 I | raft: b2a4d5d31c0e8e24 became follower at term 1
2018-12-10 14:14:44.768825 W | auth: simple token is not cryptographically signed
2018-12-10 14:14:44.773742 I | rafthttp: starting peer 7c955fd1917aa00a...
2018-12-10 14:14:44.773782 I | rafthttp: started HTTP pipelining with peer 7c955fd1917aa00a
2018-12-10 14:14:44.774330 I | rafthttp: started streaming with peer 7c955fd1917aa00a (writer)
2018-12-10 14:14:44.774477 I | rafthttp: started streaming with peer 7c955fd1917aa00a (writer)
2018-12-10 14:14:44.775346 I | rafthttp: started peer 7c955fd1917aa00a
2018-12-10 14:14:44.775367 I | rafthttp: added peer 7c955fd1917aa00a
2018-12-10 14:14:44.775378 I | rafthttp: starting peer e631367a69bdc338...
2018-12-10 14:14:44.775389 I | rafthttp: started HTTP pipelining with peer e631367a69bdc338
2018-12-10 14:14:44.775884 I | rafthttp: started streaming with peer 7c955fd1917aa00a (stream MsgApp v2 reader)
2018-12-10 14:14:44.776178 I | rafthttp: started streaming with peer 7c955fd1917aa00a (stream Message reader)
2018-12-10 14:14:44.776939 I | rafthttp: started streaming with peer e631367a69bdc338 (writer)
2018-12-10 14:14:44.777697 I | rafthttp: started streaming with peer e631367a69bdc338 (writer)
2018-12-10 14:14:44.778102 I | rafthttp: started peer e631367a69bdc338
2018-12-10 14:14:44.778125 I | rafthttp: added peer e631367a69bdc338
2018-12-10 14:14:44.778135 I | rafthttp: started streaming with peer e631367a69bdc338 (stream MsgApp v2 reader)
2018-12-10 14:14:44.778160 I | etcdserver: starting server... [version: 3.3.2, cluster version: to_be_decided]
2018-12-10 14:14:44.778365 I | rafthttp: started streaming with peer e631367a69bdc338 (stream Message reader)
2018-12-10 14:14:44.780511 I | etcdserver/membership: added member 7c955fd1917aa00a [http://10.147.150.86:2380] to cluster 24ff8b6e9ba8f6b1
2018-12-10 14:14:44.780664 I | etcdserver/membership: added member b2a4d5d31c0e8e24 [http://10.147.146.230:2380] to cluster 24ff8b6e9ba8f6b1
2018-12-10 14:14:44.780788 I | etcdserver/membership: added member e631367a69bdc338 [http://10.147.146.219:2380] to cluster 24ff8b6e9ba8f6b1
2018-12-10 14:14:45.563093 I | raft: b2a4d5d31c0e8e24 is starting a new election at term 1
2018-12-10 14:14:45.563128 I | raft: b2a4d5d31c0e8e24 became candidate at term 2
2018-12-10 14:14:45.563147 I | raft: b2a4d5d31c0e8e24 received MsgVoteResp from b2a4d5d31c0e8e24 at term 2
2018-12-10 14:14:45.563158 I | raft: b2a4d5d31c0e8e24 [logterm: 1, index: 3] sent MsgVote request to 7c955fd1917aa00a at term 2
2018-12-10 14:14:45.563167 I | raft: b2a4d5d31c0e8e24 [logterm: 1, index: 3] sent MsgVote request to e631367a69bdc338 at term 2
2018-12-10 14:14:47.163090 I | raft: b2a4d5d31c0e8e24 is starting a new election at term 2
2018-12-10 14:14:54.040130 I | raft: b2a4d5d31c0e8e24 became candidate at term 3
2018-12-10 14:14:54.040156 I | raft: b2a4d5d31c0e8e24 received MsgVoteResp from b2a4d5d31c0e8e24 at term 3
2018-12-10 14:14:54.040169 I | raft: b2a4d5d31c0e8e24 [logterm: 1, index: 3] sent MsgVote request to 7c955fd1917aa00a at term 3
2018-12-10 14:14:54.040177 I | raft: b2a4d5d31c0e8e24 [logterm: 1, index: 3] sent MsgVote request to e631367a69bdc338 at term 3
2018-12-10 14:14:54.040223 I | raft: b2a4d5d31c0e8e24 is starting a new election at term 3
2018-12-10 14:14:54.040231 I | raft: b2a4d5d31c0e8e24 became candidate at term 4
2018-12-10 14:14:54.040237 I | raft: b2a4d5d31c0e8e24 received MsgVoteResp from b2a4d5d31c0e8e24 at term 4
2018-12-10 14:14:54.040244 I | raft: b2a4d5d31c0e8e24 [logterm: 1, index: 3] sent MsgVote request to 7c955fd1917aa00a at term 4
2018-12-10 14:14:54.040250 W | rafthttp: health check for peer 7c955fd1917aa00a could not connect: dial tcp 10.147.150.86:2380: getsockopt: connection refused
2018-12-10 14:14:54.040291 W | rafthttp: health check for peer e631367a69bdc338 could not connect: dial tcp 10.147.146.219:2380: getsockopt: connection refused
2018-12-10 14:14:54.040303 E | etcdserver: publish error: etcdserver: request timed out

2018-12-10 14:14:54.040250 W | rafthttp: health check for peer 7c955fd1917aa00a could not connect: dial tcp 10.147.150.86:2380: getsockopt: connection refused

@slimane1992 the error is literal: it cannot connect, possibly because of networking issues, firewall settings, etc. What do the logs look like on the other nodes?

ETCD_LISTEN_PEER_URLS="http://127.0.0.1:2380": you're only listening locally, right? Netstat would tell you that. What happens if you listen on 0.0.0.0:2380, and do the same for the client URLs?
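As a sketch of that suggestion, the two listen settings in /etc/etcd/etcd.conf would look like the lines below on both nodes. Binding to the node's own IP instead of 0.0.0.0 is an alternative if listening on every interface is not wanted. `loopback_only` is a hypothetical helper, not part of etcd, that flags a listen value peers cannot reach.

```shell
#!/usr/bin/env bash
# Sketch of the changed settings (all other entries stay as posted).
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"

# Hypothetical helper: succeed if every URL in a comma-separated list
# points at a loopback host, i.e. nothing is reachable from other machines.
loopback_only() {
  local url host
  local IFS=','
  for url in $1; do
    host="${url#*://}"   # strip the scheme
    host="${host%%:*}"   # strip the port
    case "$host" in
      127.*|localhost) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

if loopback_only "$ETCD_LISTEN_PEER_URLS"; then
  echo "peer listener is loopback-only; other members cannot connect"
else
  echo "peer listener is reachable from other hosts"
fi
```

With the posted value http://127.0.0.1:2380 the helper reports loopback-only, which would be consistent with the connection refused and TCP reset symptoms in the quoted message.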

On Sun, Dec 23, 2018 at 2:10 AM JackAndJoker notifications@github.com
wrote:

Hi everyone, I have also been struggling to debug an etcd problem for several weeks.

I was trying to set up an etcd cluster with 2 nodes (docker1 with IP address 10.240.10.10, and docker2 with IP address 10.240.10.11). Their network connection is OK. Here are my /etc/etcd/etcd.conf configs:

docker1:

[Member]

ETCD_CORS=""

ETCD_DATA_DIR="/var/lib/etcd/default.etcd"

ETCD_WAL_DIR=""

ETCD_LISTEN_PEER_URLS="http://127.0.0.1:2380"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379"

ETCD_MAX_SNAPSHOTS="5"

ETCD_MAX_WALS="5"

ETCD_NAME=etcd1

ETCD_SNAPSHOT_COUNT="100000"

ETCD_HEARTBEAT_INTERVAL="10"

ETCD_ELECTION_TIMEOUT="1000"

ETCD_QUOTA_BACKEND_BYTES="0"

ETCD_MAX_REQUEST_BYTES="1572864"

ETCD_GRPC_KEEPALIVE_MIN_TIME="5s"

ETCD_GRPC_KEEPALIVE_INTERVAL="2h0m0s"

ETCD_GRPC_KEEPALIVE_TIMEOUT="20s"

[Clustering]

ETCD_INITIAL_ADVERTISE_PEER_URLS="http://docker1:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://docker1:2379"

ETCD_DISCOVERY=""

ETCD_DISCOVERY_FALLBACK="proxy"

ETCD_DISCOVERY_PROXY=""

ETCD_DISCOVERY_SRV=""

ETCD_INITIAL_CLUSTER="etcd1=http://docker1:2380,etcd2=http://docker2:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster1"
ETCD_INITIAL_CLUSTER_STATE="new"

ETCD_STRICT_RECONFIG_CHECK="true"

ETCD_ENABLE_V2="true"

docker2:

[Member]

ETCD_CORS=""

ETCD_DATA_DIR="/var/lib/etcd/default.etcd"

ETCD_WAL_DIR=""

ETCD_LISTEN_PEER_URLS="http://127.0.0.1:2380"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379"

ETCD_MAX_SNAPSHOTS="5"

ETCD_MAX_WALS="5"

ETCD_NAME=etcd2

ETCD_SNAPSHOT_COUNT="100000"

ETCD_HEARTBEAT_INTERVAL="10"

ETCD_ELECTION_TIMEOUT="1000"

ETCD_QUOTA_BACKEND_BYTES="0"

ETCD_MAX_REQUEST_BYTES="1572864"

ETCD_GRPC_KEEPALIVE_MIN_TIME="5s"

ETCD_GRPC_KEEPALIVE_INTERVAL="2h0m0s"

ETCD_GRPC_KEEPALIVE_TIMEOUT="20s"

[Clustering]

ETCD_INITIAL_ADVERTISE_PEER_URLS="http://docker2:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://docker2:2379"

ETCD_DISCOVERY=""

ETCD_DISCOVERY_FALLBACK="proxy"

ETCD_DISCOVERY_PROXY=""

ETCD_DISCOVERY_SRV=""

ETCD_INITIAL_CLUSTER="etcd1=http://docker1:2380,etcd2=http://docker2:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster1"

ETCD_INITIAL_CLUSTER_STATE="existing"

ETCD_STRICT_RECONFIG_CHECK="true"

ETCD_ENABLE_V2="true"

All other settings keep their defaults. Every time I try to start my etcd nodes, node docker1 works fine, but on node docker2 I get the error messages shown below:

Dec 23 14:28:43 docker2 etcd[7451]: e66fdb54279bdeea is starting a new election at term 5491
Dec 23 14:28:43 docker2 etcd[7451]: e66fdb54279bdeea became candidate at term 5492
Dec 23 14:28:43 docker2 etcd[7451]: e66fdb54279bdeea received MsgVoteResp from e66fdb54279bdeea at term 5492
Dec 23 14:28:43 docker2 etcd[7451]: e66fdb54279bdeea [logterm: 1, index: 2] sent MsgVote request to 6153d5753c9f3dca at term 5492
Dec 23 14:28:44 docker2 etcd[7451]: e66fdb54279bdeea is starting a new election at term 5492
Dec 23 14:28:44 docker2 etcd[7451]: e66fdb54279bdeea became candidate at term 5493
Dec 23 14:28:44 docker2 etcd[7451]: e66fdb54279bdeea received MsgVoteResp from e66fdb54279bdeea at term 5493
Dec 23 14:28:44 docker2 etcd[7451]: e66fdb54279bdeea [logterm: 1, index: 2] sent MsgVote request to 6153d5753c9f3dca at term 5493
Dec 23 14:28:45 docker2 etcd[7451]: e66fdb54279bdeea is starting a new election at term 5493
Dec 23 14:28:45 docker2 etcd[7451]: e66fdb54279bdeea became candidate at term 5494
Dec 23 14:28:45 docker2 etcd[7451]: e66fdb54279bdeea received MsgVoteResp from e66fdb54279bdeea at term 5494
Dec 23 14:28:45 docker2 etcd[7451]: e66fdb54279bdeea [logterm: 1, index: 2] sent MsgVote request to 6153d5753c9f3dca at term 5494
Dec 23 14:28:47 docker2 etcd[7451]: e66fdb54279bdeea is starting a new election at term 5494
Dec 23 14:28:47 docker2 etcd[7451]: e66fdb54279bdeea became candidate at term 5495
Dec 23 14:28:47 docker2 etcd[7451]: e66fdb54279bdeea received MsgVoteResp from e66fdb54279bdeea at term 5495
Dec 23 14:28:47 docker2 etcd[7451]: e66fdb54279bdeea [logterm: 1, index: 2] sent MsgVote request to 6153d5753c9f3dca at term 5495
Dec 23 14:28:47 docker2 etcd[7451]: health check for peer 6153d5753c9f3dca could not connect: dial tcp 10.240.10.10:2380: getsockopt: connection refused
Dec 23 14:28:48 docker2 etcd[7451]: publish error: etcdserver: request timed out

I used tcpdump to capture the negotiation packets with 'tcpdump -i eno33554960 src host 10.240.10.10 && src port 2380'. It prints:

15:02:17.588387 IP docker1.2380 > 10.240.10.11.56032: Flags [R.], seq 0, ack 1895752983, win 0, length 0
15:02:17.588506 IP docker1.2380 > 10.240.10.11.56033: Flags [R.], seq 0, ack 3457340419, win 0, length 0
15:02:17.689535 IP docker1.2380 > 10.240.10.11.56034: Flags [R.], seq 0, ack 2722242393, win 0, length 0
15:02:17.689626 IP docker1.2380 > 10.240.10.11.56035: Flags [R.], seq 0, ack 1232465604, win 0, length 0
15:02:17.790540 IP docker1.2380 > 10.240.10.11.56036: Flags [R.], seq 0, ack 1745748630, win 0, length 0
15:02:17.790658 IP docker1.2380 > 10.240.10.11.56037: Flags [R.], seq 0, ack 2679875336, win 0, length 0
15:02:17.891675 IP docker1.2380 > 10.240.10.11.56038: Flags [R.], seq 0, ack 3059153456, win 0, length 0
15:02:17.891793 IP docker1.2380 > 10.240.10.11.56039: Flags [R.], seq 0, ack 1156909527, win 0, length 0
15:02:17.993955 IP docker1.2380 > 10.240.10.11.56040: Flags [R.], seq 0, ack 2051040837, win 0, length 0

I also used 'netstat -ntulp | grep 2380' to check the ports on every node. They are in the LISTEN state. I stopped and disabled the firewalld service beforehand. Could you please help me troubleshoot this problem? Thank you.


SAM BATSCHELET


Make sure to run the bootstrapping command on all master nodes. I was stuck on the same issue, and it was resolved by enabling the etcd service on all master nodes.
