Etcd: can't get cluster going: etcdserver: publish error: etcdserver: request timed out

Created on 14 Feb 2015 · 17 Comments · Source: etcd-io/etcd

I'm trying to start a cluster with:

./etcd -name pi0 -initial-advertise-peer-urls http://10.248.0.1:2380 -listen-peer-urls \
http://localhost:2380,http://localhost:7001,http://10.248.0.1:2380 -listen-client-urls \
http://localhost:2379,http://localhost:4001,http://192.168.1.10:4001 -advertise-client-urls \
http://localhost:2379,http://localhost:4001,http://10.248.0.1:4001 \
-initial-cluster-token pi-etcd \
-initial-cluster pi0=http://10.248.0.1:2380,pi1=http://10.248.0.2:2380,pi2=http://10.248.0.3:2380\
 -initial-cluster-state new

But this keeps failing with:

2015/02/14 12:23:39 etcdserver: publish error: etcdserver: request timed out

I've seen issue #2276, but this seems different again. I am also not able to capture _any_ network traffic on any interface. That explains why it's timing out, but I'm not sure where to go from here...
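
One way to narrow down a timeout like this is to check whether each peer URL listed in `-initial-cluster` accepts TCP connections from the machine in question. The following is a minimal Python sketch, not part of etcd; the helper names `parse_initial_cluster` and `probe_peers` are made up for illustration, and a successful connect only shows that something is listening on that port, not that it is a healthy etcd member.

```python
import socket
from urllib.parse import urlsplit


def parse_initial_cluster(value):
    """Split an -initial-cluster string into (name, host, port) tuples."""
    peers = []
    for entry in value.split(","):
        # Each entry looks like "pi0=http://10.248.0.1:2380".
        name, _, url = entry.strip().partition("=")
        parts = urlsplit(url)
        peers.append((name, parts.hostname, parts.port))
    return peers


def probe_peers(peers, timeout=2.0):
    """Try a plain TCP connect to every peer; return {name: True/False}."""
    results = {}
    for name, host, port in peers:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[name] = False
    return results
```

Calling `probe_peers(parse_initial_cluster("pi0=http://10.248.0.1:2380,pi1=http://10.248.0.2:2380,pi2=http://10.248.0.3:2380"))` on each node would show which peers that node can reach at the TCP level.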

miekg

Most helpful comment

'etcdctl cluster-health' reported no problems.

I'm afraid I cannot open another issue because frankly Etcd has become something of a time-suck for me. As such, I've decided to ditch it. I'll revisit it in a few years when it's had time to mature.

MrJoy on 13 Apr 2015

👍8

All 17 comments

@miekg Can you provide more logs of the 3 members you started?

xiang90 on 14 Feb 2015

They're all identical, but I'll attach them here.

miekg on 14 Feb 2015

pi0:

./etcd -name pi0 -initial-advertise-peer-urls http://10.248.0.1:2380 -listen-peer-urls http://localhost:2380,http://localhost:7001,http://10.248.0.1:2380 -listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.1.10:4001 -advertise-client-urls http://localhost:2379,http://localhost:4001,http://10.248.0.1:4001 -initial-cluster-token pi-etcd -initial-cluster pi0=http://10.248.0.1:2380,pi1=http://10.248.0.2:2380,pi2=http://10.248.0.3:2380 -initial-cluster-state new
2015/02/14 19:25:17 no data-dir provided, using default data-dir ./pi0.etcd
2015/02/14 19:25:17 etcd: listening for peers on http://10.248.0.1:2380
2015/02/14 19:25:17 etcd: listening for peers on http://localhost:2380
2015/02/14 19:25:17 etcd: listening for peers on http://localhost:7001
2015/02/14 19:25:17 etcd: listening for client requests on http://192.168.1.10:4001
2015/02/14 19:25:17 etcd: listening for client requests on http://localhost:2379
2015/02/14 19:25:17 etcd: listening for client requests on http://localhost:4001
2015/02/14 19:25:17 datadir is valid for the 2.0.1 format
2015/02/14 19:25:17 etcdserver: name = pi0
2015/02/14 19:25:17 etcdserver: data dir = pi0.etcd
2015/02/14 19:25:17 etcdserver: member dir = pi0.etcd/member
2015/02/14 19:25:17 etcdserver: heartbeat = 100ms
2015/02/14 19:25:17 etcdserver: election = 1000ms
2015/02/14 19:25:17 etcdserver: snapshot count = 10000
2015/02/14 19:25:17 etcdserver: advertise client URLs = http://10.248.0.1:4001,http://localhost:2379,http://localhost:4001
2015/02/14 19:25:17 etcdserver: restart member 5a1da7818139bcd6 in cluster de24544665dcb68f at commit index 0
2015/02/14 19:25:17 raft: 5a1da7818139bcd6 became follower at term 0
2015/02/14 19:25:17 raft: newRaft 5a1da7818139bcd6 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 3, lastterm: 1]
2015/02/14 19:25:22 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:25:27 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:25:32 etcdserver: publish error: etcdserver: request timed out
^C2015/02/14 19:25:37 received interrupt signal, shutting down

miekg on 14 Feb 2015

pi1:

./etcd -name pi1 -initial-advertise-peer-urls http://10.248.0.2:2380 -listen-peer-urls http://localhost:2380,http://localhost:7001,http://10.248.0.2:2380 -listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.1.10:4001 -initial-cluster-token pi-etcd -initial-cluster pi0=http://10.248.0.1:2380,pi1=http://10.248.0.2:2380,pi2=http://10.248.0.3:2380 -initial-cluster-state new
2015/02/14 19:25:39 no data-dir provided, using default data-dir ./pi1.etcd
2015/02/14 19:25:39 etcd: listening for peers on http://10.248.0.2:2380
2015/02/14 19:25:39 etcd: listening for peers on http://localhost:2380
2015/02/14 19:25:39 etcd: listening for peers on http://localhost:7001
2015/02/14 19:25:39 etcd: listening for client requests on http://192.168.1.10:4001
2015/02/14 19:25:39 etcd: listening for client requests on http://localhost:2379
2015/02/14 19:25:39 etcd: listening for client requests on http://localhost:4001
2015/02/14 19:25:39 datadir is valid for the 2.0.1 format
2015/02/14 19:25:39 etcdserver: name = pi1
2015/02/14 19:25:39 etcdserver: data dir = pi1.etcd
2015/02/14 19:25:39 etcdserver: member dir = pi1.etcd/member
2015/02/14 19:25:39 etcdserver: heartbeat = 100ms
2015/02/14 19:25:39 etcdserver: election = 1000ms
2015/02/14 19:25:39 etcdserver: snapshot count = 10000
2015/02/14 19:25:39 etcdserver: advertise client URLs = http://localhost:2379,http://localhost:4001
2015/02/14 19:25:39 etcdserver: restart member 45494b64ff9620d3 in cluster 985788d45a3370a0 at commit index 0
2015/02/14 19:25:39 raft: 45494b64ff9620d3 became follower at term 0
2015/02/14 19:25:39 raft: newRaft 45494b64ff9620d3 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 3, lastterm: 1]
2015/02/14 19:25:44 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:25:49 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:25:54 etcdserver: publish error: etcdserver: request timed out
^C2015/02/14 19:25:56 received interrupt signal, shutting down

miekg on 14 Feb 2015

pi2:

./etcd -name pi2 -initial-advertise-peer-urls http://10.248.0.3:2380 -listen-peer-urls http://localhost:2380,http://localhost:7001,http://10.248.0.3:2380 -listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.1.10:4001 -initial-cluster-token pi-etcd -initial-cluster pi0=http://10.248.0.1:2380,pi1=http://10.248.0.2:2380,pi2=http://10.248.0.3:2380 -initial-cluster-state new
2015/02/14 19:25:43 no data-dir provided, using default data-dir ./pi2.etcd
2015/02/14 19:25:43 etcd: listening for peers on http://10.248.0.3:2380
2015/02/14 19:25:43 etcd: listening for peers on http://localhost:2380
2015/02/14 19:25:43 etcd: listening for peers on http://localhost:7001
2015/02/14 19:25:43 etcd: listening for client requests on http://192.168.1.10:4001
2015/02/14 19:25:43 etcd: listening for client requests on http://localhost:2379
2015/02/14 19:25:43 etcd: listening for client requests on http://localhost:4001
2015/02/14 19:25:43 datadir is valid for the 2.0.1 format
2015/02/14 19:25:43 etcdserver: name = pi2
2015/02/14 19:25:43 etcdserver: data dir = pi2.etcd
2015/02/14 19:25:43 etcdserver: member dir = pi2.etcd/member
2015/02/14 19:25:43 etcdserver: heartbeat = 100ms
2015/02/14 19:25:43 etcdserver: election = 1000ms
2015/02/14 19:25:43 etcdserver: snapshot count = 10000
2015/02/14 19:25:43 etcdserver: advertise client URLs = http://localhost:2379,http://localhost:4001
2015/02/14 19:25:43 etcdserver: restart member 49878cdc9ba82919 in cluster 985788d45a3370a0 at commit index 0
2015/02/14 19:25:43 raft: 49878cdc9ba82919 became follower at term 0
2015/02/14 19:25:43 raft: newRaft 49878cdc9ba82919 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 3, lastterm: 1]
2015/02/14 19:25:48 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:25:53 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:25:58 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:03 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:08 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:13 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:18 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:23 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:28 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:33 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:38 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:43 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:48 etcdserver: publish error: etcdserver: request timed out
2015/02/14 19:26:53 etcdserver: publish error: etcdserver: request timed out
^C2015/02/14 19:26:54 received interrupt signal, shutting down

miekg on 14 Feb 2015

@miekg This is weird... I think these are the ARM machines, restarted after unexpected failures.

xiang90 on 14 Feb 2015

@xiang90 sorry I don't get your comment? The machines are running fine and use a wired 10-network to talk to each other (they also have a wireless interface)

miekg on 14 Feb 2015

@miekg I mean, are these etcd instances the ones on the Pi machines? From the logs, I see they have failed before.
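
The earlier failure is presumably visible in the logs above as `restart member` where a fresh start would print `start member`. The same lines also show pi0 reporting cluster de24544665dcb68f while pi1 and pi2 report 985788d45a3370a0. A rough Python sketch for pulling those fields out of log lines follows; the function names and the regular expression are illustrative assumptions based only on the log format shown in this thread.

```python
import re

# Matches lines such as:
#   "... etcdserver: restart member 5a1da7818139bcd6 in cluster de24544665dcb68f at commit index 0"
#   "... etcdserver: start member 5307770575fcee05 in cluster e27fa83f08c7f3d2"
MEMBER_RE = re.compile(
    r"etcdserver: (?P<action>restart|start) member (?P<member>[0-9a-f]+)"
    r" in cluster (?P<cluster>[0-9a-f]+)"
)


def summarize_startup(log_lines):
    """Return a list of (action, member_id, cluster_id) found in the log lines."""
    found = []
    for line in log_lines:
        match = MEMBER_RE.search(line)
        if match:
            found.append((match["action"], match["member"], match["cluster"]))
    return found


def cluster_ids(summaries):
    """Collect the distinct cluster IDs across all summarized members."""
    return {cluster for _, _, cluster in summaries}
```

For the three logs above, this reports `restart` for every member and two distinct cluster IDs.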

xiang90 on 14 Feb 2015

That failure would probably be #2308 .

miekg on 14 Feb 2015

@miekg Right... So your cluster has not actually been set up properly. We need to solve #2308 first.

xiang90 on 14 Feb 2015

Ack. I want etcd on my arm cluster, so I will poke/fix/tweak until it works.

miekg on 14 Feb 2015

@miekg I am closing this since the root cause is #2308.

xiang90 on 15 Feb 2015

I'm experiencing the same issue trying to add a node to an existing cluster.

I run etcdctl member add, get the env vars, and run this on the CoreOS box of the new node:

rm -rf /opt/etcd/*
/usr/bin/docker run   --net=host   -t -i --rm   -p 2379:2379   -p 2380:2380   -p 4001:4001   -p 7001:7001   -v /opt/etcd:/opt/etcd   -v /usr/share/ca-certificates/:/etc/ssl/certs   quay.io/coreos/etcd:v2.0.8   --data-dir /opt/etcd   --name hivemind-04.${COREOS_PRIVATE_IPV4}     --listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001   --advertise-client-urls=http://${COREOS_PRIVATE_IPV4}:2379,http://${COREOS_PRIVATE_IPV4}:4001 --listen-peer-urls=http://${COREOS_PRIVATE_IPV4}:2380 --initial-advertise-peer-urls=http://${COREOS_PRIVATE_IPV4}:2380 --initial-cluster=$ETCD_INITIAL_CLUSTER --initial-cluster-state=$ETCD_INITIAL_CLUSTER_STATE

Initially, everything seems fine, but eventually this starts happening:

2015/04/11 03:57:42 etcd: listening for peers on http://172.31.25.162:2380
2015/04/11 03:57:42 etcd: listening for client requests on http://0.0.0.0:2379
2015/04/11 03:57:42 etcd: listening for client requests on http://0.0.0.0:4001
2015/04/11 03:57:42 etcdserver: datadir is valid for the 2.0.1 format
2015/04/11 03:57:42 etcdserver: name = hivemind-04.172.31.25.162
2015/04/11 03:57:42 etcdserver: data dir = /opt/etcd
2015/04/11 03:57:42 etcdserver: member dir = /opt/etcd/member
2015/04/11 03:57:42 etcdserver: heartbeat = 100ms
2015/04/11 03:57:42 etcdserver: election = 1000ms
2015/04/11 03:57:42 etcdserver: snapshot count = 10000
2015/04/11 03:57:42 etcdserver: advertise client URLs = http://172.31.25.162:2379,http://172.31.25.162:4001
2015/04/11 03:57:42 etcdserver: start member 5307770575fcee05 in cluster e27fa83f08c7f3d2
2015/04/11 03:57:42 raft: 5307770575fcee05 became follower at term 0
2015/04/11 03:57:42 raft: newRaft 5307770575fcee05 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2015/04/11 03:57:42 raft: 5307770575fcee05 became follower at term 1
2015/04/11 03:57:47 etcdserver: publish error: etcdserver: request timed out
2015/04/11 03:57:52 etcdserver: publish error: etcdserver: request timed out
2015/04/11 03:57:57 etcdserver: publish error: etcdserver: request timed out
2015/04/11 03:58:02 etcdserver: publish error: etcdserver: request timed out
2015/04/11 03:58:07 etcdserver: publish error: etcdserver: request timed out
2015/04/11 03:58:12 etcdserver: publish error: etcdserver: request timed out
2015/04/11 03:58:15 etcdhttp: unexpected error: etcdserver: request timed out
2015/04/11 03:58:16 etcdhttp: unexpected error: etcdserver: request timed out
2015/04/11 03:58:17 etcdserver: publish error: etcdserver: request timed out

MrJoy on 11 Apr 2015

👍1

Oh yes, on one of the servers already in the cluster, I see this:

2015/04/11 03:50:34 etcdserver: added member 5307770575fcee05 [http://172.31.25.162:2379] to cluster e27fa83f08c7f3d2
2015/04/11 03:50:34 sender: error posting to 5307770575fcee05: unexpected http status Not Found while posting to "http://172.31.25.162:2379/raft"
2015/04/11 03:50:34 sender: the connection with 5307770575fcee05 became inactive
2015/04/11 03:50:43 sender: error posting to 5307770575fcee05: dial tcp 172.31.25.162:2379: connection refused
2015/04/11 03:51:19 sender: error posting to 5307770575fcee05: unexpected http status Not Found while posting to "http://172.31.25.162:2379/raft"
2015/04/11 03:53:18 sender: error posting to 5307770575fcee05: read tcp 172.31.25.162:2379: connection reset by peer
2015/04/11 03:53:18 sender: error posting to 5307770575fcee05: dial tcp 172.31.25.162:2379: connection refused
2015/04/11 03:53:18 sender: error posting to 5307770575fcee05: unexpected http status Not Found while posting to "http://172.31.25.162:2379/raft"
2015/04/11 03:54:13 sender: error posting to 5307770575fcee05: dial tcp 172.31.25.162:2379: connection refused
2015/04/11 03:55:10 sender: error posting to 5307770575fcee05: unexpected http status Not Found while posting to "http://172.31.25.162:2379/raft"
2015/04/11 03:57:04 sender: error posting to 5307770575fcee05: dial tcp 172.31.25.162:2379: connection refused
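
In the log above, the existing member posts Raft messages to port 2379, whereas the new node's own log shows it listening for peers on port 2380 and for clients on 2379 and 4001. That may mean the member was registered with a client URL instead of a peer URL, though nothing in this thread confirms it. A small Python sketch for spotting that kind of mismatch is given below; the function name, the `client_ports` parameter and the assumption that multiple URLs inside the brackets are separated by spaces or commas are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches lines such as:
#   "... etcdserver: added member 5307770575fcee05 [http://172.31.25.162:2379] to cluster e27fa83f08c7f3d2"
ADDED_RE = re.compile(
    r"etcdserver: added member (?P<member>[0-9a-f]+) \[(?P<urls>[^\]]*)\]"
)


def suspicious_peer_urls(log_lines, client_ports=(2379, 4001)):
    """Return (member_id, url) pairs whose registered URL uses a client port."""
    flagged = []
    for line in log_lines:
        match = ADDED_RE.search(line)
        if not match:
            continue
        # Assumed separators: whitespace or commas between URLs.
        for url in re.split(r"[,\s]+", match["urls"].strip()):
            if url and urlsplit(url).port in client_ports:
                flagged.append((match["member"], url))
    return flagged
```

Fed the first line of the log above, it flags member 5307770575fcee05 with `http://172.31.25.162:2379`.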

MrJoy on 11 Apr 2015

All the nodes have the same security group configuration so the fact that there are three nodes talking fine suggests it's not a misconfiguration at the Amazon level at least.

MrJoy on 11 Apr 2015

@MrJoy This error indicates the cluster has not been healthy from the very beginning, and there can be various root causes. Could you open a new issue and list the detailed steps to reproduce your case from the beginning?

yichengq on 13 Apr 2015

'etcdctl cluster-health' reported no problems.

MrJoy on 13 Apr 2015

👍8
