Hi,
I have 3 server nodes deployed as binaries,
and 2 client nodes deployed in Docker containers.
Initially, the 2 client nodes join the cluster successfully.
But after that, the servers report the "memberlist: Push/Pull" error and keep marking the client nodes as failed and then rejoining them to the cluster, over and over again.
Logs from the server:
2016/04/20 12:20:58 [ERR] memberlist: Push/Pull with k8s2consul failed: dial tcp 172.17.0.5:8301: i/o timeout
2016/04/20 12:22:11 [INFO] memberlist: Suspect k8s2consul has failed, no acks received
2016/04/20 12:22:16 [INFO] memberlist: Marking k8s2consul as failed, suspect timeout reached
2016/04/20 12:22:16 [INFO] serf: EventMemberFailed: k8s2consul 172.17.0.5
2016/04/20 12:22:22 [INFO] serf: EventMemberJoin: k8s2consul 172.17.0.5
2016/04/20 12:25:38 [ERR] memberlist: Push/Pull with sdclient-pd failed: dial tcp 172.17.0.6:8301: i/o timeout
2016/04/20 12:25:46 [INFO] memberlist: Suspect sdclient-pd has failed, no acks received
2016/04/20 12:25:51 [INFO] memberlist: Marking sdclient-pd as failed, suspect timeout reached
2016/04/20 12:25:51 [INFO] serf: EventMemberFailed: sdclient-pd 172.17.0.6
2016/04/20 12:25:58 [INFO] serf: EventMemberJoin: sdclient-pd 172.17.0.6
Logs from the client:
2016/04/20 03:19:59 [WARN] memberlist: Refuting a suspect message (from: server-252)
2016/04/20 03:20:22 [INFO] serf: EventMemberFailed: k8s2consul 172.17.0.5
2016/04/20 03:20:28 [INFO] serf: EventMemberJoin: k8s2consul 172.17.0.5
2016/04/20 03:20:46 [INFO] memberlist: Marking k8s2consul as failed, suspect timeout reached
2016/04/20 03:20:46 [INFO] serf: EventMemberFailed: k8s2consul 172.17.0.5
2016/04/20 03:20:48 [INFO] serf: EventMemberJoin: k8s2consul 172.17.0.5
2016/04/20 03:22:26 [WARN] memberlist: Refuting a suspect message (from: server-252)
2016/04/20 03:24:25 [INFO] memberlist: Marking k8s2consul as failed, suspect timeout reached
2016/04/20 03:24:25 [INFO] serf: EventMemberFailed: k8s2consul 172.17.0.5
2016/04/20 03:24:28 [INFO] serf: EventMemberJoin: k8s2consul 172.17.0.5
2016/04/20 03:25:43 [INFO] serf: EventMemberFailed: k8s2consul 172.17.0.5
2016/04/20 03:25:43 [INFO] serf: EventMemberJoin: k8s2consul 172.17.0.5
2016/04/20 03:25:48 [WARN] memberlist: Refuting a suspect message (from: sdclient-pd)
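To see at a glance which members are flapping in logs like the ones above, the failure events can be counted per node. This is only a rough sketch: it assumes the log line format shown in this thread (`serf: EventMemberFailed: <node> <ip>`), and the names `FAILED_RE` and `count_failures` are made up for illustration, not part of Consul.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   2016/04/20 03:20:22 [INFO] serf: EventMemberFailed: k8s2consul 172.17.0.5
# The format is assumed from the log samples in this thread.
FAILED_RE = re.compile(r"serf: EventMemberFailed: (\S+)")


def count_failures(lines):
    """Return a Counter mapping node name to number of EventMemberFailed lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file name): python count_failures.py < consul.log
    for node, n in count_failures(sys.stdin).most_common():
        print(f"{node}: {n} failure event(s)")
```

A node that shows up many times within a short window is being repeatedly marked failed and rejoined, which is the pattern described in this issue.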
I'm seeing something very similar, with a larger infrastructure (~200 nodes, 40-60 failed and over 100 suspect at any given time).
Sample of the logs:
2016/04/20 03:54:12 [INFO] memberlist: Marking foo-server-18 as failed, suspect timeout reached
2016/04/20 03:54:12 [INFO] serf: EventMemberFailed: foo-server-18 10.XXX.XXX.XXX
2016/04/20 03:54:13 [INFO] serf: EventMemberJoin: foo-server-18 10.XXX.XXX.XXX
I'm using 0.6.4 on the servers, and the clients are currently on 0.6.1.
Here is some metric data from the Consul telemetry. This is a fixed inventory, not dynamically scaling:

BTW, in my case the servers and clients are both on 0.6.3.
Hi @hehailong5 these issues are almost always caused by network configuration issues. You need port 8301 open for TCP and UDP between all nodes in a cluster (Consul requires them to be a fully connected mesh).
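A quick way to test the TCP half of that requirement is to try connecting to port 8301 on every other node, from each node in turn. The following is a minimal sketch, not an official Consul tool; the function names `tcp_reachable` and `unreachable_peers` are made up for illustration. It does not check UDP: UDP is connectionless, so a plain connect attempt cannot prove that packets arrive, and UDP 8301 has to be verified separately.

```python
import socket
import sys

SERF_LAN_PORT = 8301  # the port named in the comment above


def tcp_reachable(host, port=SERF_LAN_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False


def unreachable_peers(peers, port=SERF_LAN_PORT, timeout=3.0):
    """Return the subset of peers that cannot be reached over TCP."""
    return [host for host in peers if not tcp_reachable(host, port, timeout)]


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_serf.py 172.17.0.5 172.17.0.6
    for host in unreachable_peers(sys.argv[1:]):
        print(f"cannot reach {host}:{SERF_LAN_PORT} over TCP")
```

Because the cluster has to be a fully connected mesh, the check only means something if it is run from every node against every other node, using the addresses that appear in the logs (for example 172.17.0.5 in the original report). A timeout here corresponds to the `dial tcp ...:8301: i/o timeout` error on the server.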
For anyone who reached here after googling the error message:
I solved this by creating TCP/UDP "allow rules" from and to the same CIDR as that of the machines themselves,
i.e. allow all ports from/to the 192.168.x.x network for all the machines in the 192.168.x.x network.
HTH,
Shantanu
@shantanugadgil why do all ports need to be open? You should only need to open the ports that are necessary for Consul communication.
This was quite some time back, and the "push/pull" error had annoyed me quite a bit at the time, so it was a "WTH moment" decision.
But basically you are correct: you should open up only the required ports.