Consul: Server tries to send information to an agent even though it has already left

Created on 18 Jun 2015 · 13 comments · Source: hashicorp/consul

I started a Consul server agent locally and it joined the production cluster. I then removed the local server from the cluster and it went into the "left" state. After restarting the servers in the production cluster, my local server was no longer in the members list. However, the leader still attempted to send information to the local server, producing many error messages in the log:

    2015/06/18 09:39:43 [ERR] raft: Failed to AppendEntries to 192.168.0.103:8300: dial tcp 192.168.0.103:8300: i/o timeout
    2015/06/18 09:39:45 [ERR] raft: Failed to heartbeat to 192.168.0.103:8300: dial tcp 192.168.0.103:8300: i/o timeout

The problem was solved after deleting Consul data on the servers in production:

    systemctl stop consul
    rm -rf /opt/consul
    rm -rf /opt/staging
    puppet agent --test

After I started my local Consul agent again, the whole cluster stopped functioning because no leader could be elected (probably because there were now 4 servers?). The weird thing is that the configuration no longer pointed to the production cluster, so this information must have been cached somewhere.


Most helpful comment

In 0.7 we've done work on peer changes to prevent this issue and created https://www.consul.io/docs/commands/operator.html to allow stale peers to be removed with no downtime.

All 13 comments

Sounds like this could be a bug. If the server successfully left and was marked as such, then this behavior wouldn't be expected. The raft layer uses a special "peers.json" file located in the Consul data dir to remember its known peers. It sounds like this file was likely not updated during the leave event. Were all servers online when the leave happened? I tried reproducing this locally without success. Are you able to reproduce this issue? Any log messages with -log-level set to DEBUG would be very helpful. Thanks!
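For reference, on these pre-0.7 versions peers.json is a plain JSON array of server addresses; a minimal sketch of what it contains, assuming the /opt/consul data dir from the commands above (the raft/ subdirectory path follows from your -data-dir setting):

    cat /opt/consul/raft/peers.json
    ["192.168.0.101:8300","192.168.0.102:8300","192.168.0.103:8300"]

A stale entry like 192.168.0.103:8300 lingering in this list after a leave is exactly the symptom described above.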

I currently have a cluster in this state: 5 servers active, with 6 servers in the peers.json list. I have tried issuing a force-leave for the server, as well as removing the server from peers.json on every server, but it just gets repopulated on restart. Looking through the consul/raft code, it appears peer cleanup is only triggered by (1) an explicit leave RPC call or (2) the reconciliation process. Since the reconciliation process uses the Serf member list as its starting point, it won't find the missing server, and since that server no longer exists to issue a leave, option 1 isn't available either. Force-leave also doesn't appear to do anything if the node is not known to Serf.

Right now I'm trying to pin down the exact mechanism by which simply updating the peers.json file by hand fails to work. The peers list is obviously being overwritten by the cluster.

So there doesn't appear to be a way to modify the peers list by manually editing peers.json without taking the whole cluster down. The reason is that only the leader can propagate peer-list changes, and the leader pushes out its peers list when it starts replication:

https://github.com/hashicorp/raft/blob/9dabbbab966c04a0b6efed3cff6960299fed0642/raft.go#L799

So there doesn't seem to be a way to reconcile this state while keeping the cluster online.
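If downtime is acceptable, the state can be reconciled manually by editing peers.json while every server is stopped, so no leader is up to overwrite the file; a minimal sketch, assuming a /opt/consul data dir and the stale peer 192.168.0.103:8300 from the original report:

    # run on ALL servers, so no leader can push the old peers list back out
    systemctl stop consul
    # on each server, remove the stale entry from the raft peers file,
    # i.e. delete "192.168.0.103:8300" from /opt/consul/raft/peers.json
    # then restart every server; a leader is elected from the cleaned list
    systemctl start consul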

@hatmatter - do you know how long it has been in this state? It looks like it should eventually reap the dead peer, but there could be a bug in that path. We saw a similar report from another Consul user today.

Also, do you know if you passed the node name or IP to force-leave? It has to be the node name to work.
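For illustration, force-leave is keyed on the node name exactly as shown in consul members; a sketch with a hypothetical node name:

    # works: the Serf node name
    consul force-leave node-103
    # does not work: the raft address is not a member name
    consul force-leave 192.168.0.103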

My understanding from reading through the reaping code is that it uses the raft members list as the starting place and then reaps any that are no longer alive. The peers list would get updated if a server were reaped during this process, but since the stale server is on the peers list yet not in the raft/serf members list, it is never a candidate for reaping.

Edit: meant serf members, not raft members

I thought it was operating off the Serf members list but I'll take a closer look.

I'm also seeing similar behaviour on our Kubernetes infrastructure.

@slackpad: you say force-leave would work, but in our scenario:

  • we have five consul servers running
  • I kill one server on purpose
  • Kubernetes brings a new one online, using the same name as the killed one
  • the new server joins the cluster, and works as expected
  • other servers correctly see the new server, using the old name but a new IP address
  • however, they also start to log every ~20s:

    2015/09/28 11:17:38 [ERR] raft: Failed to heartbeat to 10.132.1.67:8300: dial tcp 10.132.1.67:8300: no route to host
    2015/09/28 11:17:38 [ERR] raft: Failed to AppendEntries to 10.132.1.67:8300: dial tcp 10.132.1.67:8300: no route to host

  • at this point, there doesn't seem to be a way to make them forget about this node, because force-leave requires a node name, but that name is already working again; it's just the old IP in peers.json that should be forgotten (a quick comparison sketch follows this list).
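A quick way to confirm this state is to compare the Serf view with the raft peers file; a sketch, again assuming a default-style /opt/consul data dir:

    consul members                    # shows the node under its new IP
    cat /opt/consul/raft/peers.json   # still lists the stale 10.132.1.67:8300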

I'm hitting the same problem; is there any way this can get traction?

Thanks!

We're experiencing the same misbehaviour: peers.json currently contains an additional host that is not listed in consul members, and to my understanding this cannot be cleaned up.

consul version: v0.5.2

Once the new cluster membership changes under https://github.com/hashicorp/raft/pull/117 are done, we need some CLI utilities to remove a peer by ID, even if it is no longer a member of the cluster and force-leave no longer works.

In 0.7 we've done work on peer changes to prevent this issue and created https://www.consul.io/docs/commands/operator.html to allow stale peers to be removed with no downtime.
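For anyone landing here on 0.7+, a sketch of that operator command, using the stale address from the logs above:

    # inspect the raft peer set as the leader sees it
    consul operator raft -list-peers
    # drop the stale peer by address, with the cluster still online
    consul operator raft -remove-peer -address=10.132.1.67:8300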

