If a server node changes its IP address, the gossip layer will handle the update properly, but the Raft peer set will not be updated. This causes replication errors and potentially an outage.
It can be triggered by restarting a Docker container running the Consul servers without doing a graceful leave.
The reproduction recipe is here: https://groups.google.com/d/msg/consul-tool/pWj3rHdgdqY/PMXCywgXo28J.
That uses the progrium/consul Docker container, which does not currently set leave_on_terminate=true by default. If the author accepts and fixes https://github.com/progrium/docker-consul/issues/34, that will no longer be the case.
We use short-lived Windows instances; most last no more than a day. We refer to our instances by a logical name (e.g., web_001 through web_33). As instances come and go, we re-use the logical names to fill gaps before adding more at the top end.
This means nodes will come and go with different IPs but the same node names, so it sounds like we'll be affected by this issue. As a workaround, should we inject some uniqueness into the node name we pick for the Consul agent (such as the AWS instance ID)?
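For what it's worth, the instance-ID workaround being asked about could be sketched like this. The helper function and naming scheme are hypothetical; `-node` is Consul's real agent flag and the metadata URL is the standard EC2 endpoint:

```shell
#!/bin/sh
# Sketch: derive a unique Consul node name by appending the EC2 instance
# ID to the logical name, so a replacement instance never reuses a name.

unique_node_name() {
  logical_name="$1"  # e.g. web_001
  instance_id="$2"   # e.g. i-0abc1234 (normally fetched from EC2 metadata)
  echo "${logical_name}-${instance_id}"
}

# On a real instance, fetch the ID and pass it to the agent:
#   instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
#   consul agent -node "$(unique_node_name web_001 "$instance_id")" -data-dir /var/lib/consul
```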
However, we'd prefer not to have to. We use logical names in the first place for two reasons:
@petemounce This will not affect your case. This is only when the server nodes themselves change IPs but not their node name. The clients can change IPs all day :)
What is the proper way to change the IP (or the advertise_addr config option) on a server? Is there one?
It cannot be done currently. You need to remove the server gracefully first and then re-add it. Consul can't handle the address-change case.
So, “consul leave” on the host, change the IP or advertise_addr, then restart? That seems to confuse the agents in the cluster, which continue to show the old IP and a state of “left”.
Assuming the node name is the same, they shouldn't be confused. The IP address should update on the clients. If the node name changes, they will be confused since it looks like a different node. But effectively yes, the node is leaving and then re-joining with new configuration.
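For reference, the relevant piece of the server's config for that procedure might look like this. This is a minimal fragment and the address is a placeholder, but advertise_addr, node_name, and leave_on_terminate are Consul's real configuration options:

```
{
  "node_name": "consul-000",
  "advertise_addr": "11.222.33.444",
  "leave_on_terminate": true
}
```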
That doesn't seem to be the case, unfortunately. I modified the config for node consul-000.us-east-1.aws.test.example.com to use a specific advertise_addr; leaving and re-joining with the new config is resulting in lots of
2015/02/24 00:05:10 [WARN] memberlist: Refuting a suspect message (from: consul-000.us-east-1.aws.test.example.com)
on the node I just modified, and messages like this on the other servers and agents in the cluster even minutes after reconfiguring:
2015/02/24 00:06:48 [INFO] serf: EventMemberJoin: consul-000.us-east-1.aws.test.example.com 11.222.33.444
2015/02/24 00:06:48 [INFO] consul: adding server consul-000.us-east-1.aws.test.example.com (Addr: 11.222.33.444:8300) (DC: us-east-1_aws_test)
2015/02/24 00:07:04 [INFO] serf: EventMemberFailed: consul-000.us-east-1.aws.test.example.com 11.222.33.444
2015/02/24 00:07:04 [INFO] consul: removing server consul-000.us-east-1.aws.test.example.com (Addr: 11.222.33.444:8300) (DC: us-east-1_aws_test)
@blalor Can you provide the DEBUG level logs from the machine and maybe one other machine? This looks slightly different than the issue of this ticket. The ticket is that the Raft peers cannot handle an IP update of a server, while this looks like a different issue (Join/Fail) not converging.
https://gist.github.com/blalor/60539004449c35fc079a
consul_debug.000 is for server node consul-000 which had its IP address changed from 10.130.0.248 to 11.222.33.444. consul_debug.001 is for server node consul-001 whose configuration was unchanged save for enabling debug logging.
@blalor It looks like consul-001 is unable to ping (directly or indirectly) consul-000:
[INFO] memberlist: Suspect consul-000.us-east-1.aws.test.example.com has failed, no acks received
This could mean there is some network issue preventing UDP packets between them, which is causing the flapping. Could you investigate possible network issues?
Not anymore; I’ve rebuilt that cluster. :-)
Assuming all servers have leave_on_terminate set, what are the clients supposed to do when the complete cluster is gone? Should they try to reconnect via the DNS name?
Then I'd have a workaround for this, at least.
I'm determined to introduce ip change support in Consul.
I've hacked the code to allow that and it seems to work. I'd like to agree with you on the design of the final solution so that, possibly, my pull request could be integrated with mainline Consul.
@armon Please let me know your comments and concerns.
The requirements:
OK, so here's the idea:
Please note that no reverse resolution (IP->node address) is required.
Correctness:
Obviously, there is a question whether such an approach is correct.
Assumptions:
Observations:
This is good enough for me, because that covers real-world scenarios I need to handle.
However, I believe that even in the case of rapid IP address changes the approach stays correct. The new case to consider is when messages reach a different destination than intended because serf data is not up to date. Still, because it is the message content that matters, not the sender, all invalid requests will be dropped (even now there must already be support for handling stray or delayed messages). There is a risk that some valid requests are dropped, but this affects only efficiency, not correctness.
Obviously, this is hardly a _proof_ of correctness. I do not intend to perform formal verification, though. Is it good enough for you?
I've looked over the web API and I think this change doesn't affect it. I hope I haven't broken anything.
FWIW, my workaround worked okay-ish, until a node hard-crashes and you need to replace it.
@jakubzytka's design sounds reasonable to me, but I'm wondering what happens if you end up with two nodes using the same node name.
Two nodes cannot have the same node name; that's a serf requirement.
Right now an error is logged from serf (and a cluster doesn't form, I guess) should such a thing happen:
2015/12/02 12:08:06 [ERR] memberlist: Conflicting address for blahblah. Mine: 192.168.9.3:8301 Theirs: 192.168.9.1:8301
2015/12/02 12:08:06 [ERR] serf: Node name conflicts with another node at 192.168.9.1:8301. Names must be unique! (Resolution enabled: false)
This would be far less of an issue in my implementation if I had a mechanism to kick out dead raft peers that serf thought were running again.
In my environment, changing running servers' IPs isn't the issue; it's that if a server node fails, there's a decent chance someone will not follow procedure and will re-launch it with the same name but a different IP address, without force-leaving the failed node first. Serf will think everything is OK and all nodes will show as alive, but there's an orphaned raft peer lying around.
Detecting the orphaned raft node is easy enough with a monitoring system, by comparing the number of raft peers with the number of Consul servers. When that alert triggers, it would normally be a simple matter of issuing a force-leave command for the failed node; however, the force-leave command currently requires the node being evicted to still exist in serf. If someone doesn't follow procedure and re-launches a failed node with the old name (and it gets assigned a different IP address by EC2), then the only option is to bring the entire cluster down to update the peers.json file.
If the force-leave command could be extended (or a new command added) to be able to kick out an orphaned raft node without having to shut everything down, this would become much less of an issue, for me at least.
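The monitoring check described above can be sketched roughly as follows. `/v1/status/peers` is Consul's real status endpoint and `consul members` is the real CLI command, but the parsing helpers and the comparison are illustrative; they read the raw output on stdin so they can be tested offline:

```shell
#!/bin/sh
# Count raft peers from the JSON array returned by
#   curl -s http://127.0.0.1:8500/v1/status/peers
# e.g. ["10.0.0.1:8300","10.0.0.2:8300"]
count_raft_peers() {
  grep -o ':8300' | wc -l
}

# Count live servers from the output of `consul members`
# (columns: Node, Address, Status, Type, ...).
count_alive_servers() {
  awk '$3 == "alive" && $4 == "server"' | wc -l
}

# Alert when there are more raft peers than live servers; once the orphan
# is identified, clean it up with `consul force-leave <node>`.
```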
@deltaroe You can work around your issue by scripting the startup instead of relying on a manual procedure. Check and persist the IP when starting Consul, and on every restart re-check that IP. If it changed, remove the old data and start the node with a new name. Or, alternatively, use node names that contain the IP. You'll never have the same node name for different IPs, and you will be able to remove stray peers with force-leave.
The problem (for me) is that both these approaches require quorum of nodes to be alive, and my solution works when there is no quorum.
See related discussion - https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/consul-tool/RqRZL-cnjFg/gcnd9i3IHQAJ
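The scripted-startup workaround suggested above might look something like this. The state file, data dir, and naming scheme are all hypothetical placeholders:

```shell
#!/bin/sh
# Sketch: persist the IP the node last started with; if it changed, wipe
# the data dir and start under a fresh, IP-derived node name so the stale
# raft peer can always be removed with force-leave.

STATE_FILE="${STATE_FILE:-/var/lib/consul-last-ip}"

# Embed the IP in the node name so a node that comes back with a new IP
# never collides with its old identity.
node_name_for() {
  echo "web-$(echo "$1" | tr '.' '-')"
}

# True (exit 0) when a previous IP was recorded and it differs from the
# current one, meaning the old data dir should be wiped before starting.
ip_changed() {
  [ -f "$STATE_FILE" ] && [ "$(cat "$STATE_FILE")" != "$1" ]
}

# Typical use before launching the agent:
#   if ip_changed "$MY_IP"; then rm -rf /var/lib/consul/*; fi
#   echo "$MY_IP" > "$STATE_FILE"
#   consul agent -node "$(node_name_for "$MY_IP")" -data-dir /var/lib/consul
```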
I don't suppose someone from Hashicorp could give @jakubzytka some feedback on the proposed design? This problem has just bitten us _badly_, and although I'm implementing workarounds, it'd be nice if this problem was solved in consul.
@mpalmer sorry you got bit by this.
We are currently working on some improvements for Raft's management of config changes but we want to be super careful we do this in the best way. We are currently leaning towards adding a cluster-wide GUID that comes from the memberlist layer and is used to track identity regardless of IP and node name, so we are working through the implications of that.
@mpalmer If you badly need some solution you can try my patched consul. It handles changing IP address of a node as long as node name stays the same.
The code is available at https://github.com/jakubzytka/consul/tree/ipChangeSupport
The branch is based on Consul v0.6 if I remember correctly, but I guess it should apply cleanly to the newest version.
We've been using it in "staging" for a few months without issues.
Hi Guys
I am running a containerized single-node Consul cluster with volumes attached. When I bring this cluster up for the first time, the leader is elected successfully, and I see the following entry added in peers.json:
["172.17.4.162:8300"] ==> this is correct as 172.17.4.162 is IP of my container.
Now, I remove this container and make sure that it exits gracefully, as I have set "leave_on_terminate": true in my configuration file. After exit, peers.json returns null. Up to this point, everything seems good. Now, when I restart the single-node cluster, the IP assigned to the new container is ["172.17.4.170:8300"]; this is added to peers.json successfully and is the only value existing there.
In spite of this, the Consul deployment fails. The new node somehow tries to connect to the previous IP, "172.17.4.162", that has already been deleted from peers.json. Here are the logs:
2016/07/06 20:10:50 [INFO] serf: EventMemberJoin: consul 172.17.4.170
2016/07/06 20:10:50 [INFO] serf: EventMemberJoin: consul.dc1 172.17.4.170
2016/07/06 20:10:50 [INFO] raft: Node at 172.17.4.170:8300 [Follower] entering Follower state
2016/07/06 20:10:50 [INFO] consul: adding LAN server consul (Addr: 172.17.4.170:8300) (DC: dc1)
2016/07/06 20:10:50 [INFO] consul: adding WAN server consul.dc1 (Addr: 172.17.4.170:8300) (DC: dc1)
2016/07/06 20:10:50 [ERR] agent: failed to sync remote state: No cluster leader
2016/07/06 20:10:52 [WARN] raft: Heartbeat timeout reached, starting election
2016/07/06 20:10:52 [INFO] raft: Node at 172.17.4.170:8300 [Candidate] entering Candidate state
2016/07/06 20:10:52 [INFO] raft: Election won. Tally: 1
2016/07/06 20:10:52 [INFO] raft: Node at 172.17.4.170:8300 [Leader] entering Leader state
2016/07/06 20:10:52 [INFO] consul: cluster leadership acquired
2016/07/06 20:10:52 [INFO] consul: New leader elected: consul
2016/07/06 20:10:52 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2016/07/06 20:10:52 [INFO] raft: Added peer 172.17.4.162:8300, starting replication
2016/07/06 20:10:52 [INFO] raft: Removed peer 172.17.4.162:8300, stopping replication (Index: 18)
2016/07/06 20:10:52 [INFO] consul: member 'consul' joined, marking health alive
2016/07/06 20:10:53 [INFO] agent: Synced service 'consul'
2016/07/06 20:10:55 [ERR] raft: Failed to heartbeat to 172.17.4.162:8300: dial tcp 172.17.4.162:8300: getsockopt: no route to host
2016/07/06 20:10:55 [ERR] raft: Failed to AppendEntries to 172.17.4.162:8300: dial tcp 172.17.4.162:8300: getsockopt: no route to host
2016/07/06 20:10:58 [ERR] raft: Failed to heartbeat to 172.17.4.162:8300: dial tcp 172.17.4.162:8300: getsockopt: no route to host
Could anyone help me find the reason?
Any progress on this?
I use Consul in single-node mode. When the container restarts, the IP address changes and Consul cannot start because it remembers its previous IP address.
Is there a way to run Consul in single-node mode despite this?
My config (Docker compose):
```yaml
version: '2'
services:
  consul:
    image: consul:0.7.2
    ports:
      - "8500:8500"
      - "8600:8600/tcp"
      - "8600:8600/udp"
    # https://github.com/hashicorp/consul/issues/166#issuecomment-233711577
    command: agent -server -bootstrap -ui -client 0.0.0.0
```
Highly interested as well. When using Docker swarm mode, I get new IPs almost every time.
Closing this in favor of https://github.com/hashicorp/consul/issues/1580.