I am using Terraform to register nodes into Consul, then using Consul as a dynamic inventory for Ansible.
When I destroy and re-create a node with Terraform, Terraform attempts to update the Consul node with the newly assigned IP, but Consul does not update the node's IP; instead, it keeps the "left" state and the existing metadata.
I would like a way to force the reaping of specific nodes that I know I don't want to wait around for.
Steps to reproduce:
1. Create an aws_instance and a consul_node resource with Terraform.
2. Inspect `consul members`, noting the IP and the state "alive".
3. Taint the aws_instance and re-apply with Terraform.
4. Inspect `consul members` again: the IP has not changed, and the node is still shown as "left".
Please let me know if I can provide more context and thanks in advance!
Hi @cornfeedhobo, the force-leave command (https://www.consul.io/docs/commands/force-leave.html) can do this. Do you end up with an old and a new node entry, or does the name change too?
@slackpad Howdy! Thanks for taking a look at this. force-leave does not do what I am looking for (unless I am missing something).
Consul periodically tries to reconnect to "failed" nodes in case it is a network partition. After some configured amount of time (by default 72 hours), Consul will reap "failed" nodes and stop trying to reconnect. The force-leave command can be used to transition the "failed" nodes to "left" nodes more quickly.
Instead, I would like to forcibly "reap" a node at will, rather than have it persist in "left" until automatic reaping occurs.
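To make the distinction between "failed", "left", and "reaped" concrete, here is a toy state model of the lifecycle described in the quoted documentation. It is not Consul or Serf code: `MemberList`, its methods, and `LEFT_REAP_TIMEOUT` (including its 24-hour value) are hypothetical, and only the 72-hour default for failed nodes comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    ALIVE = "alive"
    FAILED = "failed"
    LEFT = "left"


# Default reap delay for "failed" nodes, per the documentation quoted above.
RECONNECT_TIMEOUT = timedelta(hours=72)
# Hypothetical: how long a "left" node lingers before reaping (assumed value).
LEFT_REAP_TIMEOUT = timedelta(hours=24)


@dataclass
class Member:
    name: str
    addr: str
    status: Status
    since: datetime  # when the member entered its current status


class MemberList:
    """Toy membership table; not Consul's or Serf's real implementation."""

    def __init__(self) -> None:
        self.members: dict[str, Member] = {}

    def join(self, name: str, addr: str, now: datetime) -> None:
        self.members[name] = Member(name, addr, Status.ALIVE, now)

    def fail(self, name: str, now: datetime) -> None:
        m = self.members[name]
        m.status, m.since = Status.FAILED, now

    def force_leave(self, name: str, now: datetime) -> None:
        # Like the quoted force-leave: "failed" becomes "left", but the entry
        # (including its old address) is kept until the reaper runs.
        m = self.members[name]
        if m.status is Status.FAILED:
            m.status, m.since = Status.LEFT, now

    def reap(self, now: datetime) -> list[str]:
        # Periodic reaper: drop entries that have outlived their status timeout.
        timeouts = {Status.FAILED: RECONNECT_TIMEOUT, Status.LEFT: LEFT_REAP_TIMEOUT}
        reaped = [
            name
            for name, m in self.members.items()
            if m.status in timeouts and now - m.since >= timeouts[m.status]
        ]
        for name in reaped:
            del self.members[name]
        return reaped
```

In this model, `force_leave` only shortens the wait by moving a node out of "failed"; the entry and its stale address remain visible until `reap` runs past the timeout, which is the gap the request is about.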
The force-leave will push it into "left" so Consul will stop trying to contact it. When you say "forcibly" do you mean that you want the node itself to exit if it's still alive, kind of thing?
What I mean by "forcibly" is just bypassing the wait time for reaping. In other words, if I am manually destroying an instance, I should be able to tell Consul not to keep the node around in the "left" state until reaping, but to purge/reap it immediately without waiting for the configured reaping interval.
consul force-reap <node name> would be amazing.
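The requested behavior can be sketched as a single operation on a membership table. `force_reap` below is hypothetical (the thread proposes the command name; Consul does not define this function), and refusing to reap a node that is still alive is an added design assumption, meant to keep a mistyped name from evicting a healthy member.

```python
from enum import Enum


class Status(Enum):
    ALIVE = "alive"
    FAILED = "failed"
    LEFT = "left"


def force_reap(members: dict[str, Status], name: str) -> bool:
    """Hypothetical `force-reap <node name>`: purge a node immediately.

    Returns True if an entry was removed and False if the name is unknown.
    Raises ValueError for a node that is still alive (assumed safety check).
    """
    status = members.get(name)
    if status is None:
        return False  # nothing to reap
    if status is Status.ALIVE:
        raise ValueError(f"{name} is still alive; refusing to reap")
    # No waiting for a reconnect or reap interval: the entry is gone now.
    del members[name]
    return True
```

Unlike force-leave, this skips the intermediate "left" state entirely, so a re-created node registering under the same name would not collide with a lingering entry.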
I see. We are also considering https://github.com/hashicorp/consul/issues/2982 which would let you set them to reap in some super short interval so Consul does the work and you don't have to run any special commands.
Yeah, I saw that and side with the viewpoint that a super short reaping interval could be dangerous to servers. My thought was that this would be a safer and less contentious change - I probably should have said as much in the initial post.
Sorry, I confused #2982 with a different issue. Yes my issue is nearly the same, although I still would like to have this as a CLI option.
+1
+1
@slackpad, would it be sufficient, given https://github.com/hashicorp/consul/blob/master/agent/consul/leader.go#L976-L979, to only reap from Serf, or should the nodes be also explicitly reaped from the catalog?
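The question hinges on whether the leader cleans up catalog entries once Serf forgets a node. A toy model of that assumed reconciliation step follows; `reconcile_reaped`, the `catalog` dict, and the `serf_members` set are hypothetical stand-ins, not Consul's data structures, and whether the linked leader.go lines behave exactly like this is an assumption.

```python
def reconcile_reaped(catalog: dict[str, str], serf_members: set[str]) -> list[str]:
    """Deregister catalog nodes that no longer appear in the Serf member list.

    `catalog` maps node name to address. Returns the removed names in
    catalog order.
    """
    removed = [name for name in catalog if name not in serf_members]
    for name in removed:
        del catalog[name]
    return removed
```

If the real leader behaves like this model, reaping from Serf alone would be enough, because the catalog entry would follow on the next reconcile pass; if not, an explicit catalog deregistration would also be needed.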