I currently have a workaround, although it's a manual process.
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug: When a Pod restarts and comes back up, sometimes everything goes smoothly and it reclusters properly. Other times I get the following:
$ kubectl logs -n ops dapper-indri-rabbitmq-ha-1 -c rabbitmq-ha -f
2018-03-29 17:01:39.595 [info] <0.33.0> Application lager started on node '[email protected]'
2018-03-29 17:01:41.329 [info] <0.33.0> Application mnesia started on node '[email protected]'
2018-03-29 17:01:41.331 [info] <0.33.0> Application mnesia exited with reason: stopped
2018-03-29 17:01:41.342 [info] <0.33.0> Application mnesia started on node '[email protected]'
2018-03-29 17:01:41.345 [info] <0.33.0> Application mnesia exited with reason: stopped
BOOT FAILED
===========
Error description:
init:do_boot/3
init:start_em/1
rabbit:start_it/1 line 444
rabbit:'-boot/0-fun-0-'/0 line 300
rabbit_mnesia:check_cluster_consistency/0 line 663
throw:{error,{inconsistent_cluster,"Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees"}}
Log file(s) (may contain more information):
<stdout>
2018-03-29 17:01:41.345 [error] <0.5.0>
Error description:
init:do_boot/3
init:start_em/1
rabbit:start_it/1 line 444
rabbit:'-boot/0-fun-0-'/0 line 300
rabbit_mnesia:check_cluster_consistency/0 line 663
throw:{error,{inconsistent_cluster,"Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees"}}
Log file(s) (may contain more information):
<stdout>
{"init terminating in do_boot",{error,{inconsistent_cluster,"Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees"}}}
init terminating in do_boot ({error,{inconsistent_cluster,Node '[email protected]' thinks it's clustered with node '[email protected]', but '[email protected]' disagrees}})
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
The node starting up thinks it's already clustered with an existing node, but the existing node disagrees.
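For what it's worth, a quick way to confirm the mismatch is to compare what a surviving node believes about cluster membership, e.g. (a rough sketch; the -0 pod name is assumed, only -1 appears above):
$ kubectl exec -n ops dapper-indri-rabbitmq-ha-0 -c rabbitmq-ha -- rabbitmqctl cluster_status
If the surviving node's nodes / running_nodes lists no longer contain the failing node, the two views are indeed inconsistent.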
Version of Helm and Kubernetes:
N/A but:
$ helm version
Client: &version.Version{SemVer:"v2.6.0", GitCommit:"5bc7c619f85d74702e810a8325e0a24f729aa11a", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:54Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Which chart:
rabbitmq-ha
What happened:
Sometimes when a Pod restarts, it won't rejoin the cluster on reboot.
What you expected to happen:
When a Pod reboots, it should be able to rejoin the cluster.
How to reproduce it (as minimally and precisely as possible):
Not sure; it doesn't happen consistently at all.
Anything else we need to know:
My workaround:
(1) Locate the PV of the Pod that's not starting
(2) Log into the Node that the erroring Pod is running on
(3) Find the mount location of the PV on the Node
(4) Delete the contents of the mnesia and schema directories in the mounted PV
(5) Delete the Pod to force a restart
I'll post some details on the commands I ran to accomplish this:
$ kubectl describe po -n ops dapper-indri-rabbitmq-ha-1 | grep "Successfully assigned dapper-indri-rabbitmq-ha-1"
  Normal  Scheduled  5m  default-scheduler  Successfully assigned dapper-indri-rabbitmq-ha-1 to ip-10-33-113-209.us-west-2.compute.internal
$ kubectl get pv -n ops |grep dapper-indri-rabbitmq-ha-1
pvc-051f67b3-1546-11e8-bccd-0627a86dea1e 10Gi RWO Delete Bound ops/data-dapper-indri-rabbitmq-ha-1 gp2 38d
$ ssh [email protected]
$ sudo mount | grep pvc-051f67b3-1546-11e8-bccd-0627a86dea1e
/dev/xvdbf on /var/lib/kubelet/pods/2c4d12d2-3371-11e8-b8fa-02a56d23f67e/volumes/kubernetes.io~aws-ebs/pvc-051f67b3-1546-11e8-bccd-0627a86dea1e type ext4 (rw,relatime,data=ordered)
$ sudo rm -rf /var/lib/kubelet/pods/d3c99885-3373-11e8-b8fa-02a56d23f67e/volumes/kubernetes.io~aws-ebs/pvc-051f67b3-1546-11e8-bccd-0627a86dea1e/{mnesia,schema}/*
$ kubectl delete po -n ops dapper-indri-rabbitmq-ha-1
@bradenwright Any luck? Is Kubernetes peer discovery enabled in your RabbitMQ setup? https://github.com/rabbitmq/rabbitmq-peer-discovery-k8s
The peer discovery plugin is largely orthogonal to various scenarios that involve node restarts. In this case one node tries to contact a previously known peer which is not aware of the fact that they were previously members of the same cluster. This can be due to the fact that the "target" node has been reset or similar.
Resetting a node e.g. with rabbitmqctl reset or by wiping its database is only necessary when adding a brand new node. A rejoining node will contact its last known peer upon boot. I'd recommend getting a sense of RabbitMQ clustering basics. In most cases you do not want to reset a restarted cluster member.
RabbitMQ node logs are critically important when investigating this kind of issue.
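When the pod is crash-looping, the log of the previous container instance can still be pulled with kubectl's --previous flag, e.g.:
$ kubectl logs -n ops dapper-indri-rabbitmq-ha-1 -c rabbitmq-ha --previous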
This rabbitmq-users response provides a very plausible hypothesis: automatic forced cleanup of unknown nodes was enabled in the Kubernetes example. Unintended removal of nodes temporarily leaving the cluster is one of the consequences of that decision, and apparently a pod restart can trigger it.
@michaelklishin thanks for the replies. I'll dig more and see what I can find; I did have the setting you mentioned enabled. I'm using the rabbitmq-ha chart with the default configmap:
https://github.com/kubernetes/charts/blob/master/stable/rabbitmq-ha/templates/configmap.yaml#L61
I'll look through the options more and see if anything jumps out as being useful to set.
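For reference, the forced cleanup behaviour is controlled by the node cleanup keys in that configmap; an illustrative rabbitmq.conf excerpt (the exact values shipped by the chart may differ):
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
## how often, in seconds, peer discovery re-checks cluster membership
cluster_formation.node_cleanup.interval = 10
## false = forcibly remove nodes missing from the discovery backend,
## true  = only log a warning (avoids removing pods that are temporarily down)
cluster_formation.node_cleanup.only_log_warning = false
Overriding only_log_warning to true should keep a restarting pod from being removed from the cluster while it is down.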
@bradenwright ugh, this is ugly. That example value made it to the official chart 🤦♂️.
@bradenwright please close this issue. #4823 demonstrates how to work around it (and includes a default change to the config map). The effect is documented as well.
I don't know when the review is going to happen, but there are no chart changes necessary to prevent nodes that temporarily leave the cluster from being cleaned up.
I also think changing the rabbitmq.conf to use hostnames instead of IP addresses helped with the issue.
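In the Kubernetes peer discovery plugin that switch looks roughly like this; a hedged sketch, with the hostname_suffix value assumed from a typical StatefulSet plus headless-service setup (fill in the placeholders for your release):
## return pod hostnames instead of pod IPs from the Kubernetes API
cluster_formation.k8s.address_type = hostname
## suffix appended to the short hostname to build the node name
cluster_formation.k8s.hostname_suffix = .<headless-service>.<namespace>.svc.cluster.local
With stable hostnames a restarted pod keeps the same node name even if its IP changes; fully-qualified names typically also require RABBITMQ_USE_LONGNAME=true in the pod environment.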