Is your feature request related to a problem? Please describe.
Yes. In our dev/staging environment in AWS we turn off the EKS cluster nodes at night (setting the scaling group to 0) to save costs, and we have persistence enabled for RabbitMQ, so this acts like an unexpected shutdown of all RabbitMQ nodes. The cluster can't recover and keeps looping with "Waiting for Mnesia tables". I tried setting podManagementPolicy and service.alpha.kubernetes.io/tolerate-unready-endpoints: "true", but it had no effect.
Describe the solution you'd like
Can this solution be applied https://github.com/helm/charts/pull/9645#issuecomment-478638566 as an option? We'd really prefer availability over integrity.
Describe alternatives you've considered
Don't use persistence.
Hi @denis111 ,
Do you know the order in which the nodes were shut down? I am not a RabbitMQ expert, but in the documentation you can see this:
Normally when you shut down a RabbitMQ cluster altogether, the first node you restart should be the last one to go down, since it may have seen things happen that other nodes did not. But sometimes that's not possible: for instance if the entire cluster loses power then all nodes may think they were not the last to shut down.
Link to documentation: https://www.rabbitmq.com/rabbitmqctl.8.html#force_boot
Could you check whether force_boot works when you don't know the order?
@miguelaeh Thank you for answering. No, we can't know the order; this is just an autoscaling group scheduled to scale to 0 instances at night when nobody is working. It's unacceptable for us if anything can't recover from such a "disaster" as an "unexpected" shutdown of all nodes, so in that case we'd rather not use persistence, because we prefer availability over integrity.
I hope to play with force_boot this Friday, and I'll let you know if it worked.
Thank you @denis111 ,
let me know what happens when you try this option.
Well, first, I can't execute rabbitmqctl force_boot because it says "Error: this command requires the 'rabbit' app to be stopped on the target node. Stop it with 'rabbitmqctl stop_app'.". But if we stop it, the pod just restarts without giving us a chance to execute "rabbitmqctl force_boot"...
So I created a force_load file, in my case in "/opt/bitnami/rabbitmq/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-pre-0.rabbitmq-pre-headless.pre.svc.cluster.local", and it worked!
I'm glad it worked.
That is a cool solution.
Yes, but how to automate it? I mean the creation of the force_load file, in some init container maybe...
You could try mounting the file in the init container via a ConfigMap, but if you don't need to execute any command before the main container starts, I guess you could just mount the file in the main container (also with a ConfigMap).
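A minimal sketch of that ConfigMap idea (all names here are illustrative, not from the chart). Two caveats to be aware of: the Mnesia directory name depends on the pod's hostname, which makes a static mount path awkward in a StatefulSet, and I believe RabbitMQ removes the force_load marker after a forced boot, which a read-only ConfigMap mount would not allow.

```yaml
# Sketch only: the ConfigMap name and mount path are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-force-load
data:
  force_load: ""
---
# Pod spec fragment (commented out; it belongs inside the StatefulSet):
# mount just the one file into the node's Mnesia directory via subPath.
# volumeMounts:
#   - name: force-load
#     mountPath: /opt/bitnami/rabbitmq/var/lib/rabbitmq/mnesia/<node-dir>/force_load
#     subPath: force_load
# volumes:
#   - name: force-load
#     configMap:
#       name: rabbitmq-force-load
```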
I can't find the "helm way" to do it; the existing ConfigMap template in the rabbitmq chart doesn't allow adding extra files, and the StatefulSet template doesn't allow adding an extra init container or an extra volume mount...
The Chart does not support that at the moment. You would have to add it manually (you can clone the repository and modify the Chart to your needs).
Well, we'd like to use the mainstream chart, so I'll see if I can make a pull request.
I've detected that if we enable the forceBoot option on a new install without an existing PVC (with a clean new volume), then RabbitMQ is unable to start with Error: enoent. I'm creating a PR to address this issue.
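For reference, the option discussed here ends up as a simple chart value; a minimal values.yaml sketch, assuming a chart version that ships the forceBoot flag:

```yaml
# values.yaml fragment (sketch): with forceBoot enabled, the chart creates the
# force_load marker at startup so a node boots without waiting for its peers.
# Per the comment above, this must be skipped on fresh volumes to avoid enoent.
forceBoot: true
persistence:
  enabled: true
```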
Hey @denis111, so I'm having this issue with the latest version of stable/rabbitmq-ha, i.e.
[warning] <0.311.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
Do I need to make any modifications to the values.yaml to make use of your adjustments?
Actually, removing the pvcs before redeploying my rabbit resolved the problem for me.
I think the easiest solution is to have an init container which deletes the mnesia folder during startup.
If the rabbit node is waiting for the other nodes to come up, and they are not coming up because the StatefulSet is booting them sequentially, how about podManagementPolicy: Parallel?
Parallel: "will create pods in parallel to match the desired scale without waiting, and on scale down will delete all pods at once" - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#statefulsetspec-v1-apps
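Assuming the chart exposes the StatefulSet policy in values.yaml (key names vary between charts), the idea above would look like:

```yaml
# values.yaml fragment (sketch): boot all pods at once so peers can discover
# each other, instead of the default OrderedReady sequential startup.
podManagementPolicy: Parallel
# Upgrades still replace pods one at a time under RollingUpdate, so Parallel
# does not by itself take the whole cluster down on an image change.
updateStrategy: RollingUpdate
```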
Some basic testing with podManagementPolicy: Parallel:
As expected, all nodes start simultaneously and the cluster seems to recover correctly. Further, with missing volumes, node discovery/cluster init seems to work as expected; however, the cluster name was randomized, if that matters to you.
@andylippitt smart idea to use podManagementPolicy: Parallel
However, I was thinking: what would happen if you upgrade your RMQ cluster, e.g. to a new rmq image? In that case, won't it take down all the pods at once and recreate all of them in parallel with the new image?
Ignore my comment above; the updateStrategy: RollingUpdate will take care of that situation.
Do you think in any edge case scenario having podManagementPolicy: Parallel will create an outage since it takes down all 3 pods?
I have found a problem with podManagementPolicy: Parallel. There's a race condition on initialization if you don't specify rabbitmqErlangCookie. I now have a condition where a single pod is running with a RABBITMQ_ERLANG_COOKIE which is different from the current value of the secret. I suspect this was a result of concurrent initialization and will try to reinstall with an explicit value.
Yes, I did test with podManagementPolicy: Parallel but faced issues; sometimes pods did not come back healthy. Instead, setting the force_boot flag to true, which was suggested earlier in the thread, worked for me. I tested multiple times, bringing all the pods down at once and bringing pods down one at a time within 2-5 minutes to create some sort of a mess, and with that flag on all the pods came back healthy.
Additional setting in custom values.yaml:
We are also setting a lifecycle hook for the pods so they terminate gracefully when taken down.
lifecycle:
  preStop:
    exec:
      command: ["rabbitmqctl", "shutdown"]
@vkasgit were you specifying an explicit value for rabbitmqErlangCookie in your failed testing?
Edit: I think the issue is not a concurrency issue; rather, in our case we just ran into this: https://github.com/helm/charts/issues/5167. tl;dr: specify rabbitmqErlangCookie in your prod installs.
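A minimal values.yaml sketch for pinning the cookie in stable/rabbitmq-ha (the placeholder value is of course not a real cookie):

```yaml
# values.yaml fragment: pin the Erlang cookie so concurrently starting pods
# share one value instead of racing to generate it. Replace the placeholder
# with a long random string and keep it secret.
rabbitmqErlangCookie: "CHANGE-ME-LONG-RANDOM-STRING"
```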
In our installation that was failing, I had the secret pre-created with the Erlang cookie.
Actually, removing the pvcs before redeploying my rabbit resolved the problem for me.
is it safe to remove the pvc?
No, it's not safe; these mounts keep important files (db files etc.):
volumeMounts:
  - mountPath: /var/lib/rabbitmq
    name: data
  - mountPath: /etc/rabbitmq
    name: config
  - mountPath: /etc/definitions
    name: definitions
    readOnly: true
dnsPolicy: ClusterFirst
My problem was solved with the method below (change clustername to yours). Run the exec command when the pod state is Running:
kubectl exec -ti clustername-rbmq-rabbitmq-ha-0 /bin/sh
cd /var/lib/rabbitmq/mnesia/rabbit@perfx-rbmq-rabbitmq-ha-0.clustername-rbmq-rabbitmq-ha-discovery.hrnext-prod.svc.cluster.local
touch force_load
Then watch the pod statuses.
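The manual steps above could be scripted, e.g. in an init container that shares the data volume. A sketch under stated assumptions: the directory layout follows the examples in this thread, and the "demo only" lines stand in for the real mounted volume.

```shell
#!/bin/sh
# Sketch: create RabbitMQ's force_load marker so the node boots without
# waiting for its peers. MNESIA_DIR must be this node's Mnesia directory, e.g.
# /var/lib/rabbitmq/mnesia/rabbit@<pod>.<discovery-svc>.<ns>.svc.cluster.local
MNESIA_DIR="${MNESIA_DIR:-/tmp/mnesia-demo/rabbit@demo-node}"

mkdir -p "$MNESIA_DIR"        # demo only: stand-in for the mounted volume
: > "$MNESIA_DIR/schema.DAT"  # demo only: pretend a database already exists

# Only force-boot when a database is already present; on a clean volume the
# marker makes startup fail (the "Error: enoent" mentioned earlier).
if [ -f "$MNESIA_DIR/schema.DAT" ]; then
  touch "$MNESIA_DIR/force_load"
fi
```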
@hakanozanlagan first of all, thanks for posting your solution. I tried the same steps, but unfortunately I am using the rabbitmq user, which doesn't have permission on that folder, and I don't have any sudo access. Is there any alternative solution?

@rakeshnambiar In your Helm chart values.yml, did you explicitly try setting the force_boot flag to true? Try that option. Also check your user permissions; those can also be set through the values.yml.
Hi @vkasgit, thanks, force_boot solved the issue, and I can also see the runAsUser etc. in the values yaml. Btw, by default it creates 3 pods and I can see 3 PVCs as well. Is this expected?
@vkasgit We occasionally run into this mnesia table issue ourselves (which we have been fixing by deleting the PVC). I was curious whether setting updateStrategy to RollingUpdate (instead of the default OnDelete) eliminates the need for force_boot? Our podManagementPolicy is the default OrderedReady. In fact, other than a basic password and policies, our values are otherwise defaults.
@ytjohn Do you have the forceBoot: true flag on and still occasionally run into the mnesia issue?
The following are some settings I use, and I haven't run into the mnesia issue so far (knock on wood). Try adding the lifecycle hook and see if that helps. What it does: when an RMQ node is forcefully taken down, the preStop command kicks in and performs a graceful termination.
podManagementPolicy: OrderedReady
updateStrategy: RollingUpdate
forceBoot: true
lifecycle:
  preStop:
    exec:
      command: [rabbitmqctl, shutdown]
I haven't tried forceBoot: true yet, but it seemed a rolling update would pretty much take care of the need for forceBoot. That said, I don't think it will hurt either, so we will go ahead and set them both, and if that keeps the mnesia table issue from popping up, we'll call it good. Thank you.
@akrepon deleting the mnesia database would leave the crashed pod as a standalone or blank node.
I do not fully understand your solution about the cluster name change. Is it just touching a force_load file in the mnesia data dir?
Well, first, I can't execute rabbitmqctl force_boot because it says "Error: this command requires the 'rabbit' app to be stopped on the target node. Stop it with 'rabbitmqctl stop_app'.". But if we stop it, the pod just restarts without giving us a chance to execute "rabbitmqctl force_boot"...
So I created a force_load file, in my case in "/opt/bitnami/rabbitmq/var/lib/rabbitmq/mnesia/rabbit@rabbitmq-pre-0.rabbitmq-pre-headless.pre.svc.cluster.local", and it worked!
For a rabbitmq install from the helm chart, it is /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.acbo-queues.svc.cluster.local