helm install --name rabbitmq --set rabbitmqUsername=admin,persistence.storageClass=rook-block stable/rabbitmq
Also I don't think this scales up and down; I was hoping this would use an operator or at least a StatefulSet.
Any help would be appreciated. I would like to easily scale up and down between 3 nodes and 7 nodes.
RabbitMQ HA was just merged to stable chart:
https://github.com/kubernetes/charts/tree/master/stable/rabbitmq-ha
Compared to the regular RabbitMQ chart, it is based on the official RabbitMQ Docker image and makes use of a StatefulSet, which allows you to increase/decrease the number of replicas on demand.
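For example, scaling should just be a matter of changing the replica count on upgrade (a rough sketch; it assumes the chart exposes the count as the usual replicaCount value):
$ helm install --name rabbitmq --set replicaCount=3 stable/rabbitmq-ha
$ helm upgrade rabbitmq stable/rabbitmq-ha --set replicaCount=7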
cc: @prydonius @tompizmor @sameersbn
@etiennetremel this is by far the easiest helm chart I've ever used. It's amazing and does exactly what I want. Thanks for pointing it out. Wish all charts were this easy; hint: wish more used StatefulSets and/or the operator pattern.
@etiennetremel why is NodePort the default for service.type? I would have expected ClusterIP.
@etiennetremel I think I found a problem with the autoscaling.
I started with a 3 node cluster, then did a helm upgrade to change to a 5 node cluster.
I get this error and the two new rabbits never join the cluster.
2017-12-19 15:26:44.957 [error] <0.8251.0> * Connection attempt from disallowed node '[email protected]' *
2017-12-19 15:27:08.206 [error] <0.8324.0> * Connection attempt from disallowed node '[email protected]' *
My guess is the Erlang cookie is different on those two nodes than on the first three. It could likely be fixed by allowing the cookie to be passed as a parameter or having it as a secret they all share.
A few other notes: when I scale back down to 3 it leaves the persistent volumes of the two extra nodes around. Maybe this is okay sometimes, but I would love the ability to have this cleaned up automatically.
Also, when I do a helm delete --purge it leaves the persistent volumes around; would love if this had the option to clean that up as well.
Also, when the two new nodes failed to join the cluster, Kubernetes still reported them as healthy; some sort of probe should detect that they did not join the cluster and report them as unhealthy.
Thanks @AceHack for testing it.
For the nodes to be able to connect back to other nodes you need to always use the same value for the _rabbitmqErlangCookie_. By default Helm generates a random value which, if not explicitly defined, will cause the secret to be regenerated every time you do an upgrade.
You can prevent this from happening by using the following commands when upgrading:
$ export ERLANGCOOKIE=$(kubectl get secrets -n <NAMESPACE> <HELM_RELEASE_NAME>-rabbitmq-ha -o jsonpath="{.data.rabbitmq-erlang-cookie}" | base64 --decode)
$ helm upgrade <HELM_RELEASE_NAME> \
    --set rabbitmqErlangCookie=$ERLANGCOOKIE \
    stable/rabbitmq-ha
The helm delete --purge command only deletes resources defined in the templates directory, which is, I guess, expected behavior.
From the Kubernetes Statefulset documentation:
Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources.
I would prefer keeping it this way as this is Helm/Kubernetes behavior, not the chart's.
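That said, if you want to reclaim the space after scaling down or deleting a release, you can remove the leftover claims manually, something like this (the claim names below are an assumption based on the usual <volumeClaimTemplate>-<statefulset>-<ordinal> naming, check kubectl get pvc for the real ones):
$ kubectl get pvc -n <NAMESPACE>
$ kubectl delete pvc -n <NAMESPACE> data-<HELM_RELEASE_NAME>-rabbitmq-ha-3 data-<HELM_RELEASE_NAME>-rabbitmq-ha-4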
For the health check of the nodes, I based it on the other stable/rabbitmq chart, which uses rabbitmqctl status.
I noticed there is also the rabbitmqctl node_health_check command that could be more appropriate, and since the official RabbitMQ StatefulSet example uses it, I would think this is the one to use, but maybe you have more input?
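For reference, this is roughly what each check covers when run on a node (exit code 0 means healthy); note that neither verifies cluster membership:
$ rabbitmqctl status             # local node is up and the rabbit application is running
$ rabbitmqctl node_health_check  # additionally checks for local alarms and that basic operations succeed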
Good catch on the service type, I'm opening a PR to make ClusterIP the default instead, oops.
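In the meantime you can override it at install time:
$ helm install --name rabbitmq --set service.type=ClusterIP stable/rabbitmq-ha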
@etiennetremel I did not see that you already had rabbitmqErlangCookie as a first-class parameter in your helm chart; that's exactly what I was looking for.
Also, on the health check, I created a bug on RabbitMQ's official K8s example so they can decide how best to fix it.
Thank you so much for this chart, it's one of the best, if not the best, I've used so far; your work here is great!!
Also, I agree on keeping the volumes around; I just watched a video on StatefulSets and this functions as designed. Looks like if I want this changed I need to push on the StatefulSet design in K8s itself.
Thanks again so much!!
I created a new bug in rabbitmq since they closed the first one because I did not add enough info :(
@etiennetremel Why did you choose 3.7-alpine over pivotalrabbitmq/rabbitmq-autocluster:3.7-k8s? What is the difference here?
Thanks.
@etiennetremel Also I just ran into another issue. If you create a cluster with a particular Erlang cookie, let's call it cookie 1, then delete the cluster, then create a new cluster with a different Erlang cookie, let's call it cookie 2, the new cluster will fail. The only way I've figured out to recover is by deleting all the related PVs and PVCs before creating the new cluster. It seems there is no easy way to roll my Erlang cookie if it gets compromised.
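For reference, the recovery sequence I ended up with looks roughly like this (the label selector is an assumption, you may have to delete the claims by name):
$ helm delete --purge rabbitmq
$ kubectl delete pvc -n <NAMESPACE> -l release=rabbitmq
$ helm install --name rabbitmq --set rabbitmqErlangCookie=<NEW_COOKIE> stable/rabbitmq-ha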
I guess both images are similar. I first used the Pivotal one because the official Docker image for RabbitMQ 3.7 wasn't released yet, but maybe they have some extra features that I don't know about. I'm also in favor of having a Docker image where you can see the Dockerfile definition, and this is not the case with the Pivotal image.
I need to read more about the Erlang cookie; maybe there is an extra step that needs to be taken when upgrading the cluster. If you have ideas on how to do it, let me know.
I think the K8s one uses the Kubernetes API instead of DNS for auto discovery. And yes, it's very annoying not being able to see the Dockerfile. I don't know at this point much about what would need to happen for the Erlang cookie; my guess is copying it somewhere on the PV every time before RabbitMQ starts, or something like that, I don't really know.
As explained in https://github.com/rabbitmq/rabbitmq-peer-discovery-k8s/issues/12, the issue here is one of Erlang cookie management, not health check reporting or a lack of any specific CLI commands. Nodes that fail to join the cluster eventually (or immediately, depending on the context) stop, and that would fail any imaginable health check. However, if you have two sets of nodes using the same peer discovery mechanism and different cookies, they will form two clusters, and as far as any suggested health checks are concerned, all individual nodes are fine.
A node cannot know what peers it is supposed to have at any given moment. So we will cover this in the docs, and we have a working idea of a CLI command that would report a failure if there are fewer than N cluster nodes online at the moment. The rest is just an interplay between certain aspects of node startup and unfortunate Erlang cookie management, not a bug in RabbitMQ core or the Kubernetes peer discovery plugin.
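Until that command exists, one rough way to approximate the check is to count running members via eval (a sketch only; rabbit_mnesia:cluster_nodes/1 is an internal API, so treat it as an assumption rather than a supported interface):
$ rabbitmqctl eval 'length(rabbit_mnesia:cluster_nodes(running)).'
Whatever wraps the probe would then compare the result against the expected cluster size.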
The Docker image used in the examples is really basic, created only for examples and does not use any undocumented or proprietary features. @gsantomaggio can we make it public, for the sake of transparency?
@gerhard first obvious issue in improving Kubernetes support is here: cookie management is not a solved problem.
@etiennetremel here is some more info on Erlang cookies: http://rabbitmq.1065348.n5.nabble.com/Proper-process-to-change-erlang-cookie-td29631.html Still looking into why deleting and re-creating a cluster with a new Erlang cookie causes problems.
@michaelklishin I will have to disagree with your assessment that Erlang cookie management is the main issue here. Even with no Erlang cookie problems, you would run into this split-brain problem if there was a network partition segmenting 2 nodes from the other 3 nodes. Cluster membership health is the fundamental problem here. No matter how the external environment is set up, bad Erlang cookies or not, network partitions or not, it's important to be able to report the correct status of cluster membership.
@micahhausler here is a link to the discussion on the rabbitmq-users mailing list. Also, I would like to respond to your last comment on this bug, since you locked the thread right afterwards so no response could be made.
Network partitions are relevant to rabbitmq-autocluster at all times: even during initial peer discovery, while forming a cluster, network partitions can happen. In CAP theorem you can only choose C or A; you are forced to always live with P, the fact that the network is unreliable, even during initial peer discovery and cluster formation. A badly timed network partition can cause rabbitmq-autocluster to form two independent, split-brain clusters.
Before the autocluster plugin, people were forced to create RabbitMQ clusters manually. If a network partition occurred during this manual process, the human just kept trying to join the nodes to the cluster until they all succeeded. The autocluster plugin does not take into account network partitions that occur during the initial discovery and formation phase of the cluster; this is the inherent problem that causes two split-brain clusters to form.
I drew a picture of the network partitioning issue.

I also tried to create a bug on RabbitMQ's GitHub directly too; it seems this functions as designed from their point of view.
It's hard to believe this auto clustering will be okay in production where tons of deployments are happening every day and there is no way to detect beforehand if a temporary network partition is occurring. Hopefully it's not, or you might get two split-brain clusters.
This is not a new discovery; we have seen all of this before with BOSH and Cloud Foundry. During peer discovery, nodes will get a set of peers to join and will try them in order. The first one to succeed is the last one that will be tried.
The reason why this is NOT a major problem is this: once a cluster is formed, peer discovery isn't used and nodes simply rejoin with retries (10 retries in 30 second intervals by default). Existing members will not use peer discovery, only newly brought up nodes will, so the problem won't apply to the majority of nodes.
As the rabbitmq-autocluster README states, a peer discovery mechanism is not a replacement for an understanding of how RabbitMQ clusters work, both during initial formation and after that.
Fixating on one or two scenarios during initial formation is counter-productive. It completely ignores the fact that nodes in a formed cluster work somewhat differently when they restart, e.g. they do not use the peer discovery backend.
All of these problems are not new, and team RabbitMQ had to overcome them all in the RabbitMQ BOSH release (in fact, 2 different releases) for Cloud Foundry and our own infra automation needs.
First course of action should be finding a way to distribute an Erlang cookie to a set of nodes, both for initial formation and when more nodes are added (scaling out). Without it, everything else would be a waste of time.
@michaelklishin Instead of relying on this helm template to randomly generate an Erlang cookie, I just created one myself and typed it into the values.yaml file. This solved the problem of mismatched cookies.
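For example, either set it in values.yaml or pass it explicitly, reusing the same value on every install and upgrade:
$ export ERLANGCOOKIE=$(openssl rand -hex 20)
$ helm install --name rabbitmq --set rabbitmqErlangCookie=$ERLANGCOOKIE stable/rabbitmq-ha
$ helm upgrade rabbitmq stable/rabbitmq-ha --set rabbitmqErlangCookie=$ERLANGCOOKIE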
Also, the fact that after a reboot the node no longer uses discovery means that once the split brain forms it's there to stay forever; it's locked into its set of peers at that point.
@etiennetremel maybe it would be possible to use an approach like kube-lego does for certs: have a service that runs in the cluster whose whole point in life is to look for secrets that need populating. It could be a random secret generator that looks for annotations to determine that secret generation is necessary.
@etiennetremel also look at https://github.com/rabbitmq/rabbitmq-cli/issues/235
This is a great solution by @michaelklishin that should solve the cluster formation problems I brought up earlier.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close