Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Version of Helm and Kubernetes:
1.10
Which chart:
stable/rabbitmq
What happened:
The mnesia storage directory changes every time the pod gets a new IP address. The same happens for logs. If you check the persistent storage of any cluster that has been running long enough (months), you will find many log files and mnesia storage directories named after the pod's former IPs.
There is also a high chance of actually losing data in the rare situation where all rabbitmq pods get terminated at once.
What you expected to happen:
Since we're using a StatefulSet for rabbitmq, persistent storage should be in line with that. So for naming we should use POD_NAME rather than POD_IP.
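To illustrate the difference, here is a rough Python sketch, not the chart's actual logic. It assumes the data directory is named after the node name (rabbit@<identity>) under a common base directory, which is how RabbitMQ lays out mnesia storage by default. The IP addresses, the pod name rabbitmq-0, and the helper functions are all hypothetical.

```python
import os
import tempfile


def node_name(identity: str) -> str:
    """Build an Erlang-style node name such as rabbit@<identity>."""
    return f"rabbit@{identity}"


def start_pod(mnesia_base: str, identity: str) -> str:
    """Simulate a pod start: create (or reuse) the node's data directory."""
    path = os.path.join(mnesia_base, node_name(identity))
    os.makedirs(path, exist_ok=True)
    return path


def simulate(identities):
    """Start the same pod once per identity on one persistent volume,
    then list the directories left behind on that volume."""
    with tempfile.TemporaryDirectory() as base:
        for identity in identities:
            start_pod(base, identity)
        return sorted(os.listdir(base))


# Hypothetical IPs: the pod gets a different address after each restart,
# so every restart leaves a new, empty data directory behind.
ip_dirs = simulate(["10.0.0.5", "10.0.1.9", "10.0.2.3"])

# A StatefulSet pod keeps its name across restarts, so the same
# directory is reused every time.
name_dirs = simulate(["rabbitmq-0", "rabbitmq-0", "rabbitmq-0"])

print(ip_dirs)
print(name_dirs)
```

With IP-based naming, three restarts leave three directories, and the restarted node does not pick up the data written under its previous address. With name-based naming there is a single directory that survives restarts.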
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know:
The chart uses IP-based clustering, and with the current rabbitmq configuration this may lead to rabbitmq inconsistency or even data/schema loss if all the pods go down together.
Also, over the cluster's lifetime a lot of mnesia storage directories get created, each named after an IP that was once allocated to the pod. It is also quite inconvenient to locate an RMQ node by IP address.
Just got bit by this as well. Default mode really needs to be hostname-based.
If the decision on changing the default mode is stuck (it's been almost 5 months), maybe we should add a note to the readme that in order to properly use persistence, you need to switch to hostname-based clustering?
I rather agree. But we're hesitant to change the default, to avoid breaking existing installs.
Though I can barely imagine any existing production setup that has worked fine for a long time.
People are most probably affected by 2 scenarios:
Summoning the maintainers of the chart... @carrodher @desaintmartin @juan131 @prydonius @sameersbn @tompizmor
Hi @dene14 @azhi!
I also believe that the default clustering method should be hostname. As @dene14 said, right now the chart is not suitable to be run in production.
I will prepare a PR changing the clustering method. As it is a breaking change, I will bump the major version of the chart and put a notice in the README. I will try to find out whether there is a workaround for upgrading to the new version without losing the current data.