Hi,
I am using AKS with rabbitmq-ha; the cluster uses mirrored queues.
After some time I get the warning below and the pod goes into a crash loop.
The log is:
warning: /var/lib/rabbitmq/.erlang.cookie contents do not match RABBITMQ_ERLANG_COOKIE
=ERROR REPORT==== 19-Mar-2019::11:22:34.109832 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:34.116779 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:34.321200 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:34.507053 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:35.134047 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:35.136062 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:35.340871 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:35.523251 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:36.147416 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
=ERROR REPORT==== 19-Mar-2019::11:22:36.147658 ===
* Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:36.560 [error] <0.171.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:37.162 [error] <0.177.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:37.163 [error] <0.175.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:37.365 [error] <0.179.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:37.554 [error] <0.181.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:38.175 [error] <0.183.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:38.175 [error] <0.185.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:38.395 [error] <0.187.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:38.567 [error] <0.199.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:38.969 [info] <0.316.0>
Starting RabbitMQ 3.7.12 on Erlang 21.2.6
Copyright (C) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
  ##  ##
  ##  ##      RabbitMQ 3.7.12. Copyright (C) 2007-2019 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs:
Starting broker...
2019-03-19 11:22:38.970 [info] <0.316.0>
node : rabbit@rabbitmq-rabbitmq-ha-1.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : yOx+JMl/alQ4oFQ2APkrnA==
log(s) :
database dir : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-rabbitmq-ha-1.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
2019-03-19 11:22:39.187 [error] <0.318.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:39.187 [error] <0.320.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:39.398 [info] <0.326.0> Memory high watermark set to 5859 MiB (6144000000 bytes) of 32168 MiB (33731399680 bytes) total
2019-03-19 11:22:39.402 [info] <0.328.0> Enabling free disk space monitoring
2019-03-19 11:22:39.402 [info] <0.328.0> Disk free limit set to 50MB
2019-03-19 11:22:39.408 [info] <0.331.0> Limiting to approx 1048476 file handles (943626 sockets)
2019-03-19 11:22:39.408 [info] <0.334.0> FHC read buffering: OFF
2019-03-19 11:22:39.408 [info] <0.334.0> FHC write buffering: ON
2019-03-19 11:22:39.410 [error] <0.332.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:39.444 [info] <0.316.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-03-19 11:22:39.580 [error] <0.365.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:40.196 [error] <0.367.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:40.198 [error] <0.369.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:40.424 [error] <0.371.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:40.594 [error] <0.373.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:41.208 [error] <0.375.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-3.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:41.210 [error] <0.377.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-4.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
2019-03-19 11:22:41.436 [error] <0.379.0> * Connection attempt from disallowed node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' *
Is there any way to fix this automatically, without deleting the pod?
Best Regards
Please provide your custom values.yaml. Most likely you aren't setting a static rabbitmqErlangCookie value, so the RabbitMQ nodes all end up with different cookies.
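For example, something like this (a minimal sketch; the release name rabbitmq and the use of openssl to generate the value are my assumptions, any stable random string will do):

```bash
# Pin one static Erlang cookie for all nodes (sketch, not official chart docs).
# Generate a stable random value once and keep it; openssl is just one option.
COOKIE=$(openssl rand -hex 20)

# "rabbitmq" is a hypothetical release name; adjust to your own release.
helm upgrade rabbitmq stable/rabbitmq-ha \
  --reuse-values \
  --set rabbitmqErlangCookie="$COOKIE"
```

With a fixed cookie, restarted pods come back with the same value instead of a freshly generated one.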
## RabbitMQ application credentials
## Ref: http://rabbitmq.com/access-control.html
##
rabbitmqUsername: rabbitmq
rabbitmqPassword: tstdsatdts
## RabbitMQ Management user used for health checks
managementUsername: management
managementPassword: E9R3fjZm4ejFkVFE
## Place any additional key/value configuration to add to rabbitmq.conf
## Ref: https://www.rabbitmq.com/configure.html#config-items
extraConfig: |
  # queue_master_locator = min-masters
## Place advanced.config file in /etc/rabbitmq/advanced.config
## Ref: https://www.rabbitmq.com/configure.html#advanced-config-file
advancedConfig: |
## Definitions specification within the secret; it will always be mounted
## at /etc/definitions/definitions.json
definitionsSource: definitions.json
## Place any additional plugins to enable in /etc/rabbitmq/enabled_plugins
## Ref: https://www.rabbitmq.com/plugins.html
extraPlugins: |
  rabbitmq_shovel,
  rabbitmq_shovel_management,
  rabbitmq_federation,
  rabbitmq_federation_management,
definitions:
  users: |-
    # {
    #   "name": "myUsername",
    #   "password": "myPassword",
    #   "tags": "administrator"
    # }
  vhosts: |-
    # {
    #   "name": "/rabbit"
    # }
  parameters: |-
    # {
    #   "value": {
    #     "src-uri": "amqp://localhost",
    #     "src-queue": "source",
    #     "dest-uri": "amqp://localhost",
    #     "dest-queue": "destination",
    #     "add-forward-headers": false,
    #     "ack-mode": "on-confirm",
    #     "delete-after": "never"
    #   },
    #   "vhost": "/",
    #   "component": "shovel",
    #   "name": "test"
    # }
  permissions: |-
    # {
    #   "user": "myUsername",
    #   "vhost": "/rabbit",
    #   "configure": ".*",
    #   "write": ".*",
    #   "read": ".*"
    # }
  queues: |-
    # {
    #   "name": "myName",
    #   "vhost": "/rabbit",
    #   "durable": true,
    #   "auto_delete": false,
    #   "arguments": {}
    # }
  exchanges: |-
    # {
    #   "name": "myName",
    #   "vhost": "/rabbit",
    #   "type": "direct",
    #   "durable": true,
    #   "auto_delete": false,
    #   "internal": false,
    #   "arguments": {}
    # }
  bindings: |-
    # {
    #   "source": "myName",
    #   "vhost": "/rabbit",
    #   "destination": "myName",
    #   "destination_type": "queue",
    #   "routing_key": "myKey",
    #   "arguments": {}
    # }
  ## Sets the policies in definitions.json. This can be used to control the high
  ## availability of queues by mirroring them to multiple nodes.
  ## Ref: https://www.rabbitmq.com/ha.html
  policies: |-
    {
      "name": "ha",
      "pattern": ".*",
      "vhost": "/",
      "definition": {
        "ha-mode": "exactly",
        "ha-params": 3,
        "ha-sync-mode": "automatic",
        "ha-sync-batch-size": 1
      }
    }
## RabbitMQ default VirtualHost
## Ref: https://www.rabbitmq.com/vhosts.html
##
rabbitmqVhost: "/"
## Erlang cookie to determine whether different nodes are allowed to communicate with each other
## Ref: https://www.rabbitmq.com/clustering.html
##
# rabbitmqErlangCookie:
## RabbitMQ Memory high watermark
## Ref: http://www.rabbitmq.com/memory.html
##
rabbitmqMemoryHighWatermark: 6144MB
rabbitmqMemoryHighWatermarkType: absolute
## EPMD port for peer discovery service used by RabbitMQ nodes and CLI tools
## Ref: https://www.rabbitmq.com/clustering.html
##
rabbitmqEpmdPort: 4369
## Node port
rabbitmqNodePort: 5672
## Manager port
rabbitmqManagerPort: 15672
## Set to true to precompile parts of RabbitMQ with HiPE, a just-in-time
## compiler for Erlang. This will increase server throughput at the cost of
## increased startup time. You might see 20-50% better performance at the cost
## of a few minutes delay at startup.
rabbitmqHipeCompile: false
## SSL certificates
## Ref: http://www.rabbitmq.com/ssl.html
rabbitmqCert:
  enabled: false
  # Specifies an existing secret to be used for SSL certs
  existingSecret: ""
  ## Create a new secret using these values
  cacertfile: |
  certfile: |
  keyfile: |
## Extra volumes for statefulset
extraVolumes: []
## Extra volume mounts for statefulset
extraVolumeMounts: []
## Authentication mechanism
## Ref: http://www.rabbitmq.com/authentication.html
rabbitmqAuth:
  enabled: false
  config: |
    # auth_mechanisms.1 = PLAIN
    # auth_mechanisms.2 = AMQPLAIN
    # auth_mechanisms.3 = EXTERNAL
## Automatic Partition Handling Strategy (split brain handling)
## Ref: https://www.rabbitmq.com/partitions.html#automatic-handling
## Note: pause-if-all-down is not supported without using a custom configmap since it requires extra
## configuration.
rabbitmqClusterPartitionHandling: autoheal
## Authentication backend
## Ref: https://github.com/rabbitmq/rabbitmq-auth-backend-http
rabbitmqAuthHTTP:
  enabled: false
  config: |
    # auth_backends.1 = http
    # auth_http.user_path = http://some-server/auth/user
    # auth_http.vhost_path = http://some-server/auth/vhost
    # auth_http.resource_path = http://some-server/auth/resource
    # auth_http.topic_path = http://some-server/auth/topic
## LDAP Plugin
## Ref: http://www.rabbitmq.com/ldap.html
rabbitmqLDAPPlugin:
  enabled: false
  ## LDAP configuration:
  config: |
    # auth_backends.1 = ldap
    # auth_ldap.servers.1 = my-ldap-server
    # auth_ldap.user_dn_pattern = cn=${username},ou=People,dc=example,dc=com
    # auth_ldap.use_ssl = false
    # auth_ldap.port = 389
    # auth_ldap.log = false
## MQTT Plugin
## Ref: http://www.rabbitmq.com/mqtt.html
rabbitmqMQTTPlugin:
  enabled: false
  ## MQTT configuration:
  config: |
    # mqtt.default_user = guest
    # mqtt.default_pass = guest
    # mqtt.allow_anonymous = true
## Web MQTT Plugin
## Ref: http://www.rabbitmq.com/web-mqtt.html
rabbitmqWebMQTTPlugin:
  enabled: false
  ## Web MQTT configuration:
  config: |
    # web_mqtt.ssl.port = 12345
    # web_mqtt.ssl.backlog = 1024
    # web_mqtt.ssl.certfile = /etc/cert/cert.pem
    # web_mqtt.ssl.keyfile = /etc/cert/key.pem
    # web_mqtt.ssl.cacertfile = /etc/cert/cacert.pem
    # web_mqtt.ssl.password = changeme
## STOMP Plugin
## Ref: http://www.rabbitmq.com/stomp.html
rabbitmqSTOMPPlugin:
  enabled: false
  ## STOMP configuration:
  config: |
    # stomp.default_user = guest
    # stomp.default_pass = guest
## Web STOMP Plugin
## Ref: http://www.rabbitmq.com/web-stomp.html
rabbitmqWebSTOMPPlugin:
  enabled: false
  ## Web STOMP configuration:
  config: |
    # web_stomp.ws_frame = binary
    # web_stomp.cowboy_opts.max_keepalive = 10
## AMQPS support
## Ref: http://www.rabbitmq.com/ssl.html
rabbitmqAmqpsSupport:
  enabled: false
  # NodePort
  amqpsNodePort: 5671
  # SSL configuration
  config: |
    # listeners.ssl.default = 5671
    # ssl_options.cacertfile = /etc/cert/cacert.pem
    # ssl_options.certfile = /etc/cert/cert.pem
    # ssl_options.keyfile = /etc/cert/key.pem
    # ssl_options.verify = verify_peer
    # ssl_options.fail_if_no_peer_cert = false
## Number of replicas
replicaCount: 5
image:
  repository: rabbitmq
  tag: 3.7-alpine
  pullPolicy: IfNotPresent
## Optionally specify an array of imagePullSecrets.
## Secrets must be manually created in the namespace.
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
##
# pullSecrets:
#   - myRegistryKeySecretName
busyboxImage:
  repository: busybox
  tag: latest
  pullPolicy: Always
## Duration in seconds the pod needs to terminate gracefully
terminationGracePeriodSeconds: 30
service:
  annotations: {}
  clusterIP: None
  ## List of IP addresses at which the service is available
  ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
  ##
  externalIPs: []
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  type: ClusterIP
  ## Customize nodePort number when the service type is NodePort
  ### Ref: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
  ###
  epmdNodePort: null
  amqpNodePort: null
  managerNodePort: null
podManagementPolicy: OrderedReady
## Statefulsets rolling update update strategy
## Ref: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#rolling-update
##
updateStrategy: OnDelete
## Statefulsets Pod Priority
## Ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
## priorityClassName: ""
## We usually recommend not to specify default resources and to leave this as
## a conscious choice for the user. This also increases chances charts run on
## environments with little resources, such as Minikube. If you do want to
## specify resources, uncomment the following lines, adjust them as necessary,
## and remove the curly braces after 'resources:'.
## If you decide to set the memory limit, make sure to also change the
## rabbitmqMemoryHighWatermark following the formula:
## rabbitmqMemoryHighWatermark = 0.4 * resources.limits.memory
##
resources: {}
# limits:
#   cpu: 100m
#   memory: 1Gi
# requests:
#   cpu: 100m
#   memory: 1Gi
initContainer:
  resources: {}
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi
## Data Persistency
persistentVolume:
  enabled: true
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ## set, choosing the default provisioner. (gp2 on AWS, standard on
  ## GKE, AWS & OpenStack)
  ##
  # storageClass: "-"
  name: data
  accessModes:
    - ReadWriteOnce
  size: 30Gi
  annotations: {}
## Node labels for pod assignment
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
##
nodeSelector: {}
## Node tolerations for pod assignment
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#taints-and-tolerations-beta-feature
##
tolerations: []
## Extra Annotations to be added to pod
podAnnotations: {}
## Pod affinity
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
podAntiAffinity: soft
## Create default configMap
##
existingConfigMap: false
## Add additional labels to all resources
##
extraLabels: {}
## Role Based Access
## Ref: https://kubernetes.io/docs/admin/authorization/rbac/
##
rbac:
  create: false
## Service Account
## Ref: https://kubernetes.io/docs/admin/service-accounts-admin/
##
serviceAccount:
  create: true
  ## The name of the ServiceAccount to use.
  ## If not set and create is true, a name is generated using the fullname template
  # name:
ingress:
  ## Set to true to enable ingress record generation
  enabled: false
  path: /
  ## The list of hostnames to be covered with this ingress record.
  ## Most likely this will be just one host, but in the event more hosts are needed, this is an array
  ## hostName: foo.bar.com
  ## Set this to true in order to enable TLS on the ingress record
  tls: false
  ## If TLS is set to true, you must declare what secret will store the key/certificate for TLS
  tlsSecret: myTlsSecret
  ## Ingress annotations done as key:value pairs
  annotations:
  # kubernetes.io/ingress.class: nginx
livenessProbe:
  initialDelaySeconds: 120
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6

readinessProbe:
  failureThreshold: 6
  initialDelaySeconds: 20
  timeoutSeconds: 3
  periodSeconds: 5
# Specifies an existing secret to be used for RMQ password and Erlang Cookie
existingSecret: ""
prometheus:
  ## Configures Prometheus Exporter to expose and scrape stats.
  exporter:
    enabled: false
    env: {}
    image:
      repository: kbudde/rabbitmq-exporter
      tag: v0.29.0
      pullPolicy: IfNotPresent
    ## Port Prometheus scrapes for metrics
    port: 9090
    ## Comma-separated list of extended scraping capabilities supported by the target RabbitMQ server
    capabilities: "bert,no_sort"
    ## Allow overriding of container resources
    resources: {}
    # limits:
    #   cpu: 200m
    #   memory: 1Gi
    # requests:
    #   cpu: 100m
    #   memory: 100Mi
  ## Prometheus Operator support. Setting to true will create Operator-specific resources like ServiceMonitors and Alerts.
  operator:
    ## Are you using Prometheus Operator? [Blog Post](https://coreos.com/blog/the-prometheus-operator.html)
    enabled: true
    ## Configures Alerts, which will be set up via Prometheus Operator / ConfigMaps.
    alerts:
      ## The Prometheus exporter must be enabled as well
      enabled: true
      ## Selector must be configured to match the Prometheus install, defaulting to what's done by Prometheus Operator
      ## See [CoreOS Prometheus Chart](https://github.com/coreos/prometheus-operator/tree/master/helm)
      selector:
        role: alert-rules
      labels: {}
    serviceMonitor:
      ## Interval at which Prometheus scrapes RabbitMQ Exporter
      interval: 10s
      # Namespace Prometheus is installed in
      namespace: monitoring
      ## Defaults to what's used if you follow the CoreOS [Prometheus Install Instructions](https://github.com/coreos/prometheus-operator/tree/master/helm#tldr)
      ## [Prometheus Selector Label](https://github.com/coreos/prometheus-operator/blob/master/helm/prometheus/templates/prometheus.yaml#L65)
      ## [Kube Prometheus Selector Label](https://github.com/coreos/prometheus-operator/blob/master/helm/kube-prometheus/values.yaml#L298)
      selector:
        prometheus: kube-prometheus
## Kubernetes Cluster Domain
clusterDomain: cluster.local
Please don't paste the entire values.yaml; paste only what you changed from the defaults. helm get values $HELM_RELEASE will show you. But based on the above, you have not set rabbitmqErlangCookie, as I said before. Set it.
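For reference, a rough way to check what you overrode and what cookie the chart generated (the release and secret names here are guesses based on this thread; adjust to yours):

```bash
# Show only the values you changed from the chart defaults.
helm get values rabbitmq

# Read the cookie the chart generated, so you can pin that exact value
# via rabbitmqErlangCookie before the secret gets regenerated on upgrade.
kubectl get secret rabbitmq-rabbitmq-ha \
  -o jsonpath='{.data.rabbitmq-erlang-cookie}' | base64 -d; echo
```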
You mean here:
```
# rabbitmqErlangCookie:
```
Today I received this error after upgrading the chart.
The secret rabbitmq-ha.rabbitmq-erlang-cookie was updated during the upgrade.
If one cluster node is restarted, it will use the new secret, while the other nodes keep using the old one.
To use the RabbitMQ chart you still need to be familiar with administering RabbitMQ clusters. Read the clustering guide to understand how the cookie is used; a quick way to check cookie consistency is sketched below.
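A quick sanity check, assuming the pod names from the logs above (rabbitmq-rabbitmq-ha-0 through -4 in the default namespace):

```bash
# Compare the on-disk cookie across all nodes; every hash must be identical.
for i in 0 1 2 3 4; do
  kubectl exec "rabbitmq-rabbitmq-ha-$i" -- \
    md5sum /var/lib/rabbitmq/.erlang.cookie
done
# A node whose hash differs is the one the cluster rejects as "disallowed".
```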
You can set two things to fix it:
the Erlang cookie and the mirror policy in the config map (see the sketch below).
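Roughly like this (a sketch only; the release name is an assumption, the policy is taken from the values pasted above, and the cookie string is a placeholder to replace):

```bash
# Apply both fixes in one override file: a pinned cookie plus the HA mirror policy.
cat > override.yaml <<'EOF'
rabbitmqErlangCookie: "REPLACE-WITH-A-STABLE-RANDOM-STRING"
definitions:
  policies: |-
    {
      "name": "ha",
      "pattern": ".*",
      "vhost": "/",
      "definition": {
        "ha-mode": "exactly",
        "ha-params": 3,
        "ha-sync-mode": "automatic"
      }
    }
EOF

helm upgrade rabbitmq stable/rabbitmq-ha -f override.yaml --reuse-values
```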
@RotemEmergi Sounds like your problem is resolved. Can you close the ticket if so?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.