Is this a request for help?:
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG
Version of Helm and Kubernetes:
k8s version:
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
helm version:
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
linux version:
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Which chart:
stable/rabbitmq-ha
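Installed roughly like this (release name taken from the pod names; the user/vhost keys are inferred from the container environment shown below, so treat the exact flags as an assumption):
# Hypothetical repro command for Helm 2.9; value names may differ by chart version
helm install stable/rabbitmq-ha \
  --name rabbitmq \
  --set rabbitmqUsername=airlab \
  --set rabbitmqVhost=/airlab_vhost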
What happened:
The rabbitmq-ha pods always end up in CrashLoopBackOff:
root@top-unknown140407:~# kubectl get pods
NAME                     READY   STATUS             RESTARTS   AGE
rabbitmq-rabbitmq-ha-0   1/1     Running            0          25m
rabbitmq-rabbitmq-ha-1   0/1     CrashLoopBackOff   7          25m
rabbitmq-rabbitmq-ha-2   0/1     CrashLoopBackOff   7          25m
The log from one of the crashing pods is below:
root@top-unknown140407:~# kubectl logs rabbitmq-rabbitmq-ha-2
2018-08-17 03:09:07.060 [info] <0.33.0> Application lager started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:07.880 [info] <0.33.0> Application recon started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:07.880 [info] <0.33.0> Application xmerl started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:07.880 [info] <0.33.0> Application amqp10_common started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:07.881 [info] <0.33.0> Application crypto started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:07.881 [info] <0.33.0> Application cowlib started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.079 [info] <0.33.0> Application mnesia started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.089 [info] <0.33.0> Application os_mon started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.089 [info] <0.33.0> Application jsx started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.180 [info] <0.33.0> Application inets started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.180 [info] <0.33.0> Application asn1 started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.180 [info] <0.33.0> Application public_key started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.260 [info] <0.33.0> Application ssl started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.264 [info] <0.33.0> Application amqp10_client started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.272 [info] <0.33.0> Application ranch started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.272 [info] <0.33.0> Application ranch_proxy_protocol started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.273 [info] <0.33.0> Application rabbit_common started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.284 [info] <0.33.0> Application amqp_client started on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
2018-08-17 03:09:08.297 [info] <0.201.0>
Starting RabbitMQ 3.7.7 on Erlang 20.3.4
Copyright (C) 2007-2018 Pivotal Software, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
  ##  ##
  ##  ##      RabbitMQ 3.7.7. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2018-08-17 03:09:08.322 [info] <0.201.0>
node : rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : GpwYGB2NDXdwjhX6b7NeCQ==
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
2018-08-17 03:09:10.806 [info] <0.209.0> Memory high watermark set to 244 MiB (256000000 bytes) of 7983 MiB (8370999296 bytes) total
2018-08-17 03:09:10.811 [info] <0.211.0> Enabling free disk space monitoring
2018-08-17 03:09:10.811 [info] <0.211.0> Disk free limit set to 50MB
2018-08-17 03:09:10.814 [info] <0.213.0> Limiting to approx 1048476 file handles (943626 sockets)
2018-08-17 03:09:10.814 [info] <0.214.0> FHC read buffering: OFF
2018-08-17 03:09:10.814 [info] <0.214.0> FHC write buffering: ON
2018-08-17 03:09:10.815 [info] <0.201.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2018-08-17 03:09:10.815 [info] <0.201.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2018-08-17 03:09:10.816 [info] <0.201.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2018-08-17 03:09:10.816 [info] <0.201.0> Peer discovery backend does not support locking, falling back to randomized delay
2018-08-17 03:09:10.816 [info] <0.201.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2018-08-17 03:09:10.898 [info] <0.201.0> k8s endpoint listing returned nodes not yet ready: rabbitmq-rabbitmq-ha-1, rabbitmq-rabbitmq-ha-2
2018-08-17 03:09:10.898 [info] <0.201.0> All discovered existing cluster peers: rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
2018-08-17 03:09:10.898 [info] <0.201.0> Peer nodes we can cluster with: rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
2018-08-17 03:09:10.910 [info] <0.201.0> Node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' selected for auto-clustering
2018-08-17 03:10:20.092 [error] <0.223.0> ** Node 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local' not responding **
** Removing (timedout) connection **
2018-08-17 03:10:20.106 [error] <0.142.0> Mnesia('rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'}
2018-08-17 03:10:20.395 [error] <0.200.0> CRASH REPORT Process <0.200.0> with 0 neighbours exited with reason: {{failed_to_cluster_with,['rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'],"Mnesia could not connect to any nodes."},{rabbit,start,[normal,[]]}} in application_master:init/4 line 134
2018-08-17 03:10:20.395 [info] <0.33.0> Application rabbit exited with reason: {{failed_to_cluster_with,['rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'],"Mnesia could not connect to any nodes."},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{failed_to_cluster_with,['rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'],\"Mnesia could not connect to any nodes.\"},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{failed_to_cluster_with,['rabbit@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'],"M
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
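The failure is node 2 timing out while joining node 0 (failed_to_cluster_with, "Mnesia could not connect to any nodes"). A quick way to tell a DNS problem from a connectivity problem is to probe the selected peer from the crashing pod between restarts, for example (a sketch; the alpine image's busybox provides nslookup and nc, though the exact nc flags depend on the busybox build):
# Does the peer's discovery record resolve from inside the pod?
kubectl exec -it rabbitmq-rabbitmq-ha-2 -- \
  nslookup rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
# Are epmd (4369) and the inter-node distribution port (25672) reachable?
kubectl exec -it rabbitmq-rabbitmq-ha-2 -- \
  nc -zv -w 2 rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local 4369
kubectl exec -it rabbitmq-rabbitmq-ha-2 -- \
  nc -zv -w 2 rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local 25672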
The pod events are below:
root@top-unknown140407:~# kubectl describe pod/rabbitmq-rabbitmq-ha-2
Name: rabbitmq-rabbitmq-ha-2
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: top-unknown140411.i.nease.net/10.202.15.14
Start Time: Fri, 17 Aug 2018 09:40:42 +0800
Labels: app=rabbitmq-ha
controller-revision-hash=rabbitmq-rabbitmq-ha-7544fb9cd
release=rabbitmq
statefulset.kubernetes.io/pod-name=rabbitmq-rabbitmq-ha-2
Annotations: checksum/config=eec3deb4422de4a05d6e1aa58328d6bc5fd434cd4c07b45afb04ed4d8664a357
cni.projectcalico.org/podIP=192.168.4.12/32
Status: Running
IP: 192.168.4.12
Controlled By: StatefulSet/rabbitmq-rabbitmq-ha
Init Containers:
copy-rabbitmq-config:
Container ID: docker://f7b20e5994442d0f659799782b651a19ed7e606a19b7f357fe6f2e0853c59706
Image: busybox
Image ID: docker-pullable://busybox@sha256:cb63aa0641a885f54de20f61d152187419e8f6b159ed11a251a09d115fdff9bd
Port: <none>
Host Port: <none>
Command:
sh
-c
cp /configmap/* /etc/rabbitmq; rm -f /var/lib/rabbitmq/.erlang.cookie
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 17 Aug 2018 09:40:48 +0800
Finished: Fri, 17 Aug 2018 09:40:48 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/configmap from configmap (rw)
/etc/rabbitmq from config (rw)
/var/lib/rabbitmq from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-rabbitmq-ha-token-4wj7v (ro)
Containers:
rabbitmq-ha:
Container ID: docker://ab80d19f29fce7543ca91ceef1c37739cc7654fe19ee334f2bfe415ddb039a70
Image: rabbitmq:3.7-alpine
Image ID: docker-pullable://rabbitmq@sha256:a1fb74e73e6873cb56e66fae6ef3b88489f44e47dc5338f0d6ae2c0c3cac9894
Ports: 4369/TCP, 5672/TCP, 15672/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 17 Aug 2018 11:09:01 +0800
Finished: Fri, 17 Aug 2018 11:10:22 +0800
Ready: False
Restart Count: 17
Liveness: exec [rabbitmqctl status] delay=120s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [rabbitmqctl status] delay=10s timeout=3s period=5s #success=1 #failure=3
Environment:
MY_POD_NAME: rabbitmq-rabbitmq-ha-2 (v1:metadata.name)
RABBITMQ_USE_LONGNAME: true
RABBITMQ_NODENAME: rabbit@$(MY_POD_NAME).rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
K8S_HOSTNAME_SUFFIX: .rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
K8S_SERVICE_NAME: rabbitmq-rabbitmq-ha-discovery
RABBITMQ_ERLANG_COOKIE: <set to the key 'rabbitmq-erlang-cookie' in secret 'rabbitmq-rabbitmq-ha'> Optional: false
RABBITMQ_DEFAULT_USER: airlab
RABBITMQ_DEFAULT_PASS: <set to the key 'rabbitmq-password' in secret 'rabbitmq-rabbitmq-ha'> Optional: false
RABBITMQ_DEFAULT_VHOST: /airlab_vhost
Mounts:
/etc/rabbitmq from config (rw)
/var/lib/rabbitmq from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-rabbitmq-ha-token-4wj7v (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
configmap:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rabbitmq-rabbitmq-ha
Optional: false
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
rabbitmq-rabbitmq-ha-token-4wj7v:
Type: Secret (a volume populated by a Secret)
SecretName: rabbitmq-rabbitmq-ha-token-4wj7v
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 1h kubelet, top-unknown140411.i.nease.net Readiness probe failed: Status of node rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local ...
Error: unable to perform an operation on node 'rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on http://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local']
rabbit@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local:
* connected to epmd (port 4369) on rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli90@rabbitmq-rabbitmq-ha-2.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: GpwYGB2NDXdwjhX6b7NeCQ==
Warning Unhealthy 40m kubelet, top-unknown140411.i.nease.net Readiness probe failed:
Warning BackOff 1m (x293 over 1h) kubelet, top-unknown140411.i.nease.net Back-off restarting failed container
What you expected to happen:
Start rabbitmq-ha successfully
Anything else we need to know:
The default cluster DNS is CoreDNS, not kube-dns (see the check after this listing):
root@top-unknown140407:# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-4c8wg 2/2 Running 0 1d
calico-node-7zm6b 2/2 Running 0 23h
calico-node-cvm7k 2/2 Running 0 20h
calico-node-df4qz 2/2 Running 0 20h
calico-node-fbvpv 2/2 Running 0 20h
coredns-78fcdf6894-g7bql 1/1 Running 0 1d
coredns-78fcdf6894-j7gc6 1/1 Running 0 1d
etcd-top-unknown140407.i.nease.net 1/1 Running 0 1d
kube-apiserver-top-unknown140407.i.nease.net 1/1 Running 0 1d
kube-controller-manager-top-unknown140407.i.nease.net 1/1 Running 1 1d
kube-proxy-f79tf 1/1 Running 0 20h
kube-proxy-gtd2r 1/1 Running 0 20h
kube-proxy-qxpj8 1/1 Running 0 23h
kube-proxy-rktrf 1/1 Running 0 1d
kube-proxy-zs48w 1/1 Running 0 20h
kube-scheduler-top-unknown140407.i.nease.net 1/1 Running 1 1d
tiller-deploy-597c48f967-pqcj7 1/1 Running 0 22h
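(The kube-dns Service here is fronted by the CoreDNS pods; this can be confirmed with plain kubectl:)
kubectl -n kube-system get svc kube-dns -o wide               # SELECTOR should be k8s-app=kube-dns
kubectl -n kube-system get endpoints kube-dns                 # should list the coredns pod IPs
kubectl exec rabbitmq-rabbitmq-ha-0 -- cat /etc/resolv.conf   # nameserver should be the kube-dns ClusterIP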
When I changed the default DNS to kube-dns, it worked well. It seems that stable/rabbitmq-ha can't work with CoreDNS? Since Kubernetes 1.11, CoreDNS has graduated to General Availability (GA) and is installed by default: https://kubernetes.io/docs/tasks/administer-cluster/coredns/
root@top-unknown140689:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default rabbitmq-rabbitmq-ha-0 1/1 Running 0 29m
default rabbitmq-rabbitmq-ha-1 1/1 Running 0 27m
default rabbitmq-rabbitmq-ha-2 1/1 Running 0 26m
kube-system calico-node-x6ct6 2/2 Running 0 52m
kube-system calico-node-xpqjt 2/2 Running 0 52m
kube-system etcd-top-unknown140689.i.nease.net 1/1 Running 0 50m
kube-system kube-apiserver-top-unknown140689.i.nease.net 1/1 Running 0 50m
kube-system kube-controller-manager-top-unknown140689.i.nease.net 1/1 Running 0 50m
kube-system kube-dns-86c47599bd-cnxbm 3/3 Running 0 58m
kube-system kube-proxy-8t2c4 1/1 Running 0 58m
kube-system kube-proxy-k8znl 1/1 Running 0 54m
kube-system kube-scheduler-top-unknown140689.i.nease.net 1/1 Running 0 50m
kube-system tiller-deploy-759cb9df9-xmpgf 1/1 Running 0 29m
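To separate a chart problem from a DNS configuration problem, the discovery records can also be resolved directly from a throwaway pod under each DNS setup (a sketch; the image and pod name are just examples):
# Per-pod record used by peer discovery; should return the pod IP
kubectl run dns-test --rm -it --restart=Never --image=busybox -- \
  nslookup rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local
# Headless discovery service; should return the IPs of the ready pods
kubectl run dns-test --rm -it --restart=Never --image=busybox -- \
  nslookup rabbitmq-rabbitmq-ha-discovery.default.svc.cluster.local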
I'm running the rabbitmq-ha chart with CoreDNS just fine. Maybe something is misconfigured in your CoreDNS ConfigMap.
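Worth dumping the Corefile and watching the CoreDNS logs while a rabbitmq pod starts; the kubernetes plugin block has to cover cluster.local for the <pod>.<headless-service> records to resolve (plain kubectl; the pod name is taken from your listing above):
kubectl -n kube-system get configmap coredns -o yaml     # Corefile: kubernetes plugin should serve cluster.local
kubectl -n kube-system logs coredns-78fcdf6894-g7bql     # look for errors/SERVFAIL while the node tries to join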
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
In my case it was crashing because of low memory (exit code 137). I was assigning too little memory (128Mi) and had to increase it to 256Mi.
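Something along these lines (release name as in the report above; the resources keys are the chart's standard values.yaml fields as far as I can tell, so treat the exact paths as an assumption):
# Bump the container memory request/limit for the release
helm upgrade rabbitmq stable/rabbitmq-ha \
  --set resources.requests.memory=256Mi \
  --set resources.limits.memory=256Mi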
I have the same question, I'm using CoreDNS too.