Microk8s: Can't get rabbitmq helm chart to run on microk8s on an Azure VM

Created on 19 Oct 2018 · 4 Comments · Source: ubuntu/microk8s

I'm looking for help with a networking issue that seems to happen with microk8s ONLY on Azure VMs (I tried Ubuntu 16.10 and 18.04).

I'm trying to get the stable/rabbitmq helm chart running, but keep getting a "failed_connect kubernetes.default.svc.cluster.local" error.

It works like a charm on a cheap cloud provider running Ubuntu 16.10 and in a VirtualBox VM running 18.04.

Steps to reproduce on a freshly provisioned Azure VM (D2, 2 CPUs, 8 GB of RAM):

# install microk8s
sudo snap install microk8s --classic

# use docker from microk8s
sudo snap alias microk8s.docker docker

# use kubectl from microk8s
sudo snap alias microk8s.kubectl kubectl

# enable tools for the cluster
microk8s.enable dns ingress

# check the status of ufw
sudo ufw status  # => disabled

# not taking any chances: apply the ufw/iptables rules from the troubleshooting guide
# avoid dns crashlooping
sudo ufw allow in on cbr0 && sudo ufw allow out on cbr0

# make sure iptables allows forwarding of pod traffic
sudo iptables -P FORWARD ACCEPT

# make sure the kube-dns pod can resolve google.com (required since kube-dns uses google dns servers)

kubectl exec -it -n kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name | cut -d '/' -f 2) -- nslookup google.com 127.0.0.1

Defaulting container name to kubedns.
Use 'kubectl describe pod/kube-dns-67b548dcff-frns8 -n kube-system' to see all of the containers in this pod.
Server:    127.0.0.1
Address 1: 127.0.0.1 localhost

Name:      google.com
Address 1: 172.217.15.110
Address 2: 2607:f8b0:4004:811::200e

# try to reach "kubernetes.default.svc.cluster.local" from the kube-dns pod (it works...)

kubectl exec -it -n kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name | cut -d '/' -f 2) -- nslookup kubernetes.default.svc.cluster.local 127.0.0.1

Defaulting container name to kubedns.
Use 'kubectl describe pod/kube-dns-67b548dcff-frns8 -n kube-system' to see all of the containers in this pod.
Server:    127.0.0.1
Address 1: 127.0.0.1 localhost

Name:      kubernetes.default.svc.cluster.local
Address 1: 10.152.183.1 kubernetes.default.svc.cluster.local

# install helm
sudo snap install helm --classic

# init helm tiller pod
helm init --upgrade

# set ulimitNofiles to avoid having to configure the VM's ulimit size
helm install --name rabbit stable/rabbitmq --set rabbitmq.ulimitNofiles=1024

# check the logs
kubectl logs -f rabbit-rabbitmq-0

Here is the full log:

  ##  ##
  ##  ##      RabbitMQ 3.7.8. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: /opt/bitnami/rabbitmq/var/log/rabbitmq/[email protected]
                    /opt/bitnami/rabbitmq/var/log/rabbitmq/[email protected]_upgrade.log

              Starting broker...
2018-10-19 13:13:57.948 [info] <0.7.0> Log file opened with Lager
2018-10-19 13:13:57.963 [info] <0.7.0> Log file opened with Lager
2018-10-19 13:13:58.911 [info] <0.219.0>
 Starting RabbitMQ 3.7.8 on Erlang 21.1
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/
2018-10-19 13:13:58.916 [info] <0.219.0>
 node           : [email protected]
 home dir       : /opt/bitnami/rabbitmq/.rabbitmq
 config file(s) : /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf
 cookie hash    : 293IJx7h3gVtWRAoKSIm+w==
 log(s)         : /opt/bitnami/rabbitmq/var/log/rabbitmq/[email protected]
                : /opt/bitnami/rabbitmq/var/log/rabbitmq/[email protected]_upgrade.log
 database dir   : /opt/bitnami/rabbitmq/var/lib/rabbitmq/mnesia/[email protected]
2018-10-19 13:14:02.150 [info] <0.227.0> Memory high watermark set to 3191 MiB (3346164940 bytes) of 7977 MiB (8365412352 bytes) total
2018-10-19 13:14:02.156 [info] <0.229.0> Enabling free disk space monitoring
2018-10-19 13:14:02.156 [info] <0.229.0> Disk free limit set to 50MB
2018-10-19 13:14:02.161 [info] <0.232.0> Limiting to approx 924 file handles (829 sockets)
2018-10-19 13:14:02.161 [info] <0.233.0> FHC read buffering:  OFF
2018-10-19 13:14:02.161 [info] <0.233.0> FHC write buffering: ON
2018-10-19 13:14:02.162 [info] <0.219.0> Node database directory at /opt/bitnami/rabbitmq/var/lib/rabbitmq/mnesia/[email protected] is empty. Assuming we need to join an existing cluster or initialise from scratch...
2018-10-19 13:14:02.163 [info] <0.219.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2018-10-19 13:14:02.163 [info] <0.219.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2018-10-19 13:14:02.163 [info] <0.219.0> Peer discovery backend does not support locking, falling back to randomized delay
2018-10-19 13:14:02.163 [info] <0.219.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2018-10-19 13:14:10.165 [info] <0.219.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2018-10-19 13:14:10.168 [error] <0.218.0> CRASH REPORT Process <0.218.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 138
2018-10-19 13:14:10.170 [info] <0.42.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164
2018-10-19 13:14:10.180 [error] <0.116.0> Failed to write log message to file /opt/bitnami/rabbitmq/var/log/rabbitmq/[email protected]: invalid argument
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n                 {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,816}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau

Crash dump is being written to: /opt/bitnami/rabbitmq/var/log/rabbitmq/erl_crash.dump...done
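
The nxdomain above comes from inside the RabbitMQ pod, whereas the earlier nslookup checks ran inside the kube-dns pod against 127.0.0.1. A minimal sketch for repeating the lookup from an ordinary pod, which goes through the cluster DNS service instead. The helper name check_cluster_dns, the pod name dnstest and the busybox:1.28 image are assumptions for illustration, not part of the original report.

```shell
# Hypothetical helper: run nslookup from a throwaway pod so the query goes
# through the cluster DNS service rather than 127.0.0.1 inside kube-dns.
# Assumes the busybox:1.28 image can be pulled and that no pod named
# "dnstest" already exists in the current namespace.
check_cluster_dns() {
  name="${1:-kubernetes.default.svc.cluster.local}"
  kubectl run dnstest --rm -i --restart=Never --image=busybox:1.28 \
    -- nslookup "$name"
}
```

Calling check_cluster_dns with no argument looks up kubernetes.default.svc.cluster.local. If that lookup also fails, the problem is more likely between ordinary pods and the DNS service than in RabbitMQ itself.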

PS: I can't attach the microk8s.inspect tarball because it's 34 MB, but I can forward it on request.

Thanks for your help and a great tool!

cnadeau

👍3

All 4 comments

Hi @cnadeau,

Thank you for using microk8s.

RabbitMQ seems to be failing to reach the kubernetes service. I am not sure why yet, but you can try changing the proxy mode of kube-proxy. Here is how to do that.

Edit /var/snap/microk8s/current/args/kube-proxy and remove the line --proxy-mode="userspace".

Then restart the microk8s daemons with

sudo snap stop microk8s
sudo snap start microk8s

The containers will go into an Error state and be restarted with the new proxy configuration.
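
The edit described above could be scripted roughly as follows. This is only a sketch: the helper name remove_proxy_mode and the .bak backup suffix are assumptions, and since the args file lives under /var/snap it would probably need to run in a root shell.

```shell
# Hypothetical helper: delete any --proxy-mode line from a kube-proxy args
# file, keeping a backup copy next to it. The default path is the one named
# in the comment above.
remove_proxy_mode() {
  args_file="${1:-/var/snap/microk8s/current/args/kube-proxy}"
  # Back up first, then rewrite the original file in place from the backup.
  cp "$args_file" "$args_file.bak" &&
    sed '/^[[:space:]]*--proxy-mode/d' "$args_file.bak" > "$args_file"
}
```

After running it, the microk8s daemons still have to be stopped and started again for the change to take effect.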

Thank you for taking the time to report this issue here.

ktsakalozos on 19 Oct 2018

👍1

Hi @ktsakalozos, thanks a lot for the answer, but sadly, that didn't bring it back up. I even tried microk8s.reset, then reinstalled microk8s completely, making sure to change that setting and restart microk8s before testing. Still no luck.

Could you explain a bit more what the goal of removing the proxy-mode was? To get the default value, which 'should' be iptables?

From the doc:

'Which proxy mode to use: 'userspace' (older) or 'iptables' (faster) or 'ipvs' (experimental). If blank, use the best-available proxy (currently iptables). If the iptables proxy is selected, regardless of how, but the system's kernel or iptables versions are insufficient, this always falls back to the userspace proxy.'
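
To see which mode is actually configured before and after the change, something like the following could be used. It is a sketch only: the helper name configured_proxy_mode is made up for illustration, and it reads only the args file, so it does not detect a runtime fallback to userspace.

```shell
# Hypothetical helper: print the --proxy-mode value found in a kube-proxy
# args file, or a note that it is unset. Only inspects the file contents.
configured_proxy_mode() {
  args_file="${1:-/var/snap/microk8s/current/args/kube-proxy}"
  # Strip the flag name and any quotes; if the flag is repeated, keep the last.
  mode=$(sed -n 's/^[[:space:]]*--proxy-mode[= ]*//p' "$args_file" |
    tr -d '"' | tail -n 1)
  echo "${mode:-unset (kube-proxy picks its default)}"
}
```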

Are there any other logs I could provide to help figure out what is breaking the kubelet?

Again, thanks a lot for your answer.

cnadeau on 22 Oct 2018

@cnadeau, I went ahead to record a session to show you the fix:

https://asciinema.org/a/rmpueIlnb1TM8oUK42fXifRQE

There was no rehearsal involved, so it might be a bit rough.

I am considering going with the default proxy-mode and removing it from the kube-proxy configuration, but I need to do some tests before doing that. Let me know if you are still getting this error.

Thanks

ktsakalozos on 22 Oct 2018

👍1

OK, I just confirmed that I ran into the "race condition bug (issue 133)": "sudo snap stop microk8s" did not actually stop it.

Making sure it was down and then waiting for the restart got the pod up and running!

# check the number of restarts of each healthy pod
kubectl get --all-namespaces pods

sudo snap stop microk8s

# make sure microk8s is down
ps -ef | grep microk8s

# make sure kubelet is down
ps -ef | grep kubelet

# re-check the number of restarts of each healthy pod to make sure it incremented
kubectl get --all-namespaces pods

# if any of the above is still up or was not restarted, run "sudo snap stop microk8s" again

# once validated
sudo snap start microk8s
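
Comparing the restart counts by eye is easy to get wrong, so here is a rough sketch of doing it with two saved snapshots. The helper names restart_snapshot and not_restarted are made up for illustration, and the sketch assumes the usual NAMESPACE NAME READY STATUS RESTARTS AGE column order in the kubectl output.

```shell
# Hypothetical helper: reduce `kubectl get --all-namespaces pods` output
# (read from stdin) to sorted "namespace/pod restarts" lines.
restart_snapshot() {
  awk 'NR > 1 { print $1 "/" $2, $5 }' | LC_ALL=C sort
}

# Hypothetical helper: given a "before" and an "after" snapshot file, print
# every pod present in both whose restart count did not increase.
not_restarted() {
  LC_ALL=C join "$1" "$2" | awk '$3 <= $2 { print $1 }'
}
```

For example, "kubectl get --all-namespaces pods | restart_snapshot > before.txt" before the stop, the same into after.txt once the pods are back, then "not_restarted before.txt after.txt" lists the pods that were not restarted.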

Thanks a lot for your time, incredible help and great product, keep up the good work!

Cheers

cnadeau on 22 Oct 2018

👍2
