**Version:**
k3s version v1.17.4+k3s1 (3eee8ac3)

**K3s arguments:**
export INSTALL_K3S_EXEC="--node-external-ip 10.127.0.1"
export INSTALL_K3S_NAME="master-LOCCH"
k3s-install.sh

**Describe the bug**
On both the server and the agents, we have found that stopping the service ends with a failed exit code, and one or more processes belonging to the service remain running, of the form:
```
ps -efly | grep ranch
... containerd-shim-runc-v2 -namespace k8s.io ...
```
We noticed this while trying to debug an external IP address stuck at PENDING and related network issues.
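As a quick way to confirm the symptom after a stop, the surviving shims can be counted by scanning `/proc` directly (a hedged sketch mirroring the `ps -efly | grep ranch` check above; note that `/proc/<pid>/comm` truncates names to 15 characters, so `containerd-shim-runc-v2` shows up as `containerd-shim`):

```shell
# Count processes whose command name starts with "containerd-shim" by
# scanning /proc; any non-zero count after `systemctl stop` reproduces
# the leftover-process symptom described in this report.
count=0
for comm in /proc/[0-9]*/comm; do
  name=$(cat "$comm" 2>/dev/null) || continue   # process may have exited
  case "$name" in
    containerd-shim*) count=$((count + 1)) ;;
  esac
done
echo "containerd-shim processes still running: $count"
```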
**To Reproduce**
1. reboot server
2. stop service
**Additional context / logs**
It is not clear where to find the appropriate logs; containerd.log contains nothing from the time period when the 'stop' was issued.
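Since k3s runs as a systemd unit here, its output should land in journald rather than containerd.log, so the shutdown sequence around the `stop` can be retrieved with `journalctl` (unit name and timestamps taken from this report):

```shell
# Pull the k3s unit's journal entries around the time the stop was issued.
journalctl -u k3s-master-LOCCH.service \
  --since "2020-03-27 11:18" --until "2020-03-27 11:25" --no-pager
```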
```
$ systemctl status k3s-master-LOCCH
● k3s-master-LOCCH.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s-master-LOCCH.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2020-03-27 11:19:07 GMT; 31min ago
     Docs: https://k3s.io
  Process: 810 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 828 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 829 ExecStart=/usr/local/bin/k3s server --node-external-ip 10.127.0.1 (code=exited, status=1/FAILURE)
 Main PID: 829 (code=exited, status=1/FAILURE)
    Tasks: 102
   Memory: 175.1M
   CGroup: /system.slice/k3s-master-LOCCH.service
           ├─1562 /var/lib/rancher/k3s/data/6a3098e6644f5f0dbfe14e5efa99bb8fdf60d63cae89fdffd71b7de11a1f1430/bin/containerd-shim-runc-v2 -names>
           ├─1617 /pause
           ├─1816 /var/lib/rancher/k3s/data/6a3098e6644f5f0dbfe14e5efa99bb8fdf60d63cae89fdffd71b7de11a1f1430/bin/containerd-shim-runc-v2 -names>
           ├─1819 /var/lib/rancher/k3s/data/6a3098e6644f5f0dbfe14e5efa99bb8fdf60d63cae89fdffd71b7de11a1f1430/bin/containerd-shim-runc-v2 -names>
           ├─1889 /pause
           ├─1901 /pause
           ├─1974 /coredns -conf /etc/coredns/Corefile
           ├─1981 /metrics-server
           ├─2472 /var/lib/rancher/k3s/data/6a3098e6644f5f0dbfe14e5efa99bb8fdf60d63cae89fdffd71b7de11a1f1430/bin/containerd-shim-runc-v2 -names>
           ├─2512 /var/lib/rancher/k3s/data/6a3098e6644f5f0dbfe14e5efa99bb8fdf60d63cae89fdffd71b7de11a1f1430/bin/containerd-shim-runc-v2 -names>
           ├─2542 /var/lib/rancher/k3s/data/6a3098e6644f5f0dbfe14e5efa99bb8fdf60d63cae89fdffd71b7de11a1f1430/bin/containerd-shim-runc-v2 -names>
           ├─2546 /pause
           ├─2558 /pause
           ├─2619 /pause
           ├─2672 /var/lib/rancher/k3s/data/6a3098e6644f5f0dbfe14e5efa99bb8fdf60d63cae89fdffd71b7de11a1f1430/bin/containerd-shim-runc-v2 -names>
           ├─2699 /traefik --configfile=/config/traefik.toml
           ├─2701 /opt/bitnami/apache/bin/httpd -f /opt/bitnami/apache/conf/httpd.conf -D FOREGROUND
           ├─2733 /pause
           ├─2783 /bin/sh /usr/bin/entry
           ├─2789 /opt/bitnami/apache/bin/httpd -f /opt/bitnami/apache/conf/httpd.conf -D FOREGROUND
           ├─2790 /opt/bitnami/apache/bin/httpd -f /opt/bitnami/apache/conf/httpd.conf -D FOREGROUND
           ├─2791 /opt/bitnami/apache/bin/httpd -f /opt/bitnami/apache/conf/httpd.conf -D FOREGROUND
           ├─2792 /opt/bitnami/apache/bin/httpd -f /opt/bitnami/apache/conf/httpd.conf -D FOREGROUND
           ├─2793 /opt/bitnami/apache/bin/httpd -f /opt/bitnami/apache/conf/httpd.conf -D FOREGROUND
           ├─2826 /bin/sh /usr/bin/entry
           ├─2872 /bin/sh -c node srv.js
           └─2888 node srv.js

Mar 27 11:19:07 elloe01 k3s[829]: time="2020-03-27T11:19:07.763091751Z" level=info msg="Shutting down /v1, Kind=Endpoints workers"
Mar 27 11:19:07 elloe01 k3s[829]: time="2020-03-27T11:19:07.763152451Z" level=info msg="Shutting down /v1, Kind=Pod workers"
Mar 27 11:19:07 elloe01 k3s[829]: time="2020-03-27T11:19:07.763201351Z" level=info msg="Shutting down /v1, Kind=Service workers"
Mar 27 11:19:07 elloe01 k3s[829]: time="2020-03-27T11:19:07.763247552Z" level=info msg="Shutting down /v1, Kind=Node workers"
Mar 27 11:19:07 elloe01 k3s[829]: time="2020-03-27T11:19:07.763294452Z" level=info msg="Shutting down batch/v1, Kind=Job workers"
Mar 27 11:19:07 elloe01 k3s[829]: time="2020-03-27T11:19:07.763338752Z" level=info msg="Shutting down helm.cattle.io/v1, Kind=HelmChart workers"
Mar 27 11:19:07 elloe01 k3s[829]: time="2020-03-27T11:19:07.763388352Z" level=fatal msg="controllers exited"
Mar 27 11:19:07 elloe01 systemd[1]: k3s-master-LOCCH.service: Main process exited, code=exited, status=1/FAILURE
```
On the agents:
```
root@innovation00:~# systemctl status k3s-agent-LOCCH
● k3s-agent-LOCCH.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s-agent-LOCCH.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2020-03-27 10:54:39 UTC; 58min ago
     Docs: https://k3s.io
  Process: 27715 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 27717 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 27726 ExecStart=/usr/local/bin/k3s agent (code=exited, status=1/FAILURE)
 Main PID: 27726 (code=exited, status=1/FAILURE)
    Tasks: 0
   Memory: 1.7G
   CGroup: /system.slice/k3s-agent-LOCCH.service

Mar 27 10:50:00 innovation00 k3s[27726]: E0327 10:50:00.940429 27726 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to >
Mar 27 10:50:12 innovation00 k3s[27726]: time="2020-03-27T10:50:12.844926945Z" level=error msg="Failed to connect to proxy" error="dial tcp 10.12>
Mar 27 10:50:12 innovation00 k3s[27726]: time="2020-03-27T10:50:12.845011954Z" level=error msg="Remotedialer proxy error" error="dial tcp 10.127.>
Mar 27 10:50:17 innovation00 k3s[27726]: time="2020-03-27T10:50:17.845284699Z" level=info msg="Connecting to proxy" url="wss://10.127.0.1:6443/v1>
Mar 27 10:54:39 innovation00 k3s[27726]: time="2020-03-27T10:54:39.887071777Z" level=fatal msg="context canceled"
Mar 27 10:54:39 innovation00 k3s[27726]: I0327 10:54:39.887132 27726 network_policy_controller.go:172] Shutting down network policies controller
Mar 27 10:54:39 innovation00 systemd[1]: Stopping Lightweight Kubernetes...
Mar 27 10:54:39 innovation00 systemd[1]: k3s-agent-LOCCH.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 10:54:39 innovation00 systemd[1]: k3s-agent-LOCCH.service: Failed with result 'exit-code'.
Mar 27 10:54:39 innovation00 systemd[1]: Stopped Lightweight Kubernetes.
```
This is the expected behavior: leaving the containers running is what makes zero-downtime (or near-zero-downtime) upgrades possible. There is currently a k3s-killall.sh script which can be used to take down the service and its containers, if that is what you want.
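Putting the two steps together, a full teardown would look something like the sketch below (the unit name comes from this report, and `/usr/local/bin` is where the installer normally places the script; the block is guarded so it is a no-op on machines without k3s):

```shell
# Hedged sketch of a full stop: the systemctl stop ends the k3s supervisor
# but leaves containerd shims running; k3s-killall.sh then removes the
# shims, pods, and network interfaces as well.
if command -v systemctl >/dev/null 2>&1 && [ -x /usr/local/bin/k3s-killall.sh ]; then
  systemctl stop k3s-master-LOCCH
  /usr/local/bin/k3s-killall.sh
else
  echo "k3s-killall.sh not present; nothing to do"
fi
```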
For reference, the k3s-killall.sh script is embedded inside install.sh.
The ArchLinux k3s-bin AUR package does not use the installer script, so the following will get the script onto your system:

```
curl https://raw.githubusercontent.com/rancher/k3s/3c98290f0be546cdd12668d8f59cee66ca44c0a1/install.sh | awk '449<=NR && NR<=524' > k3s-killall.sh
chmod +x k3s-killall.sh
```
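The `awk '449<=NR && NR<=524'` filter simply prints the lines of install.sh whose record number `NR` falls in that range (the killall section at that pinned commit). A minimal sketch of the same range-extraction pattern on a throwaway file:

```shell
# Create a five-line file, then keep only lines 2 through 4 (inclusive),
# mirroring the 449<=NR && NR<=524 filter applied to install.sh above.
printf 'line1\nline2\nline3\nline4\nline5\n' > /tmp/nr-demo.txt
awk '2<=NR && NR<=4' /tmp/nr-demo.txt
# prints line2, line3, line4
```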