Describe the bug
The new version (v0.4.0) fails to start in an air-gapped environment, while the previous version (v0.3.0) ran normally.
To Reproduce
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network.target
[Service]
Type=notify
EnvironmentFile=/etc/systemd/system/k3s.service.env
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s server
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=infinity
[Install]
WantedBy=multi-user.target
● k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-04-17 09:23:40 UTC; 33s ago
Docs: https://k3s.io
Process: 1080 ExecStart=/usr/local/bin/k3s server (code=exited, status=255)
Process: 1077 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 1072 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Main PID: 1080 (code=exited, status=255)
Apr 17 09:23:40 william k3s[1080]: github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc001000000, 0x0, 0x1, 0x0)
Apr 17 09:23:40 william k3s[1080]: /go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
Apr 17 09:23:40 william k3s[1080]: github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).Execute(...)
Apr 17 09:23:40 william k3s[1080]: /go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:794
Apr 17 09:23:40 william k3s[1080]: github.com/rancher/k3s/pkg/daemons/agent.kubeProxy.func1(0xc001000000)
Apr 17 09:23:40 william k3s[1080]: /go/src/github.com/rancher/k3s/pkg/daemons/agent/agent.go:45 +0x30
Apr 17 09:23:40 william k3s[1080]: created by github.com/rancher/k3s/pkg/daemons/agent.kubeProxy
Apr 17 09:23:40 william k3s[1080]: /go/src/github.com/rancher/k3s/pkg/daemons/agent/agent.go:44 +0x2c6
Apr 17 09:23:40 william systemd[1]: k3s.service: Main process exited, code=exited, status=255/n/a
Apr 17 09:23:40 william systemd[1]: k3s.service: Failed with result 'exit-code'.
Expected behavior
k3s starts normally.
Screenshots
N/A
Additional context
OS version: Fedora 29
journalctl logs:
Apr 17 09:52:17 william systemd[1]: Starting Lightweight Kubernetes...
Apr 17 09:52:17 william k3s[1317]: time="2019-04-17T09:52:17.307821704Z" level=info msg="Starting k3s v0.4.0 (7822549f)"
Apr 17 09:52:17 william k3s[1317]: time="2019-04-17T09:52:17.308747380Z" level=info msg="Running kube-apiserver --allow-privileged=true --authorization-mode=Node,RBAC --insecure-port=0 --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/token-node.crt --requestheader-allowed-names=kubernetes-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --advertise-address=127.0.0.1 --secure-port=6444 --tls-cert-file=/var/lib/rancher/k3s/server/tls/localhost.crt --service-account-issuer=k3s --basic-auth-file=/var/lib/rancher/k3s/server/cred/passwd --kubelet-client-key=/var/lib/rancher/k3s/server/tls/token-node.key --requestheader-extra-headers-prefix=X-Remote-Extra- --service-cluster-ip-range=10.43.0.0/16 --api-audiences=unknown --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-group-headers=X-Remote-Group --watch-cache=false --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.key --advertise-port=6445 --bind-address=127.0.0.1 --tls-private-key-file=/var/lib/rancher/k3s/server/tls/localhost.key --requestheader-username-headers=X-Remote-User"
Apr 17 09:52:17 william k3s[1317]: time="2019-04-17T09:52:17.418452269Z" level=info msg="Running kube-scheduler --secure-port=0 --kubeconfig=/var/lib/rancher/k3s/server/cred/kubeconfig-system.yaml --leader-elect=false --port=10251 --bind-address=127.0.0.1"
Apr 17 09:52:17 william k3s[1317]: time="2019-04-17T09:52:17.418952625Z" level=info msg="Running kube-controller-manager --cluster-cidr=10.42.0.0/16 --root-ca-file=/var/lib/rancher/k3s/server/tls/token-ca.crt --leader-elect=false --port=10252 --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/kubeconfig-system.yaml --allocate-node-cidrs=true --secure-port=0 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.key"
Apr 17 09:52:17 william k3s[1317]: time="2019-04-17T09:52:17.478196406Z" level=info msg="Listening on :6443"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.377725824Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/node-token"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.378502970Z" level=info msg="To join node to cluster: k3s agent -s https://192.168.1.13:6443 -t ${NODE_TOKEN}"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.384737529Z" level=info msg="Writing static file: /var/lib/rancher/k3s/server/static/charts/traefik-1.64.0.tgz"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.385427654Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/coredns.yaml"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.385757235Z" level=info msg="Writing manifest: /var/lib/rancher/k3s/server/manifests/traefik.yaml"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.452020586Z" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.452061710Z" level=info msg="Run: k3s kubectl"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.452071898Z" level=info msg="k3s is up and running"
Apr 17 09:52:18 william systemd[1]: Started Lightweight Kubernetes.
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.515472774Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.515752103Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Apr 17 09:52:18 william k3s[1317]: time="2019-04-17T09:52:18.517585172Z" level=info msg="Waiting for containerd startup: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
Apr 17 09:52:23 william k3s[1317]: time="2019-04-17T09:52:23.491709952Z" level=info msg="Connecting to wss://localhost:6443/v1-k3s/connect"
Apr 17 09:52:23 william k3s[1317]: time="2019-04-17T09:52:23.491752118Z" level=info msg="Connecting to proxy" url="wss://localhost:6443/v1-k3s/connect"
Apr 17 09:52:23 william k3s[1317]: time="2019-04-17T09:52:23.497555552Z" level=info msg="Handling backend connection request [william]"
Apr 17 09:52:23 william k3s[1317]: time="2019-04-17T09:52:23.498939979Z" level=info msg="Running kubelet --root-dir=/var/lib/rancher/k3s/agent/kubelet --seccomp-profile-root=/var/lib/rancher/k3s/agent/kubelet/seccomp --resolv-conf=/etc/resolv.conf --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --read-only-port=0 --allow-privileged=true --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --authentication-token-webhook=true --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.pem --kubelet-cgroups=/systemd/system.slice --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --cni-bin-dir=/var/lib/rancher/k3s/data/e249e3fc5ccf8ea07732c672f3062154b47357f5c85e1735339f087752a0f3ee/bin --container-runtime=remote --address=127.0.0.1 --healthz-bind-address=127.0.0.1 --eviction-hard=imagefs.available<5%,nodefs.available<5% --fail-swap-on=false --cluster-domain=cluster.local --anonymous-auth=false --cert-dir=/var/lib/rancher/k3s/agent/kubelet/pki --cluster-dns=10.43.0.10 --runtime-cgroups=/systemd/system.slice --cgroup-driver=cgroupfs --kubeconfig=/var/lib/rancher/k3s/agent/kubeconfig.yaml --serialize-image-pulls=false --hostname-override=william"
Apr 17 09:52:23 william k3s[1317]: Flag --allow-privileged has been deprecated, will be removed in a future version
Apr 17 09:52:23 william k3s[1317]: time="2019-04-17T09:52:23.518017526Z" level=info msg="waiting for node william: nodes \"william\" not found"
Apr 17 09:52:23 william k3s[1317]: F0417 09:52:23.536112 1317 server.go:468] open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Apr 17 09:52:23 william k3s[1317]: goroutine 5322 [running]:
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/k8s.io/klog.stacks(0xc0006df000, 0xc000156280, 0x76, 0x13a)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/klog.go:828 +0xb1
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/k8s.io/klog.(*loggingT).output(0x5f7cb20, 0xc000000003, 0xc000c46460, 0x5caf918, 0x9, 0x1d4, 0x0)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/klog.go:779 +0x2d9
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/k8s.io/klog.(*loggingT).printDepth(0x5f7cb20, 0x3, 0x1, 0xc001791d90, 0x1, 0x1)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/klog.go:669 +0x12b
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/k8s.io/klog.(*loggingT).print(...)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/klog.go:660
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/k8s.io/klog.Fatal(...)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/klog.go:1189
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-proxy/app.NewProxyCommand.func1(0xc00180e780, 0xc00273eb40, 0x0, 0x4)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-proxy/app/server.go:468 +0x1e8
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).execute(0xc00180e780, 0xc0029da140, 0x4, 0x4, 0xc00180e780, 0xc0029da140)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:760 +0x2ae
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc00180e780, 0xc003873950, 0x3bfb1c0, 0xc0033f3c80)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).Execute(...)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:794
Apr 17 09:52:23 william k3s[1317]: github.com/rancher/k3s/pkg/daemons/agent.kubeProxy.func1(0xc00180e780)
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/pkg/daemons/agent/agent.go:45 +0x30
Apr 17 09:52:23 william k3s[1317]: created by github.com/rancher/k3s/pkg/daemons/agent.kubeProxy
Apr 17 09:52:23 william k3s[1317]: /go/src/github.com/rancher/k3s/pkg/daemons/agent/agent.go:44 +0x2c6
Apr 17 09:52:23 william systemd[1]: k3s.service: Main process exited, code=exited, status=255/n/a
Apr 17 09:52:23 william systemd[1]: k3s.service: Failed with result 'exit-code'.
It looks like the nf_conntrack module is not loaded for whatever reason. If you add ExecStartPre=-/sbin/modprobe nf_conntrack to k3s.service, does that help resolve the problem?
@erikwilson Yep, it's working after adding "ExecStartPre=-/sbin/modprobe nf_conntrack" to k3s.service.
Which feature introduced the dependency on the nf_conntrack module?
I will submit a PR to add the nf_conntrack dependency. (install.sh, README.md, k3s.service)
/assign
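The fix confirmed above amounts to one extra line in the [Service] section of the unit file shown earlier. A sketch of the amended fragment (only the last line is new relative to the stock k3s.service):

```ini
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStartPre=-/sbin/modprobe nf_conntrack
```

The leading `-` tells systemd to ignore a non-zero exit status from the command, so the unit still starts on kernels where the module is built in and modprobe has nothing to load.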
It looks like the nf_conntrack module is not loaded for whatever reason. If you add
ExecStartPre=-/sbin/modprobe nf_conntrack to k3s.service does that help resolve the problem?
Hi Eric
We've recently been having some similar intermittent issues which I believe are related to the nf_conntrack module not being loaded.
Here's the log from journalctl for k3s:
k3s[15606]: W1107 12:52:03.156596 15606 proxier.go:597] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modu
Nov 07 12:52:03 svwrs60000 k3s[15606]: W1107 12:52:03.160629 15606 proxier.go:597] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modul
Nov 07 12:52:03 svwrs60000 k3s[15606]: W1107 12:52:03.165176 15606 proxier.go:597] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /
Nov 07 12:52:03 svwrs60000 k3s[15606]: I1107 12:52:03.208167 15606 node.go:135] Successfully retrieved node IP: 11.164.196.72
Nov 07 12:52:03 svwrs60000 k3s[15606]: I1107 12:52:03.208222 15606 server_others.go:150] Using iptables Proxier.
Nov 07 12:52:03 svwrs60000 k3s[15606]: I1107 12:52:03.216003 15606 server.go:529] Version: v1.16.2-k3s.1
Nov 07 12:52:03 svwrs60000 k3s[15606]: I1107 12:52:03.216681 15606 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 2621440
Nov 07 12:52:03 svwrs60000 k3s[15606]: F1107 12:52:03.216715 15606 server.go:443] open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory
Nov 07 12:52:03 svwrs60000 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 07 12:52:03 svwrs60000 systemd[1]: Unit k3s.service entered failed state.
Nov 07 12:52:03 svwrs60000 systemd[1]: k3s.service failed.
You suggested adding "ExecStartPre=-/sbin/modprobe nf_conntrack" to k3s.service; in our current service file I can see only the following entries:
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
Is adding "ExecStartPre=-/sbin/modprobe nf_conntrack" still needed to make sure the service is not considered failed?
Many thanks in advance,
Eldar
We still have this problem with v1.18.2+k3s1.
After multiple restarts (<20) by systemd it eventually comes up. Unfortunately, containerd will already have started the pods and our software stack begins initializing, but it then breaks without a functional network.
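As an alternative to patching the unit file, the module can be loaded at boot via the standard systemd-modules-load mechanism. This is a general systemd facility, not something proposed in this thread, and the filename below is hypothetical:

```ini
# /etc/modules-load.d/k3s.conf (hypothetical filename)
# Kernel modules k3s/kube-proxy expect to be present before startup
br_netfilter
overlay
nf_conntrack
```

systemd-modules-load.service reads files in /etc/modules-load.d/ early in boot, so the modules are in place before k3s.service starts regardless of how many times the unit is restarted.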