K3s: cannot run k3s server

Created on 27 Feb 2019 · 20 comments · Source: k3s-io/k3s

I just ran k3s server but got this error: "Waiting for containerd startup: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused""
What did I miss? Does this mean I should install containerd first?

help wanted

Most helpful comment

I could solve the error by adding an entry to /etc/hosts that resolves the hostname of the machine running k3s.

All 20 comments

Met the same problem while using the scripts on Ubuntu 16.04.5:

    curl -sfL https://get.k3s.io | sh -
    # Check for Ready node, takes maybe 30 seconds
    k3s kubectl get node

16:20 root@k3ss:~ ->  k3s server &
[1] 1433
16:20 root@k3ss:~ ->  INFO[2019-02-27T16:20:54.129418734+08:00] Starting k3s v0.1.0 (91251aa)
INFO[2019-02-27T16:20:54.130185364+08:00] Running kube-apiserver --watch-cache=false --cert-dir /var/lib/rancher/k3s/server/tls/temporary-certs --allow-privileged=true --authorization-mode Node,RBAC --service-account-signing-key-file /var/lib/rancher/k3s/server/tls/service.key --service-cluster-ip-range 10.43.0.0/16 --advertise-port 6445 --advertise-address 127.0.0.1 --insecure-port 0 --secure-port 6444 --bind-address 127.0.0.1 --tls-cert-file /var/lib/rancher/k3s/server/tls/localhost.crt --tls-private-key-file /var/lib/rancher/k3s/server/tls/localhost.key --service-account-key-file /var/lib/rancher/k3s/server/tls/service.key --service-account-issuer k3s --api-audiences unknown --basic-auth-file /var/lib/rancher/k3s/server/cred/passwd --kubelet-client-certificate /var/lib/rancher/k3s/server/tls/token-node.crt --kubelet-client-key /var/lib/rancher/k3s/server/tls/token-node.key
INFO[2019-02-27T16:20:54.244202281+08:00] Running kube-scheduler --kubeconfig /var/lib/rancher/k3s/server/cred/kubeconfig-system.yaml --port 0 --secure-port 0 --leader-elect=false
INFO[2019-02-27T16:20:54.245458530+08:00] Running kube-controller-manager --kubeconfig /var/lib/rancher/k3s/server/cred/kubeconfig-system.yaml --service-account-private-key-file /var/lib/rancher/k3s/server/tls/service.key --allocate-node-cidrs --cluster-cidr 10.42.0.0/16 --root-ca-file /var/lib/rancher/k3s/server/tls/token-ca.crt --port 0 --secure-port 0 --leader-elect=false
INFO[2019-02-27T16:20:54.441461048+08:00] Listening on :6443
INFO[2019-02-27T16:20:54.442529489+08:00] Writing manifest: /var/lib/rancher/k3s/server/manifests/coredns.yaml
INFO[2019-02-27T16:20:54.442833392+08:00] Writing manifest: /var/lib/rancher/k3s/server/manifests/traefik.yaml
INFO[2019-02-27T16:20:54.648478987+08:00] Node token is available at /var/lib/rancher/k3s/server/node-token
INFO[2019-02-27T16:20:54.648790424+08:00] To join node to cluster: k3s agent -s https://192.168.2.31:6443 -t ${NODE_TOKEN}
INFO[2019-02-27T16:20:54.717775321+08:00] Wrote kubeconfig /etc/rancher/k3s/k3s.yaml
INFO[2019-02-27T16:20:54.717981968+08:00] Run: k3s kubectl
INFO[2019-02-27T16:20:54.718092050+08:00] k3s is up and running
INFO[2019-02-27T16:20:54.751459222+08:00] Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log
INFO[2019-02-27T16:20:54.751825550+08:00] Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd
INFO[2019-02-27T16:20:54.764530689+08:00] Waiting for containerd startup: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused"
containerd: exit status 1

[1]+  Exit 1                  k3s server

16:36 root@k3ss:~ ->  systemctl status k3s
โ— k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-02-27 16:36:50 CST; 4s ago
     Docs: https://k3s.io
  Process: 1340 ExecStart=/usr/local/bin/k3s server (code=exited, status=1/FAILURE)
  Process: 1338 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 1335 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
 Main PID: 1340 (code=exited, status=1/FAILURE)

Feb 27 16:36:50 k3ss k3s[1340]: time="2019-02-27T16:36:50.390778428+08:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Feb 27 16:36:50 k3ss k3s[1340]: time="2019-02-27T16:36:50.391199273+08:00" level=info msg="Run: k3s kubectl"
Feb 27 16:36:50 k3ss k3s[1340]: time="2019-02-27T16:36:50.391405115+08:00" level=info msg="k3s is up and running"
Feb 27 16:36:50 k3ss k3s[1340]: time="2019-02-27T16:36:50.423939072+08:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/container
Feb 27 16:36:50 k3ss k3s[1340]: time="2019-02-27T16:36:50.424437846+08:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/conta
Feb 27 16:36:50 k3ss k3s[1340]: time="2019-02-27T16:36:50.445521823+08:00" level=info msg="Waiting for containerd startup: rpc error: code = Unavaila
Feb 27 16:36:50 k3ss k3s[1340]: containerd: exit status 1
Feb 27 16:36:50 k3ss systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 16:36:50 k3ss systemd[1]: k3s.service: Unit entered failed state.
Feb 27 16:36:50 k3ss systemd[1]: k3s.service: Failed with result 'exit-code'.

Same issue on Ubuntu 18.04.2 LTS x86_64.

I could solve the error by adding an entry to /etc/hosts that resolves the hostname of the machine running k3s.
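A minimal sketch of that fix, assuming a POSIX shell. HOSTS_FILE and add_host_entry are names introduced here for illustration; the real invocation against /etc/hosts is left commented out since it needs root:

```shell
# Append "127.0.0.1 <name>" to the hosts file if the name is not already present.
# HOSTS_FILE is overridable so the snippet can be dry-run against a temp file.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
  name="$1"
  # -w matches whole words, so an existing "myhost2" entry does not
  # satisfy a check for "myhost"
  grep -qw "$name" "$HOSTS_FILE" || printf '127.0.0.1 %s\n' "$name" >> "$HOSTS_FILE"
}

# add_host_entry "$(hostname)"   # run as root to modify the real /etc/hosts
```

The guard makes the entry idempotent, so running it twice does not duplicate the line.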

Can you try running with k3s --debug server to get more logs? Also, can you share the contents of /var/lib/rancher/k3s/agent/containerd/containerd.log, which has the containerd-specific output? We will try on Ubuntu 16.04, but 18.04 is heavily tested, so I'm thinking this is more specific to your (@Brian-Gaffney) setup.

I already have an entry in /etc/hosts for my machine name (127.0.1.1 media-center).

Log from running k3s --debug server: https://pastebin.com/5zLja0Uk

Contents of /var/lib/rancher/k3s/agent/containerd/containerd.log: https://pastebin.com/7WJakvz4

Here's containerd.log during a startup that gets the error:

root@growth:/var/lib/rancher/k3s/agent/containerd# tail -f containerd.log 
time="2019-02-27T15:09:34.099340400-08:00" level=info msg="Start subscribing containerd event"
time="2019-02-27T15:09:34.099367986-08:00" level=info msg="Start recovering state"
time="2019-02-27T15:09:34.355949707-08:00" level=info msg="Start event monitor"
time="2019-02-27T15:09:34.355993801-08:00" level=info msg="Start snapshots syncer"
time="2019-02-27T15:09:34.356004086-08:00" level=info msg="Start streaming server"
time="2019-02-27T15:09:35.248697352-08:00" level=info msg="No cni config template is specified, wait for other system components to drop the config."
time="2019-02-27T15:11:05.943482050-08:00" level=info msg="Stop CRI service"
time="2019-02-27T15:11:05.943616948-08:00" level=info msg="Stop CRI service"
time="2019-02-27T15:11:05.943676991-08:00" level=info msg="Event monitor stopped"
time="2019-02-27T15:11:05.943708954-08:00" level=info msg="Stream server stopped"
time="2019-02-27T15:13:37.742432439-08:00" level=info msg="starting containerd" revision= version=1.2.3+unknown
time="2019-02-27T15:13:37.742642946-08:00" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2019-02-27T15:13:37.742680261-08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2019-02-27T15:13:37.742697530-08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2019-02-27T15:13:37.742742635-08:00" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2019-02-27T15:13:37.742833876-08:00" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2019-02-27T15:13:37.742847238-08:00" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2019-02-27T15:13:37.742865676-08:00" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.742874737-08:00" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.742882481-08:00" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.742892154-08:00" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.742901402-08:00" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.742910120-08:00" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.742919227-08:00" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.742928358-08:00" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2019-02-27T15:13:37.753671287-08:00" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2019-02-27T15:13:37.753711002-08:00" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2019-02-27T15:13:37.753939752-08:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2019-02-27T15:13:37.754196531-08:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2019-02-27T15:13:37.754230037-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754241663-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754254132-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754265645-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754273952-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754302196-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754312831-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754320202-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754327416-08:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2019-02-27T15:13:37.754359429-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754369470-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754378071-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754385717-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.754444607-08:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntime:{Type:io.containerd.runtime.v1.linux Engine: Root: Options:<nil>} UntrustedWorkloadRuntime:{Type: Engine: Root: Options:<nil>} Runtimes:map[] NoPivot:false} CniConfig:{NetworkPluginBinDir:/var/lib/rancher/k3s/data/4df430e1473d0225734948e562863c82f20d658ed9c420c77e168aec42eccdb5/bin NetworkPluginConfDir:/var/lib/rancher/k3s/agent/etc/cni/net.d NetworkPluginConfTemplate:} Registry:{Mirrors:map[docker.io:{Endpoints:[https://registry-1.docker.io]}] Auths:map[]} StreamServerAddress:growth StreamServerPort:10010 EnableSelinux:false SandboxImage:k8s.gcr.io/pause:3.1 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384} ContainerdRootDir:/var/lib/rancher/k3s/agent/containerd ContainerdEndpoint:/run/k3s/containerd/containerd.sock RootDir:/var/lib/rancher/k3s/agent/containerd/io.containerd.grpc.v1.cri StateDir:/run/k3s/containerd/io.containerd.grpc.v1.cri}"
time="2019-02-27T15:13:37.754473326-08:00" level=info msg="Connect containerd service"
time="2019-02-27T15:13:37.754559837-08:00" level=info msg="Get image filesystem path \"/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs\""
time="2019-02-27T15:13:37.755108266-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2019-02-27T15:13:37.755255045-08:00" level=info msg=serving... address=/run/k3s/containerd/containerd.sock
time="2019-02-27T15:13:37.755270713-08:00" level=info msg="containerd successfully booted in 0.013295s"
time="2019-02-27T15:13:37.755516160-08:00" level=info msg="Start subscribing containerd event"
time="2019-02-27T15:13:37.755565818-08:00" level=info msg="Start recovering state"
time="2019-02-27T15:13:38.015723896-08:00" level=info msg="Start event monitor"
time="2019-02-27T15:13:38.015756057-08:00" level=info msg="Start snapshots syncer"
time="2019-02-27T15:13:38.015766765-08:00" level=info msg="Start streaming server"
time="2019-02-27T15:13:38.914524330-08:00" level=info msg="No cni config template is specified, wait for other system components to drop the config."

@Brian-Gaffney You have something listening on port 10250 that is conflicting. Can you see what it is? sudo netstat -anp | grep 10250 will help you find the PID.
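A quick way to chase that down, sketched with ss (the modern netstat replacement). check_port is a helper name made up here; run it as root to see owning process names:

```shell
# Print what is listening on a given TCP port, or say the port is free.
# Run via sudo for ss to show the owning PID/program in the output.
check_port() {
  ss -tlnp 2>/dev/null | grep ":$1 " || echo "nothing is listening on port $1"
}

check_port 10250   # the kubelet port from the comment above
```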

@ibuildthecloud You nailed it.

I had the remnants of a minikube start --vm-driver=none minikube install still kicking around.
I've removed that, and k3s seems to be working now.

Cheers.

Same issues on RHEL7.

Resolved by modifying the /etc/hosts file with the server's IP and hostname.

Additionally had to stop/disable firewalld for the pods to start.

Same problem here using Antergos Linux.

In /var/lib/rancher/k3s/agent/containerd/containerd.log I see this error:

time="2019-03-02T08:29:51.846191974-03:00" level=fatal msg="Failed to run CRI service" error="stream server error: listen tcp: lookup francisco-pc on 8.8.8.8:53: no such host"

So editing /etc/hosts and adding the following line did the trick for me:

127.0.0.1       francisco-pc

I had the same issue on Ubuntu 16.04.5 LTS with Ansible on Vagrant/VirtualBox. Strangely, sometimes it worked; other times it failed with the connect: connection refused error on containerd.sock.

But I found that restarting the k3s service with systemctl restart k3s seems to be a workaround. Although the error still appears in the logs after the restart, it seems to be only temporary; after waiting a few seconds, containerd started as expected.

Currently I'm working around this with the following tasks:

  - name: Wait for containerd log
    become: yes
    wait_for:
      path: /var/lib/rancher/k3s/agent/containerd/containerd.log
  - name: Get containerd log
    shell: grep -i 'no cni' /var/lib/rancher/k3s/agent/containerd/containerd.log
    # Avoid Ansible's error triggering when grep has no match and returns 1: https://stackoverflow.com/a/41010653/3276634
    failed_when: "containerd_log.rc == 2"
    become: yes
    register: containerd_log

  - name: Restart k3s
    shell: systemctl restart k3s
    become: yes
    when: containerd_log.stdout != ""
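The same workaround can be sketched in plain shell without Ansible (assumptions: a systemd-managed k3s and the same log path; restart_if_stuck is a name invented here, and the actual restart is commented out):

```shell
# If containerd's log exists and still shows the "No cni config" wait message,
# the k3s service likely needs a restart. LOG is overridable for dry runs.
LOG="${LOG:-/var/lib/rancher/k3s/agent/containerd/containerd.log}"

restart_if_stuck() {
  if [ -f "$LOG" ] && grep -qi 'no cni' "$LOG"; then
    echo "restart needed"
    # sudo systemctl restart k3s
  else
    echo "ok"
  fi
}

restart_if_stuck
```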

@kaitoy It's cool, it works, but only on the server. When I join the node, it's not working.
@ibuildthecloud Here is the log.
containerd.log

@Prodian0013 @franciscocpg Thanks. It works when I run the command k3s server, but when I join the node, it fails. Here is the agent log with containerd.
containerd.log

None of these solutions have worked for me on ArchLinux. I indeed have the same DNS lookup issue during containerd's startup:

time="2019-03-20T16:18:25.956608221-06:00" level=error msg="Failed to start streaming server" error="listen tcp: lookup catbox on 192.168.1.1:53: no such host"
time="2019-03-20T16:18:25.956684986-06:00" level=info msg="Stop CRI service"
time="2019-03-20T16:18:25.956739211-06:00" level=info msg="Event monitor stopped"
time="2019-03-20T16:18:25.956761676-06:00" level=info msg="Stream server stopped"
time="2019-03-20T16:18:25.956784492-06:00" level=fatal msg="Failed to run CRI service" error="stream server error: listen tcp: lookup catbox on 192.168.1.1:53: no such host"

I can ping catbox and it does reply, albeit with the IPv6 address. I doubt that's the problem, though. I could put a temporary entry in my DNS server and see if that lets me get past this, but this is a laptop on a dynamic IP; anywhere else I choose to work from won't allow me to do that. Point is, /etc/hosts should be good enough.
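One way to narrow down why ping and containerd disagree (a diagnostic sketch, not a fix): compare what the nsswitch-based resolver returns with what /etc/resolv.conf and /etc/nsswitch.conf say. The resolves helper is a name made up here, and catbox is the hostname from the log above:

```shell
# Report whether a name resolves through the libc/nsswitch path (what getent
# follows); a "no" here while ping works suggests ping is using a different
# source, e.g. mDNS or IPv6-only records.
resolves() { getent hosts "$1" >/dev/null 2>&1 && echo yes || echo no; }

resolves catbox
grep '^nameserver' /etc/resolv.conf || echo "no nameserver configured"
grep '^hosts:' /etc/nsswitch.conf 2>/dev/null || echo "no nsswitch hosts line"
```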

I do not have anything else running on port 10250.

I've tried installing this just as a download, and via the curl command (which does a few nice things). How do I go about troubleshooting this deeper?

For now, I'm running k3s server --docker to get around this.

Thanks to @kaitoy and @franciscocpg.
The error that occurred for me, as logged in /var/lib/rancher/k3s/agent/containerd/containerd.log, is shown below:

msg="Failed to start streaming server" error="listen tcp: lookup app1 on [::1]:53: read udp [::1]:40954->[::1]:53: read: connection refused"

Although the hint is not no such host, I tried adding 127.0.0.1 app1 to /etc/hosts and it works.
Hope this is helpful!

FATA[2019-05-08T14:42:54.921767227+09:00] :apiserver exited: failed to create listener: failed to listen on 127.0.0.1:6444: listen tcp 127.0.0.1:6444: bind: address already in use
Can anyone point out how to resolve this issue?


Something is already using port 6444. Figure out what that is and either stop it so the apiserver can start, or move the apiserver to an unused port. On the Linux distros I use, netstat -anp | grep 6444 is what I would run, but other flavors may differ.

I solved it by running apt purge docker.io as root; docker.io is installed by default on AWS Ubuntu 18.04 LTS x86_64.

I had the same error. Previously, k3s had been run on this node as a server, not as an agent.

To fix this error:

  • stop the k3s service (systemctl stop k3s)
  • run the k3s agent (sudo k3s agent --server ${K3S_URL} --token ${K3S_TOKEN})

Linking #495 here since they share a similar log profile, and this issue ranks much higher. If you're a weirdo like me who still has a filesystem that wasn't formatted in the last few years, you likely need to deal with it.
