K3s: Pods stuck in ContainerCreating in v0.4.0-rc1

Created on 10 Apr 2019 · 7 comments · Source: k3s-io/k3s

Describe the bug

When starting v0.4.0-rc1 (b5217e2) with:

k3s server --no-deploy traefik

everything seems to go well, but the coredns pod is stuck in ContainerCreating:

# kubectl get pods --all-namespaces
NAMESPACE     NAME                       READY   STATUS              RESTARTS   AGE
kube-system   coredns-857cdbd8b4-hrwfs   0/1     ContainerCreating   0          5m56s

The pause image is loaded, but not coredns:

# k3s crictl images
IMAGE               TAG                 IMAGE ID            SIZE
k8s.gcr.io/pause    3.1                 da86e6ba6ca19       317kB

I monitor all DNS queries from the system, and the only thing that shows up is:

127.0.0.1:56793 - [10/Apr/2019:08:25:02 +0200] 30568 "A IN k8s.gcr.io. udp 28 false 512" NOERROR qr,rd,ra 347 0.02104945s
127.0.0.1:44742 - [10/Apr/2019:08:25:02 +0200] 54350 "AAAA IN k8s.gcr.io. udp 28 false 512" NOERROR qr,rd,ra 148 0.022812482s
127.0.0.1:41609 - [10/Apr/2019:08:25:04 +0200] 47261 "AAAA IN storage.googleapis.com. udp 40 false 512" NOERROR qr,rd,ra 166 0.017376335s
127.0.0.1:59117 - [10/Apr/2019:08:25:04 +0200] 13911 "A IN storage.googleapis.com. udp 40 false 512" NOERROR qr,rd,ra 154 0.017545402s

So no attempt is even made to load the coredns image.
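As a sanity check it can help to pull an image by hand through the bundled CRI client, which separates pull problems from sandbox problems. A sketch (the coredns tag below is illustrative, not necessarily the exact one this k3s version ships):

# manually fetch an image through containerd's CRI socket
k3s crictl pull coredns/coredns:1.3.0
# then list what is present
k3s crictl images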

I suspect something in containerd; here is the tail of /var/lib/rancher/k3s/agent/containerd/containerd.log:

time="2019-04-10T06:38:43.280172448Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:coredns-857cdbd8b4-hrwfs,Uid:5dd14879-5b59-11e9-9ae
0-000000010001,Namespace:kube-system,Attempt:0,}"
time="2019-04-10T06:38:43.336230176Z" level=info msg="shim containerd-shim started" address=/containerd-shim/k8s.io/69a3cf420fae8bf8c1c8b47805815d05
d98ef09fb488bedb4c0f333112361a2d/shim.sock debug=false pid=11373
time="2019-04-10T06:38:43.456785677Z" level=info msg="shim reaped" id=69a3cf420fae8bf8c1c8b47805815d05d98ef09fb488bedb4c0f333112361a2d
time="2019-04-10T06:38:43.496540778Z" level=info msg="TaskExit event &TaskExit{ContainerID:69a3cf420fae8bf8c1c8b47805815d05d98ef09fb488bedb4c0f33311
2361a2d,ID:69a3cf420fae8bf8c1c8b47805815d05d98ef09fb488bedb4c0f333112361a2d,Pid:11390,ExitStatus:137,ExitedAt:2019-04-10 06:38:43.432865535 +0000 UT
C,}"
time="2019-04-10T06:38:43.516682653Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:coredns-857cdbd8b4-hrwfs,Uid:5dd14879-5b59-11e9-9a
e0-000000010001,Namespace:kube-system,Attempt:0,} failed, error" error="failed to start sandbox container: failed to start sandbox container task \"
69a3cf420fae8bf8c1c8b47805815d05d98ef09fb488bedb4c0f333112361a2d\": cgroups: cgroup deleted: unknown"
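That "cgroups: cgroup deleted" error typically means runc could not keep the sandbox's cgroup alive, which on a hand-built system usually comes down to a cgroup controller that is missing or not mounted. A quick check, assuming a cgroup-v1 setup as k3s used at the time:

# list controllers the kernel knows about; cpuset and memory must be enabled
cat /proc/cgroups
# show where (if anywhere) the controllers are mounted
grep cgroup /proc/mounts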

To Reproduce

See above.

Expected behavior

Pods should start.

Screenshots
-

Additional context

I have rebuilt k3s locally from the "master" branch and I get exactly the same problem. If you have a patch you want me to try, that is no problem.

Most helpful comment

Same here on Alpine 3.9; are there any clear docs about how to set up the cgroups?

All 7 comments

I don't run systemd. Is there any "cgroup" setup made by systemd that I may be missing?
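For context: on a systemd machine, systemd itself mounts the tmpfs at /sys/fs/cgroup plus one hierarchy per controller at boot; without systemd, your own init scripts have to do the same. To compare against a working box, one can list what is currently mounted (findmnt is assumed to be available here):

# recursively list everything mounted under the cgroup root
findmnt -R /sys/fs/cgroup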

Yes, it was. Starting with systemd instead, I get:

# k3s kubectl get pods --all-namespaces
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE
kube-system   coredns-857cdbd8b4-77f87   1/1     Running   0          16s

I am no big fan of systemd, so I would really like to avoid it. I will try to find out what is missing, but any hint is appreciated.

Solved.

I removed my old (faulty) cgroup mounts entirely and got:

WARN[2019-04-10T07:31:55.832577885Z] Failed to find cpuset cgroup, you may need to add "cgroup_enable=cpuset" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi) 
ERRO[2019-04-10T07:31:55.832601058Z] Failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi) 
FATA[2019-04-10T07:31:55.832612668Z] failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi) 

That is very clear information (thanks). I added:

# a tmpfs to hold the per-controller cgroup hierarchies
mount -t tmpfs tmpfs /sys/fs/cgroup
for d in cpuset memory; do
  mkdir -p /sys/fs/cgroup/$d
  # mount each controller in its own hierarchy; without "-o $d" the kernel
  # comounts every available controller on the first mount point
  mount -t cgroup -o $d $d /sys/fs/cgroup/$d
done
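Before restarting the server it is worth verifying that the controllers actually landed; something like:

# both controllers should now show up as separate cgroup mounts
grep -E 'cgroup.*(cpuset|memory)' /proc/mounts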

I guess not many users build their own systems, but for those few this information is good to have, so please consider a documentation update.

I leave this issue open for now but feel free to close it at any time.

I can't reproduce this on Arch or Fedora, so I'll assume this only happens if your cgroup mounts are incorrect.

Same here on Alpine 3.9; are there any clear docs about how to set up the cgroups?


On Alpine, simply follow the README and do exactly the same as above. It worked for me (k3s 0.7.1-rc2, Alpine 3.10).

Just struggled with the same issue, but found that https://github.com/rancher/k3s/issues/660#issuecomment-514353060 shows the suggested cgroup edits at https://rancher.com/docs/k3s/latest/en/advanced/#alpine-linux are "wrong":

Leave fstab and cgconfig.conf alone and just run rc-update add cgroups default.
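For anyone landing here later, on Alpine with OpenRC that amounts to the following (a sketch, assuming a stock Alpine install):

# let OpenRC's cgroups service handle the mounts at every boot...
rc-update add cgroups default
# ...and start it now, without rebooting
rc-service cgroups start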
