Microk8s: failed to recover state: failed to reserve sandbox name

Created on 14 Jun 2019 · 13 Comments · Source: ubuntu/microk8s

Hello!

Yesterday I had to hard-reboot my machine after an "unusual" startup of my system. Since then, microk8s has stopped working, although it ran fine all the time before.

microk8s.inspect tells me:

Inspecting services
 FAIL:  Service snap.microk8s.daemon-containerd is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-containerd
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-proxy is running
 FAIL:  Service snap.microk8s.daemon-kubelet is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-kubelet
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system info
  Copy network configuration to the final report tarball
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Inspect kubernetes cluster

When checking the logs for snap.microk8s.daemon-containerd it says:

Jun 14 21:02:29 nico-notebook-acer systemd[1]: Started Service for snap application microk8s.daemon-containerd.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-containerd[4832]: Using a default profile template
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-containerd[4832]: Reloading AppArmor profiles
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.002059679+02:00" level=info msg="starting containerd" revision=bb71b10fd8f58240ca47fbb579b9d1028eea7c84 version=v1.2.5
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.002282052+02:00" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.002303583+02:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.002559688+02:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.002572117+02:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004288896+02:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004317420+02:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004373348+02:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004623497+02:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004635970+02:00" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004648578+02:00" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004654682+02:00" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004740932+02:00" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004754452+02:00" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004798373+02:00" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004819425+02:00" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004828658+02:00" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004844319+02:00" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004854334+02:00" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004866624+02:00" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004878312+02:00" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004887543+02:00" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004958929+02:00" level=info msg="loading plugin "io.containerd.runtime.v2.task"..." type=io.containerd.runtime.v2
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.004995811+02:00" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." type=io.containerd.monitor.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005392978+02:00" level=info msg="loading plugin "io.containerd.service.v1.tasks-service"..." type=io.containerd.service.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005418880+02:00" level=info msg="loading plugin "io.containerd.internal.v1.restart"..." type=io.containerd.internal.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005458810+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005475579+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005491001+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005504903+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005517467+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005530616+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005543661+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005556939+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.005571956+02:00" level=info msg="loading plugin "io.containerd.internal.v1.opt"..." type=io.containerd.internal.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007146330+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007258122+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007285781+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007299836+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.cri"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007443939+02:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntime:{Type:io.containerd.runtime.v1.linux Engine: Root: Options:<nil>} UntrustedWorkloadRuntime:{Type: Engine: Root: Options:<nil>} Runtimes:map[] NoPivot:false} CniConfig:{NetworkPluginBinDir:/snap/microk8s/608/opt/cni/bin NetworkPluginConfDir:/var/snap/microk8s/608/args/cni-network NetworkPluginConfTemplate:} Registry:{Mirrors:map[docker.io:{Endpoints:[https://registry-1.docker.io]} local.insecure-registry.io:{Endpoints:[http://localhost:32000]}] Auths:map[]} StreamServerAddress:127.0.0.1 StreamServerPort:0 EnableSelinux:false SandboxImage:k8s.gcr.io/pause:3.1 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384} ContainerdRootDir:/var/snap/microk8s/common/var/lib/containerd ContainerdEndpoint:/var/snap/microk8s/common/run/containerd.sock RootDir:/var/snap/microk8s/common/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri}"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007463392+02:00" level=info msg="Connect containerd service"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007588718+02:00" level=info msg="Get image filesystem path "/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs""
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.007938620+02:00" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.grpc.v1
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.008033469+02:00" level=info msg="Start subscribing containerd event"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.008073539+02:00" level=info msg="Start recovering state"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.008125886+02:00" level=info msg=serving... address="127.0.0.1:1338"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.008283570+02:00" level=info msg=serving... address="/var/snap/microk8s/common/run/containerd.sock"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.008307407+02:00" level=info msg="containerd successfully booted in 0.006902s"
Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.018788431+02:00" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name "organization-7d749879c7-q9kgq_default_acc9a3e8-8aff-11e9-861d-9829a631c5d3_8": name "organization-7d749879c7-q9kgq_default_acc9a3e8-8aff-11e9-861d-9829a631c5d3_8" is reserved for "af18314a8fd1ca2470dfe2a59df6ac0cbf05a2fb620530353223434b78b5c7d7""
Jun 14 21:02:30 nico-notebook-acer systemd[1]: snap.microk8s.daemon-containerd.service: Main process exited, code=exited, status=1/FAILURE
Jun 14 21:02:30 nico-notebook-acer systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Jun 14 21:02:30 nico-notebook-acer systemd[1]: snap.microk8s.daemon-containerd.service: Service RestartSec=100ms expired, scheduling restart.
Jun 14 21:02:30 nico-notebook-acer systemd[1]: snap.microk8s.daemon-containerd.service: Scheduled restart job, restart counter is at 5.
Jun 14 21:02:30 nico-notebook-acer systemd[1]: Stopped Service for snap application microk8s.daemon-containerd.
Jun 14 21:02:30 nico-notebook-acer systemd[1]: snap.microk8s.daemon-containerd.service: Start request repeated too quickly.
Jun 14 21:02:30 nico-notebook-acer systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Jun 14 21:02:30 nico-notebook-acer systemd[1]: Failed to start Service for snap application microk8s.daemon-containerd.

When checking the logs for snap.microk8s.daemon-kubelet it tells me:

Jun 14 21:02:29 nico-notebook-acer systemd[1]: Started Service for snap application microk8s.daemon-kubelet.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: Flag --pod-cidr has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: Flag --non-masquerade-cidr has been deprecated, will be removed in a future version
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: Flag --eviction-hard has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.492914    4623 server.go:417] Version: v1.14.2
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.493260    4623 plugins.go:103] No cloud provider specified.
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.525599    4623 server.go:625] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.525937    4623 container_manager_linux.go:261] container manager verified user specified cgroup-root exists: []
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.525951    4623 container_manager_linux.go:266] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/snap/microk8s/common/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:1Gi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:1Gi Percentage:0} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms}
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.526008    4623 container_manager_linux.go:286] Creating device plugin manager: true
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.526031    4623 state_mem.go:36] [cpumanager] initializing new in-memory state store
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.526121    4623 state_mem.go:84] [cpumanager] updated default cpuset: ""
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.526132    4623 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.526239    4623 kubelet.go:304] Watching apiserver
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: W0614 21:02:29.527986    4623 util_unix.go:77] Using "/var/snap/microk8s/common/run/containerd.sock" as endpoint is deprecated, please consider using full url format "unix:///var/snap/microk8s/common/run/containerd.sock".
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528055    4623 remote_runtime.go:62] parsed scheme: ""
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528068    4623 remote_runtime.go:62] scheme "" not registered, fallback to default scheme
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: W0614 21:02:29.528088    4623 util_unix.go:77] Using "/var/snap/microk8s/common/run/containerd.sock" as endpoint is deprecated, please consider using full url format "unix:///var/snap/microk8s/common/run/containerd.sock".
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528107    4623 remote_image.go:50] parsed scheme: ""
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528117    4623 remote_image.go:50] scheme "" not registered, fallback to default scheme
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528206    4623 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{/var/snap/microk8s/common/run/containerd.sock 0  <nil>}]
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528219    4623 clientconn.go:796] ClientConn switching balancer to "pick_first"
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528284    4623 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{/var/snap/microk8s/common/run/containerd.sock 0  <nil>}]
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528311    4623 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0002ccc90, CONNECTING
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528320    4623 clientconn.go:796] ClientConn switching balancer to "pick_first"
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528368    4623 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0003f98d0, CONNECTING
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: W0614 21:02:29.528427    4623 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {/var/snap/microk8s/common/run/containerd.sock 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/snap/microk8s/common/run/containerd.sock: connect: connection refused". Reconnecting...
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528472    4623 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0002ccc90, TRANSIENT_FAILURE
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: E0614 21:02:29.528495    4623 remote_runtime.go:85] Version from runtime service failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/snap/microk8s/common/run/containerd.sock: connect: connection refused"
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: W0614 21:02:29.528520    4623 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {/var/snap/microk8s/common/run/containerd.sock 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/snap/microk8s/common/run/containerd.sock: connect: connection refused". Reconnecting...
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: E0614 21:02:29.528552    4623 kuberuntime_manager.go:196] Get runtime version failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/snap/microk8s/common/run/containerd.sock: connect: connection refused"
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: F0614 21:02:29.528575    4623 server.go:265] failed to run Kubelet: failed to create kubelet: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/snap/microk8s/common/run/containerd.sock: connect: connection refused"
Jun 14 21:02:29 nico-notebook-acer microk8s.daemon-kubelet[4623]: I0614 21:02:29.528579    4623 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0003f98d0, TRANSIENT_FAILURE
Jun 14 21:02:29 nico-notebook-acer systemd[1]: snap.microk8s.daemon-kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
Jun 14 21:02:29 nico-notebook-acer systemd[1]: snap.microk8s.daemon-kubelet.service: Failed with result 'exit-code'.
Jun 14 21:02:29 nico-notebook-acer systemd[1]: snap.microk8s.daemon-kubelet.service: Service RestartSec=100ms expired, scheduling restart.
Jun 14 21:02:29 nico-notebook-acer systemd[1]: snap.microk8s.daemon-kubelet.service: Scheduled restart job, restart counter is at 5.
Jun 14 21:02:29 nico-notebook-acer systemd[1]: Stopped Service for snap application microk8s.daemon-kubelet.
Jun 14 21:02:29 nico-notebook-acer systemd[1]: snap.microk8s.daemon-kubelet.service: Start request repeated too quickly.
Jun 14 21:02:29 nico-notebook-acer systemd[1]: snap.microk8s.daemon-kubelet.service: Failed with result 'exit-code'.
Jun 14 21:02:29 nico-notebook-acer systemd[1]: Failed to start Service for snap application microk8s.daemon-kubelet.

So to me it looks like kubelet just expects containerd to be running, and containerd is failing to start because of

Jun 14 21:02:30 nico-notebook-acer microk8s.daemon-containerd[4832]: time="2019-06-14T21:02:30.018788431+02:00" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name "organization-7d749879c7-q9kgq_default_acc9a3e8-8aff-11e9-861d-9829a631c5d3_8": name "organization-7d749879c7-q9kgq_default_acc9a3e8-8aff-11e9-861d-9829a631c5d3_8" is reserved for "af18314a8fd1ca2470dfe2a59df6ac0cbf05a2fb620530353223434b78b5c7d7""

?
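To act on that log line it helps to separate the two pieces it carries: the sandbox name containerd tries to reserve, and the ID that already holds the reservation. The following is only a sketch; the helper name parse_reserved_sandbox is made up for illustration, and it assumes the message format shown in the log above.

```shell
#!/bin/bash
# Sketch: pull the sandbox name and the ID holding the reservation out of a
# containerd "failed to reserve sandbox name" log line.
# parse_reserved_sandbox is a hypothetical helper, not part of MicroK8s.
parse_reserved_sandbox() {
    # $1: one log line. Prints "<sandbox name> <id>", or nothing if the
    # line does not contain the error.
    sed -n 's/.*failed to reserve sandbox name "\([^"]*\)".*is reserved for "\([0-9a-f]*\)".*/\1 \2/p' <<< "$1"
}

# Sample taken from the log excerpt above.
sample='level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name "organization-7d749879c7-q9kgq_default_acc9a3e8-8aff-11e9-861d-9829a631c5d3_8": name "organization-7d749879c7-q9kgq_default_acc9a3e8-8aff-11e9-861d-9829a631c5d3_8" is reserved for "af18314a8fd1ca2470dfe2a59df6ac0cbf05a2fb620530353223434b78b5c7d7""'
parse_reserved_sandbox "$sample"
```

On a real system the input line would come from sudo journalctl -u snap.microk8s.daemon-containerd rather than from a hard-coded sample.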

I tried pretty much everything I found online, but nothing really helped. microk8s.reset hangs, probably because those two services aren't running, and reinstalling the snap package doesn't work because it tries to take a snapshot that never finishes. So I'm stuck with a broken microk8s here, and microk8s is my go-to kube setup for local development, which I use on a day-to-day basis. Could someone please help me fix this issue?

I attached the microk8s.inspect generated tarball, as well as the last 100 lines of both the kubelet log and the containerd log.

snap.microk8s.daemon-containerd.log
snap.microk8s.daemon-kubelet.log

inspection-report-20190614_205925.tar.gz

Thanks in advance!

Nico

All 13 comments

Hi @niggoo

The only related issue I found was this one: https://github.com/containerd/cri/issues/1014

The suggested solution to unblock containerd is to start it with disable_plugins = [ cri ] and then use microk8s.ctr containers rm to remove the failing container under the -n k8s.io namespace.

For sure not a pleasant experience.

Hi!

Thanks for your answer!

I found the mentioned link, but wasn't able to fully follow the instructions.
On my machine there is no /etc/containerd/config.toml

But I think I found the right configuration file as /var/snap/microk8s/608/args/containerd-template.toml ?

So when I just try to put disable_plugins = [ cri ] at the top of the file, the containerd logs would just tell me
Jun 17 18:41:49 nico-notebook-acer microk8s.daemon-containerd[9935]: containerd: Near line 1 (last key parsed 'disable_plugins'): expected value but found "cri" instead

Also placing the entry into the group "plugins" yields the same error.

I also tried putting it with quotes, like disable_plugins = [ "cri" ], but now it seems to ignore the statement, and we are back to the original error message.
Maybe this is kind of dumb, but can you tell me where to put that statement in the correct form?

The file at the moment looks like this without any "disable_plugins" statement:

oom_score = 0

[grpc]
  uid = 0
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[debug]
  address = ""
  uid = 0
  gid = 0

[metrics]
  address = "127.0.0.1:1338"
  grpc_histogram = false

[cgroup]
  path = ""

[plugins]
  [plugins.cgroups]
    no_prometheus = false
  [plugins.cri]
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    enable_selinux = false
    sandbox_image = "k8s.gcr.io/pause:3.1"
    stats_collect_period = 10
    systemd_cgroup = false
    enable_tls_streaming = false
    max_container_log_line_size = 16384
    [plugins.cri.containerd]
      snapshotter = "overlayfs"
      no_pivot = false
      [plugins.cri.containerd.default_runtime]
        runtime_type = "io.containerd.runtime.v1.linux"
        runtime_engine = ""
        runtime_root = ""
      [plugins.cri.containerd.untrusted_workload_runtime]
        runtime_type = ""
        runtime_engine = ""
        runtime_root = ""
    [plugins.cri.cni]
      bin_dir = "${SNAP}/opt/cni/bin"
      conf_dir = "${SNAP_DATA}/args/cni-network"
      conf_template = ""
    [plugins.cri.registry]
      [plugins.cri.registry.mirrors]
        [plugins.cri.registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]
        [plugins.cri.registry.mirrors."local.insecure-registry.io"]
          endpoint = ["http://localhost:32000"]
  [plugins.diff-service]
    default = ["walking"]
  [plugins.linux]
    shim = "containerd-shim"
    runtime = "${RUNTIME}"
    runtime_root = ""
    no_shim = false
    shim_debug = true
  [plugins.scheduler]
    pause_threshold = 0.02
    deletion_threshold = 0
    mutation_threshold = 100
    schedule_delay = "0s"
    startup_delay = "100ms"

/var/snap/microk8s/608/args/containerd-template.toml

Do you know if this is even the correct file? I was just guessing because it changed the log output, so I thought it must be the right one :)

Thanks in advance for your help!

Nico

The file you mention (/var/snap/microk8s/608/args/containerd-template.toml) is the right one. You have to update this file and do a microk8s.stop, microk8s.start cycle.

I did add disable_plugins = [ "cri" ] in the deployment I have here and it went through. It made it to /var/snap/microk8s/608/args/containerd.toml and containerd says it started fine.

I wonder how I can reproduce the state your containerd is in. Any ideas?

Hi!

I played with it for pretty much the whole day, but just couldn't get it to work. So I took the hammer and killed everything manually, because I really needed to get going again.

Sadly this doesn't help fix the problem at its core, but at least it may help those having the same problem:

snap remove microk8s never finished, probably because there was too much data.
So I inspected the processes it started, because it was stuck at Save data of snap "microk8s" in automatic snapshot set #7,
which was creating a gzip archive of the whole /var/snap/microk8s/common folder.
So eventually I just ran sudo rm -r /var/snap/microk8s to delete all the files snap wanted to back up first.
Then I executed snap remove microk8s again, which then worked smoothly.
After a clean reinstall using snap install microk8s --classic && microk8s.start, everything seems to behave normally again.

So, to work around this problem next time: would it be an option to make backing up and gzipping the whole /var/snap/microk8s directory optional? If it were, simply reinstalling microk8s would at least let a developer keep going instead of having to troubleshoot all the details. I understand that there is probably a good reason for this backup, but it also kept me struggling for the last 4 days.
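On skipping the backup: newer snapd releases accept a --purge flag on snap remove, which removes the snap without saving the automatic snapshot. The sketch below assumes a snapd version that supports the flag (check snap remove --help). It only prints the commands by default, because running them deletes all MicroK8s data. The run and reinstall_microk8s helpers are hypothetical names for illustration.

```shell
#!/bin/bash
# Dry-run sketch: print the reinstall commands instead of executing them.
# Set DRY_RUN=0 to really run them (this deletes all MicroK8s data).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_microk8s() {
    # --purge asks snapd to skip the automatic snapshot (assumption: the
    # installed snapd is recent enough to know this flag).
    run sudo snap remove --purge microk8s
    run sudo snap install microk8s --classic
    run microk8s.start
}

reinstall_microk8s
```

Keeping the destructive path behind DRY_RUN makes it possible to review exactly what would run before committing to a full wipe.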

Thanks for your support and your time anyways!

Nico
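A possible shortcut for the reinstall route, offered as a sketch rather than a tested recipe: snapd has a `--purge` option for `snap remove` that removes a snap without saving a snapshot of its data, assuming the installed snapd version supports it. The helper names `run` and `reinstall_microk8s` and the `DRY_RUN` switch are made up for illustration. By default the script only prints the commands.

```shell
#!/bin/bash
# Sketch: reinstall MicroK8s without the automatic data snapshot.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical wrapper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# reinstall_microk8s is a hypothetical helper name.
reinstall_microk8s() {
    # --purge asks snapd to skip saving a snapshot of the snap's data.
    run sudo snap remove --purge microk8s
    run sudo snap install microk8s --classic
    run microk8s.start
}

reinstall_microk8s
```

Note that this throws away all cluster state, just like the manual rm -r above.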

I just ran into this problem as well.

Okt 31 08:49:43 marvin microk8s.daemon-containerd[15212]: time="2019-10-31T08:49:43.094783878+01:00" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name "metrics-server-v0.2.1-598c8978c-dtkm7_kube-system_e001f423-4af0-41ca-91be-76d611f961af_8": name "metrics-server-v0.2.1-598c8978c-dtkm7_kube-system_e001f423-4af0-41ca-91be-76d611f961af_8" is reserved for "10a8ed8f936e22cde18bd2c68ca276e92263aeb3f2b7ac2804e68a2fc1c75a39""

@gysel did you find a working solution for this problem?

No, I reinstalled MicroK8s.

PS: the config option for the workaround is disabled_plugins = ["cri"], not disable_plugins = ["cri"]

Hello.
Here is a script which fixed this issue without fully reinstalling microk8s.
To summarize, this script purges all containers in containerd. Then kubelet is able to restart the containers without conflicts.

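#!/bin/bash
# Stop MicroK8s and make sure no containerd process is left behind.
/snap/bin/microk8s.stop
kill $(pidof containerd)
# Point containerd at a temporary config that disables the CRI plugin.
echo 'disabled_plugins = ["cri"]' > /var/snap/microk8s/current/args/containerd.toml.orig
sed -i -e "s/containerd\.toml$/containerd.toml.orig/g" /var/snap/microk8s/current/args/containerd
nohup /usr/bin/snap run microk8s.daemon-containerd &
sleep 5  # give containerd a moment to come up before querying it
# -q prints only the container IDs, without the header line.
containers=$(/snap/bin/microk8s.ctr -n k8s.io containers list -q)

# Remove every container record but keep its snapshot.
while IFS= read -r container; do
    [ -n "$container" ] || continue
    echo "removing ${container}"
    /snap/bin/microk8s.ctr -n k8s.io containers rm --keep-snapshot "$container"
done <<< "$containers"

# Switch back to the regular config and restart everything.
sed -i -e "s/containerd\.toml\.orig$/containerd.toml/g" /var/snap/microk8s/current/args/containerd
kill $(pidof containerd)
/snap/bin/microk8s.start
/snap/bin/microk8s.stop
/snap/bin/microk8s.start
/snap/bin/microk8s.inspect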

Thanks @chmit.
I had to modify it slightly to work for me on Ubuntu 19.10 with microk8s v1.18.2.
Modifications:

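#!/bin/bash
# Stop MicroK8s and make sure no containerd process is left behind.
/snap/bin/microk8s.stop
kill -15 $(pidof containerd)
# Point containerd at a temporary config that disables the CRI plugin.
echo 'disabled_plugins = ["cri"]' > /var/snap/microk8s/current/args/containerd.toml.orig
sed -i -e "s/containerd\.toml$/containerd.toml.orig/g" /var/snap/microk8s/current/args/containerd
nohup /usr/bin/snap run microk8s.daemon-containerd &
sleep 5  # give containerd a moment to come up before querying it
# -q prints only the container IDs, without the header line.
containers=$(/snap/bin/microk8s.ctr containers list -q)

# Remove every container record but keep its snapshot.
while IFS= read -r container; do
    [ -n "$container" ] || continue
    echo "removing ${container}"
    /snap/bin/microk8s.ctr containers rm --keep-snapshot "$container"
done <<< "$containers"

# Switch back to the regular config and restart everything.
sed -i -e "s/containerd\.toml\.orig$/containerd.toml/g" /var/snap/microk8s/current/args/containerd
kill $(pidof containerd)
/snap/bin/microk8s.start
/snap/bin/microk8s.stop
/snap/bin/microk8s.start
/snap/bin/microk8s.inspect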

I had the same problem. Unfortunately, when starting containerd without the CRI plugin, I was unable to see any running containers. My solution was:

microk8s.stop
mv /var/snap/microk8s/common/var/lib/containerd /var/snap/microk8s/common/var/lib/_containerd
microk8s.start

Then the apiserver took care of scheduling all the containers again.
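A small variation on the move above, as a sketch: setting the directory aside under a timestamped name keeps the old state around and avoids a name clash if this has to be done twice. The helper name `set_aside` and the `.bak.<timestamp>` suffix are assumptions for illustration, not anything MicroK8s defines.

```shell
#!/bin/bash
# Sketch: set a containerd state directory aside under a timestamped name.
# set_aside is a hypothetical helper; it prints the new location on success.
set_aside() {
    local dir="$1"
    local backup
    backup="${dir%/}.bak.$(date +%Y%m%d%H%M%S)"
    # Refuse to continue if the directory does not exist.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    mv "$dir" "$backup"
    printf '%s\n' "$backup"
}

# Usage (as root), mirroring the three commands above:
#   microk8s.stop
#   set_aside /var/snap/microk8s/common/var/lib/containerd
#   microk8s.start
```

Once the cluster is confirmed healthy, the set-aside directory can be deleted to reclaim disk space.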

I also had this issue. My kubelet was NotReady and microk8s.inspect reported that the containerd service had some issue. This was following a hard reset. The logs I got were these:

Oct 26 12:54:37 inspiron microk8s.daemon-containerd[690154]: time="2020-10-26T12:54:37.335766588+02:00" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name "vault-dev-configurer-677746d9fd-pnm7r_default_650e3e90-01f0-40cb-8931-fa3f56792a13_177": name "vault-dev-configurer-677746d9fd-pnm7r_default_650e3e90-01f0-40cb-8931-fa3f56792a13_177" is reserved for "75d23dd1a5449553b27ebeea54895ba72cbb9cb00fa9438def2ede85040de6a5""

I managed to fix it without reinstalling all of microk8s. The steps were these:

  1. Add disabled_plugins = ["cri"] at the top of /var/snap/microk8s/current/args/containerd-template.toml
  2. systemctl stop snap.microk8s.daemon-containerd.service
  3. microk8s.ctr -n=k8s.io containers rm 75d23dd1a5449553b27ebeea54895ba72cbb9cb00fa9438def2ede85040de6a5
  4. systemctl start snap.microk8s.daemon-containerd.service
  5. Remove disabled_plugins from /var/snap/microk8s/current/args/containerd-template.toml
  6. systemctl restart snap.microk8s.daemon-containerd.service

Everything seems stable now and is running fine.
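The container ID used in step 3 comes from the end of the fatal log line (the part after "is reserved for"). Here is a minimal sketch for pulling it out automatically. The helper name `reserved_id` is made up for illustration, and it assumes the ID is the usual 64-character hex string shown in the logs in this thread.

```shell
#!/bin/bash
# Sketch: extract the container ID from a "failed to reserve sandbox name" log line.
# reserved_id is a hypothetical helper; it reads log text on stdin and prints
# the ID from the last matching line (nothing if no line matches).
reserved_id() {
    # First keep the phrase: is reserved for "<64 hex chars>", then only the ID.
    grep -oE 'is reserved for "[0-9a-f]{64}"' | grep -oE '[0-9a-f]{64}' | tail -n 1
}

# Usage (as root, with the CRI plugin disabled as in step 1):
#   id=$(journalctl -u snap.microk8s.daemon-containerd --no-pager | reserved_id)
#   microk8s.ctr -n k8s.io containers rm "$id"
```

Removing only the one conflicting container is less invasive than purging everything, though it may need repeating if another name conflict shows up on the next start.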
