Kubeadm: default the "cgroupDriver" setting of the kubelet to "systemd"

Created on 13 Jan 2021 · 31 Comments · Source: kubernetes/kubeadm

the kubelet from the official k8s packages is run using systemd (as the service manager) and kubeadm depends on this assumption.
the cgroup drivers of the kubelet and the container runtime should match, and if the kubelet is run under systemd the driver should be "systemd":
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

for Docker there is auto-detection of cgroup drivers, which we may move to be on the side of dockershim:
https://github.com/kubernetes/kubernetes/pull/97764#discussion_r556595741
for runtimes other than Docker, currently kubeadm does not auto-detect or set the value.

this ticket is about the proposal to default the KubeletConfiguration that kubeadm generates to the "systemd" driver unless the user is explicit about it - e.g.:

if kubeletConfig.cgroupDriver == "" {
  kubeletConfig.cgroupDriver = "systemd"
}
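
for reference, a user can be explicit by including a KubeletConfiguration document in the config file passed to "kubeadm init --config", e.g.:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd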

the container runtime docs already instruct users to use the systemd driver:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

but the above will be a breaking change for users that have not configured their CR to "systemd" and are thus not matching that on the kubelet side.

opening this ticket to gather feedback from users and higher level tools.

chat in the #kind channel from today:
https://kubernetes.slack.com/archives/CEKK1KTN2/p1610555169036600


1.21

  • [x] "kubeadm init" defaults the KubeletConfiguration to the "systemd" driver unless the user is explicit (https://github.com/kubernetes/kubernetes/pull/99471)

1.22

  • [ ] move the defaulting to cmd/kubeadm/app/componentconfigs/kubelet.go#Default(); this will make it apply to all commands (TODO)

kind/design kind/documentation kind/feature lifecycle/active priority/important-soon


All 31 comments

@BenTheElder (kind) @afbjorklund (minikube) @randomvariable @fabriziopandini (cluster api)

@floryut (kubespray)

kubespray forces the configuration, I think: for Docker it's "systemd", and for containerd it's still "cgroupfs" but that should change soon

ping @saschagrunert for crio feedback

i've noticed we are missing the cgroup_manager: systemd setting in our CRI-O config / install instructions BTW:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

this is something i can update this cycle (but i would prefer if someone more familiar with CRI-O does it).
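
for reference, the setting lives in the [crio.runtime] table of crio.conf; a minimal example (exact file layout may vary between CRI-O versions):

# /etc/crio/crio.conf
[crio.runtime]
cgroup_manager = "systemd"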

the container runtime docs already instruct users to use the systemd driver:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

the docs say that you should use systemd as the driver, if your system is using systemd...

When systemd is chosen as the init system for a Linux distribution, [...]

the emphasis is on not having two different drivers, not that you _must_ use systemd?

When there are two cgroup managers on a system, you end up with two views of those resources


minikube does some detection of what the host is using, and sets cgroupDriver to match

https://github.com/kubernetes/minikube/issues/4172 (PR https://github.com/kubernetes/minikube/pull/6287)

however, for the supported OS distribution (VM) the default _has_ been changed to systemd

https://github.com/kubernetes/minikube/issues/4770 (PR https://github.com/kubernetes/minikube/pull/6651)

but the above will be a breaking change for users that have not configured their CR to "systemd" and are thus not matching that on the kubelet side.

I think the vocal minority that does not use systemd could get upset about the runtime default changing...

But as far as I know, Kubernetes has recommended changing the Docker default driver for a _long_ time?

@neolit123

i've noticed we are missing the cgroup_manager: systemd in our cri-o config

the cri-o default changed in version 1.18, so most users would have systemd (by default)

https://github.com/cri-o/cri-o/commit/9ec532c7f47911475004671d9fdabc2cf4351d86

https://github.com/cri-o/cri-o/issues/3719

but it would be good to have it documented, since cgroupfs was the default for cri-o 1.17

the docs say that you should use systemd as the driver, if your system is using systemd...

that is true, of course. but the official kubelet package uses systemd to manage the kubelet.

thanks for the details on minikube

I think the vocal minority that does not use systemd could get upset about the runtime default changing...
But as far as I know, Kubernetes has recommended changing the Docker default driver for a long time ?

indeed, we have explained to users that they should use the "systemd" driver for a number of releases (outside of the support skew).
if someone is using the official kubeadm / kubelet packages it makes sense for the driver to be set to "systemd" on both CR and kubelet sides.
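
for Docker, the linked container runtime page handles the CR side via /etc/docker/daemon.json, e.g.:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}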

@neolit123 @afbjorklund I can give the CRI-O docs an update this cycle and mention the driver. :+1:

thanks @saschagrunert

and for containerd it's still cgroupfs but should change soon

so are there any docs for changing the containerd default cgroup driver to systemd? @champtar

There are no docs at the moment, but we plan to default to containerd (as the CR) in the next kubespray release, and while we do that we plan to move the cgroup driver to systemd.
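
for reference, on the containerd side this is the SystemdCgroup option of the runc runtime in config.toml (this is the shape the linked container runtime page documents; exact layout may vary by containerd version):

# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true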

i think there must be a common method to check the CRI cgroup driver, so during kubeadm init, if kubeletConfig.cgroupDriver is blank, we check the CRI cgroup driver with that common method. this check is currently done only for docker? and missing for containerd / cri-o?

i think there must be a common method to check the CRI cgroup driver, so during kubeadm init, if kubeletConfig.cgroupDriver is blank, we check the CRI cgroup driver with that common method. this check is currently done only for docker? and missing for containerd / cri-o?

It was discussed in #844, when doing the current docker info -f "{{.CgroupDriver}}" hack

we check the cri cgroup driver by the common method

this is currently not possible. some time ago, someone invested a lot of time investigating how to place cgroup driver detection inside the kubelet for all container runtimes, but it never happened.

@neolit123 I have some concerns about changing the default, mostly for the upgrade workflows:

In the case of kubeadm upgrades (in-place upgrades):
I should double-check whether changing the cgroup driver default impacts upgrades on an existing node; assuming the worst case (it does), this could lead to the problems documented here.

In the case of Cluster API (immutable upgrades), this requires documentation/coordination with the users, unless this happens by default in CAPBK; also, AFAIK it is currently not possible to change the KubeletConfiguration (see https://github.com/kubernetes-sigs/cluster-api/issues/1584), so we are still relying on ExtraArgs, which is not ideal.

We had some "interesting" issues in minikube that turned out to be Docker configuration (or handling):

$ sudo systemctl stop docker.socket
$ docker info --format {{.CgroupDriver}}
panic: reflect: indirection through nil pointer to embedded struct [recovered]
    panic: reflect: indirection through nil pointer to embedded struct

The moral of the story is to _always_ check the docker daemon status, before trying to look at server details...
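
a minimal sketch of such a guard (assuming a systemd host where the daemon runs as the "docker" unit):

# only query server-side fields once we know the daemon is up;
# otherwise the --format template above panics on the nil server info
if systemctl is-active --quiet docker; then
  docker info --format '{{.CgroupDriver}}'
else
  echo "docker daemon is not running; skipping cgroup driver detection" >&2
fi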

@fabriziopandini

In the case of kubeadm upgrades (in-place upgrades):

this will not be a problem for kubeadm mutable upgrades, as kubeadm no longer upgrades the KubeletConfiguration (part of Rosti's work). so the shared KubeletConfiguration for all nodes will remain on "cgroupfs" if the user did not set it.
for new clusters however (via "init"), kubeadm will default it to "systemd", which will then require all nodes to have the CR set to "systemd" too.

In the case of Cluster API (immutable upgrades),

if image-builder sets the CR to "systemd", kubeadm defaults the KubeletConfiguration to "systemd", and image-builder produces images strictly targeting a k8s version, this will not be a problem: the --cgroup-driver flag doesn't have to be set via image-builder, and CAPBK doesn't have to pass it via KubeletConfiguration.

(see kubernetes-sigs/cluster-api#1584)

with all kubelet flags being removed this will hit the users.
so ideally CAPBK should get ahead of this and support KubeletConfiguration soon.

@afbjorklund

panic: reflect: indirection through nil pointer to embedded struct [recovered]

this seems like something that can be avoided even if the server is not running.

this seems like something that can be avoided even if the server is not running.

Indeed, but I haven't opened a bug with docker (or moby?) yet. Still happens, though.

I don't think that kubeadm should be modifying the configuration for the container runtime. crictl from cri-tools already supports exposing this information and is already being used by kubeadm; it seems like it would be a good choice for this purpose as well.

output from running crictl info within a kind cluster:

{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": true,
        "reason": "",
        "message": ""
      }
    ]
  },
  "cniconfig": {
    "PluginDirs": [
      "/opt/cni/bin"
    ],
    "PluginConfDir": "/etc/cni/net.d",
    "PluginMaxConfNum": 1,
    "Prefix": "eth",
    "Networks": [
      {
        "Config": {
          "Name": "cni-loopback",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "type": "loopback",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"type\":\"loopback\"}"
            }
          ],
          "Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n  \"type\": \"loopback\"\n}]\n}"
        },
        "IFName": "lo"
      },
      {
        "Config": {
          "Name": "multus-cni-network",
          "CNIVersion": "0.3.1",
          "Plugins": [
            {
              "Network": {
                "cniVersion": "0.3.1",
                "name": "multus-cni-network",
                "type": "multus",
                "ipam": {},
                "dns": {}
              },
              "Source": "{\"cniVersion\":\"0.3.1\",\"delegates\":[{\"cniVersion\":\"0.3.1\",\"name\":\"kindnet\",\"plugins\":[{\"ipMasq\":false,\"ipam\":{\"dataDir\":\"/run/cni-ipam-state\",\"ranges\":[[{\"subnet\":\"10.244.0.0/24\"}]],\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"type\":\"host-local\"},\"mtu\":1500,\"type\":\"ptp\"},{\"capabilities\":{\"portMappings\":true},\"type\":\"portmap\"}]}],\"kubeconfig\":\"/etc/cni/net.d/multus.d/multus.kubeconfig\",\"name\":\"multus-cni-network\",\"type\":\"multus\"}"
            }
          ],
          "Source": "{\"cniVersion\":\"0.3.1\",\"name\":\"multus-cni-network\",\"plugins\":[{\"cniVersion\":\"0.3.1\",\"delegates\":[{\"cniVersion\":\"0.3.1\",\"name\":\"kindnet\",\"plugins\":[{\"ipMasq\":false,\"ipam\":{\"dataDir\":\"/run/cni-ipam-state\",\"ranges\":[[{\"subnet\":\"10.244.0.0/24\"}]],\"routes\":[{\"dst\":\"0.0.0.0/0\"}],\"type\":\"host-local\"},\"mtu\":1500,\"type\":\"ptp\"},{\"capabilities\":{\"portMappings\":true},\"type\":\"portmap\"}]}],\"kubeconfig\":\"/etc/cni/net.d/multus.d/multus.kubeconfig\",\"name\":\"multus-cni-network\",\"type\":\"multus\"}]}"
        },
        "IFName": "eth0"
      }
    ]
  },
  "config": {
    "containerd": {
      "snapshotter": "overlayfs",
      "defaultRuntimeName": "runc",
      "defaultRuntime": {
        "runtimeType": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "baseRuntimeSpec": ""
      },
      "untrustedWorkloadRuntime": {
        "runtimeType": "",
        "runtimeEngine": "",
        "PodAnnotations": null,
        "ContainerAnnotations": null,
        "runtimeRoot": "",
        "options": null,
        "privileged_without_host_devices": false,
        "baseRuntimeSpec": ""
      },
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "ContainerAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": ""
        },
        "test-handler": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimeEngine": "",
          "PodAnnotations": null,
          "ContainerAnnotations": null,
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": ""
        }
      },
      "noPivot": false,
      "disableSnapshotAnnotations": false,
      "discardUnpackedLayers": false
    },
    "cni": {
      "binDir": "/opt/cni/bin",
      "confDir": "/etc/cni/net.d",
      "maxConfNum": 1,
      "confTemplate": ""
    },
    "registry": {
      "mirrors": {
        "docker.io": {
          "endpoint": [
            "https://registry-1.docker.io"
          ]
        }
      },
      "configs": null,
      "auths": null,
      "headers": null
    },
    "imageDecryption": {
      "keyModel": ""
    },
    "disableTCPService": true,
    "streamServerAddress": "127.0.0.1",
    "streamServerPort": "0",
    "streamIdleTimeout": "4h0m0s",
    "enableSelinux": false,
    "selinuxCategoryRange": 1024,
    "sandboxImage": "k8s.gcr.io/pause:3.3",
    "statsCollectPeriod": 10,
    "systemdCgroup": false,
    "enableTLSStreaming": false,
    "x509KeyPairStreaming": {
      "tlsCertFile": "",
      "tlsKeyFile": ""
    },
    "maxContainerLogSize": 16384,
    "disableCgroup": false,
    "disableApparmor": false,
    "restrictOOMScoreAdj": false,
    "maxConcurrentDownloads": 3,
    "disableProcMount": false,
    "unsetSeccompProfile": "",
    "tolerateMissingHugetlbController": true,
    "disableHugetlbController": true,
    "ignoreImageDefinedVolumes": false,
    "containerdRootDir": "/var/lib/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
  },
  "golang": "go1.13.15",
  "lastCNILoadStatus": "OK"
}
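
for what it's worth, the relevant bit for this issue does show up above as "systemdCgroup": false under "config", so it can be extracted with something like (assuming jq is available):

crictl info | jq -r '.config.systemdCgroup'
# prints "false" for the kind cluster above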

@detiber

I don't think that kubeadm should be modifying the configuration for the container runtime.

do you mean don't modify the CR configuration on disk, or also don't set the cgroup driver in the KubeletConfiguration that kubeadm generates?

output from running crictl info within a kind cluster:

i was monitoring this effort and hoped it would work at some point.

for dockershim at least, it returns only the list of conditions (for this we can continue using "docker info").
the crio socket gave me the same result.

IMO, ideally the cgroupDriver setting in the kubelet should not exist. once the user feeds it a container runtime socket, the kubelet should probe the socket, obtain the driver and match it.

This issue was discussed during CAPI office hours on the 20th of January, and the outcomes of the discussion are captured in https://github.com/kubernetes/kubeadm/issues/2376.

TL;DR: in order to coordinate the change among all the involved parties and to provide a clean upgrade path for users / immutable upgrades, we should ensure that kubeadm configures the kubelet to use the systemd cgroup driver by default for clusters initialized with Kubernetes version >= v1.21.

i saw no objections to the kubeadm change so i will send a WIP PR for this today to get some feedback.
code freeze for 1.21 is the 9th of March.

i've sent the PR for moving to systemd in kubeadm 1.21:
https://github.com/kubernetes/kubernetes/pull/99471

cc @xphoniex @fcolista @oz123 for feedback on Alpine / OpenRC.

  • would this change be problematic for OpenRC?
  • my assumption is that Alpine container runtimes are defaulted to the "cgroupfs" driver?
  • are there plans for cgroupv2 support under Alpine which would need the "systemd" driver IIUC?

alternatively in the pending PR change we could only apply the "systemd" driver if the systemd init system is used.
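
a rough sketch of what that check could look like (illustrative shell only, not actual kubeadm code):

# pick the kubelet default based on the host init system:
# PID 1 being systemd implies the "systemd" driver, otherwise keep "cgroupfs"
if [ "$(cat /proc/1/comm)" = "systemd" ]; then
  driver=systemd
else
  driver=cgroupfs
fi
echo "defaulting kubelet cgroupDriver to ${driver}"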

/cc

I can't speak for alpine. However, I am fine with it as long as it can be changed via configuration (as is the case).

after some discussion on the PR the latest proposal is the following:

  • in 1.21 "kubeadm init" will start applying the "systemd" driver by default unless the user is explicit in the KubeletConfiguration, but it will not do that for other commands like "kubeadm upgrade".
  • we keep this issue open and in 1.22 all kubeadm commands will default to the "systemd" driver unless the user was explicit about it.

i have plans to write a small guide on how users can migrate to the "systemd" driver and this guide will be linked from https://kubernetes.io/docs/setup/production-environment/container-runtimes/

users that don't wish to migrate can be explicit and set "cgroupfs" as their driver, which means "upgrade" will not touch their setup in this regard.
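
i.e. the opt-out would be a KubeletConfiguration like:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: cgroupfs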
