Kubespray: Use systemd cgroup driver for Docker on systemd systems (not only RedHat)

Created on 28 Aug 2019 · 12 Comments · Source: kubernetes-sigs/kubespray

What would you like to be added:

Currently, only deployments on RedHat ensure that the systemd cgroup driver is configured for the Docker daemon; on other systemd-based systems such as Debian and Ubuntu, the default cgroupfs driver is still used and there's no straightforward way to change it.

I think the cgroup driver for Docker should be easily configurable and, given the references attached below, it might be a good idea to use systemd (for both Docker and the kubelet) when a systemd-based system is detected.
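
As a rough sketch of what this could look like in an inventory (kubelet_cgroup_driver is the variable mentioned later in this thread, and docker_options is Kubespray's existing passthrough for extra dockerd flags; treat the exact names and file paths as assumptions):

    # group_vars/k8s-cluster/k8s-cluster.yml (sketch)
    # Pass the driver to dockerd through Kubespray's extra-options passthrough
    docker_options: "--exec-opt native.cgroupdriver=systemd"
    # Keep the kubelet in agreement with the container runtime
    kubelet_cgroup_driver: systemd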

Why is this needed:

Several discussions, and even the Kubernetes documentation, strongly suggest that systemd-based systems should avoid running a second cgroup manager (i.e. cgroupfs) and stick to the systemd cgroup driver, as mixing the two can lead to various issues down the line.

References:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker

https://github.com/kubernetes/kubeadm/issues/1394#issuecomment-462878219

Labels: help wanted, kind/feature

All 12 comments

any thoughts? :smile:

We are seeing this issue too. We are using an Ubuntu 18.04 image.

FAILED - RETRYING: kubeadm | Initialize first master (1 retries left). Result was: {
    "attempts": 3,
    "changed": true,
    "cmd": [
        "timeout",
        "-k",
        "300s",
        "300s",
        "/usr/local/bin/kubeadm",
        "init",
        "--config=/etc/kubernetes/kubeadm-config.yaml",
        "--ignore-preflight-errors=all",
        "--skip-phases=addon/coredns",
        "--upload-certs"
    ],
    "delta": "0:05:00.008828",
    "end": "2019-09-23 23:57:08.652483",
    "failed_when_result": true,
    "invocation": {
        "module_args": {
            "_raw_params": "timeout -k 300s 300s /usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=all --skip-phases=addon/coredns   --upload-certs  ",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 124,
    "retries": 4,
    "start": "2019-09-23 23:52:08.643655",
    "stderr": "\t[WARNING Port-10251]: Port 10251 is in use\n\t[WARNING Port-10252]: Port 10252 is in use\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists\n\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/\n\t[WARNING Port-10250]: Port 10250 is in use",
    "stderr_lines": [
        "\t[WARNING Port-10251]: Port 10251 is in use",
        "\t[WARNING Port-10252]: Port 10252 is in use",
        "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists",
        "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists",
        "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists",
        "\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/",
        "\t[WARNING Port-10250]: Port 10250 is in use"
    ],
    "stdout": "[init] Using Kubernetes version: v1.15.3\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Activating the kubelet service\n[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"\n[certs] Using existing ca certificate authority\n[certs] Using existing apiserver certificate and key on disk\n[certs] Using existing apiserver-kubelet-client certificate and key on disk\n[certs] Using existing front-proxy-ca certificate authority\n[certs] Using existing front-proxy-client certificate and key on disk\n[certs] External etcd mode: Skipping etcd/ca certificate authority generation\n[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation\n[certs] External etcd mode: Skipping etcd/peer certificate authority generation\n[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation\n[certs] External etcd mode: Skipping etcd/server certificate authority generation\n[certs] Using the existing \"sa\" key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 5m0s\n[kubelet-check] Initial timeout of 40s passed.",
    "stdout_lines": [
        "[init] Using Kubernetes version: v1.15.3",
        "[preflight] Running pre-flight checks",
        "[preflight] Pulling images required for setting up a Kubernetes cluster",
        "[preflight] This might take a minute or two, depending on the speed of your internet connection",
        "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'",
        "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"",
        "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"",
        "[kubelet-start] Activating the kubelet service",
        "[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"",
        "[certs] Using existing ca certificate authority",
        "[certs] Using existing apiserver certificate and key on disk",
        "[certs] Using existing apiserver-kubelet-client certificate and key on disk",
        "[certs] Using existing front-proxy-ca certificate authority",
        "[certs] Using existing front-proxy-client certificate and key on disk",
        "[certs] External etcd mode: Skipping etcd/ca certificate authority generation",
        "[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation",
        "[certs] External etcd mode: Skipping etcd/peer certificate authority generation",
        "[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation",
        "[certs] External etcd mode: Skipping etcd/server certificate authority generation",
        "[certs] Using the existing \"sa\" key",
        "[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"",
        "[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"",
        "[control-plane] Creating static Pod manifest for \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"",
        "[control-plane] Creating static Pod manifest for \"kube-controller-manager\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"",
        "[control-plane] Creating static Pod manifest for \"kube-scheduler\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"",
        "[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 5m0s",
        "[kubelet-check] Initial timeout of 40s passed."
    ]
}
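
For anyone hitting the IsDockerSystemdCheck warning above, the driver the daemon is actually running with can be confirmed before re-running the playbook (a standard docker info template field, no Kubespray involved):

    # Print the cgroup driver the Docker daemon is using
    docker info --format '{{ .CgroupDriver }}'    # prints "cgroupfs" or "systemd"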

Actually, I don't see anything that enforces the systemd cgroup driver on RedHat distros. The Docker RPM is provided from docker.com and still defaults to the cgroupfs driver.
Kubespray officially supports only Linux distros that use systemd as their init system, so it would make sense to default to the systemd driver, with an option to override it.
I'll work on a PR to address that.

Sounds good, I'm adding "help wanted" label on this.

Any updates on this?

What would be the process for switching from cgroupfs to systemd?

  1. Drain node
  2. Remove node from cluster
  3. Update docker options with --exec-opt native.cgroupdriver=systemd
  4. Set kubelet_cgroup_driver to systemd
  5. Rejoin cluster

Is it possible to switch without having the node leave the cluster?
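
For reference, a minimal sketch of what steps 3 and 4 might look like (assuming dockerd picks up /etc/docker/daemon.json and the kubelet_cgroup_driver variable behaves as discussed in this thread; untested):

    # Step 3: configure dockerd to use the systemd cgroup driver
    cat >/etc/docker/daemon.json <<'EOF'
    {
      "exec-opts": ["native.cgroupdriver=systemd"]
    }
    EOF

    # Step 4: render the kubelet with the matching driver via the inventory, e.g.
    #   kubelet_cgroup_driver: systemd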

@servo1x That's an excellent question; we've been wondering about this ourselves... It's not clear why the official documentation suggests the node needs to be re-joined, since a drain and reboot of the node should clear out any residual cgroups.

A small word of caution though: before switching to systemd, make sure your deployment has at least systemd v242, due to a dbus issue (https://github.com/lnykryn/systemd-rhel/issues/266). After some time docker/runc become unresponsive, with thousands of socket connections to Docker hanging because dbus itself hangs; this breaks the kubelet's PLEG, and the whole Pod lifecycle management stops working (Pods fail to terminate, new ones won't get created, etc.).

Restarting dbus solves the "issue", but upgrading to systemd v242 or later also seems to do the trick.

Just FYI :smile:
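
For anyone who wants to check their exposure before switching, something like this on each node should do (restarting dbus is the temporary mitigation described above; use with care on a live node):

    # Show the running systemd version (the dbus fix landed upstream in v242)
    systemctl --version | head -n 1

    # Temporary mitigation if docker/runc are already hanging on dbus
    systemctl restart dbus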

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

> A small word of caution though: before switching to systemd, make sure your deployment has at least systemd v242, due to a dbus issue (lnykryn/systemd-rhel#266). After some time docker/runc become unresponsive, with thousands of socket connections to Docker hanging because dbus itself hangs; this breaks the kubelet's PLEG, and the whole Pod lifecycle management stops working (Pods fail to terminate, new ones won't get created, etc.).
>
> Restarting dbus solves the "issue", but upgrading to systemd v242 or later also seems to do the trick.
>
> Just FYI :smile:

Thank you for this useful information. It's actually even harder to track than just checking that the systemd version is v242 or newer: for example, RHEL backported the fix into their systemd 219 branch, so the fix may be present on RHEL platforms even though their systemd version looks "old".

As for the upgrade path, I don't think a reboot of the node is enough; I think you need to stop and delete all running containers, restart Docker, and then let Kubernetes reschedule them.
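
Put together, the in-place switch on a single node might look roughly like this (a sketch assembled from the steps discussed above, not a verified procedure; <node> is a placeholder):

    # Move workloads off the node first
    kubectl drain <node> --ignore-daemonsets --delete-local-data

    # Remove all containers so nothing keeps running under the old cgroupfs hierarchy
    docker rm -f $(docker ps -aq)

    # Apply the new driver (see the daemon.json sketch earlier in the thread),
    # then restart the runtime stack
    systemctl restart docker kubelet

    # Allow the scheduler to place Pods on the node again
    kubectl uncordon <node>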

@EppO @servo1x @dannyk81
Is anyone working on a PR for this issue?

I applied similar changes to a customised Kubespray deployment for a recent client and would be happy to raise a PR to get the changes added to the next release, as I feel it's good practice to follow the official Kubernetes docs.
