Kubeadm: Improve the UX on packages

Created on 3 Oct 2018 · 18 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

Versions

All known versions of kubernetes

Installing the kubelet through packages is a bad experience: we explicitly tell users that the kubelet will be in a failing state after it is installed.

After a conversation with @craigtracey, I propose that we change the packaging around kubelet and kubeadm. Installing the kubelet should result in a functioning standalone kubelet. Installing the kubeadm package should then set the kubelet up for installing kubernetes.

I think the kubelet package should provide a 00-kubelet.service file that defines the standalone kubelet, and kubeadm should then define a 10-kubelet.service file that points to the PKI and related files, i.e. the part that currently breaks the kubelet.
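A minimal sketch of what that layering could look like as systemd drop-ins; the file names, paths, and flag values here are illustrative assumptions, not what the packages ship today:

# /etc/systemd/system/kubelet.service.d/00-standalone.conf (hypothetical)
# Shipped by the kubelet package: a kubelet that is useful on its own.
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --pod-manifest-path=/etc/kubernetes/manifests

# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (hypothetical)
# Shipped by the kubeadm package: sorts lexically later, so it overrides
# the standalone invocation with flags that only resolve after kubeadm runs.
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml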

kind/bug kind/design kind/refactor priority/awaiting-more-evidence priority/important-longterm

All 18 comments

I don't disagree with anything stated; I think it's just a matter of priority and of getting a canonical release process + tests in place to verify behavior.

/cc @dims

This behavior is currently breaking a number of downstream config management tools. I haven't determined exactly how yet, but will update this issue when I have more details.

cc @randomvariable

This is listed as a feature request, but I think this should be considered a bug. The problem here is that the 10-kubeadm.conf drop-in unit file (which comes from installing the kubeadm package) assumes the kubelet will eventually use the /var/lib/kubelet/config.yaml config file. If that file does not exist (and it doesn't until kubeadm has run), the kubelet will refuse to start.

Marking this as a defect would make it eligible for a backported fix. The problem exists from 1.11 forward.

Looks like the line that introduces the problem is here:
https://github.com/kubernetes/kubernetes/blob/master/build/debs/postinst#L13

The way things work currently, we arguably do not want postinst to restart the kubelet; the daemon reload is fine, though. If postinst restarts the kubelet before the config is laid down, it leads to the bad UX described by @craigtracey.
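A hedged sketch of what the postinst could do instead, assuming the goal is simply to make systemd aware of the new drop-in without bouncing a running kubelet:

# postinst sketch (illustrative, not the shipped script)
# Re-read unit files so the new drop-in is known to systemd,
# but do NOT restart the kubelet: a running standalone kubelet
# keeps its static pods, and an unconfigured kubelet stays down.
systemctl daemon-reload
# intentionally omitted: systemctl restart kubelet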

/assign @timothysc @chuckha

@chuckha - priority invert. This is p0.

The issue here centers around the drop-in unit files that kubeadm introduces upon install. The kubeadm package will add the 10-kubeadm.conf drop-in which includes references to files that do not exist on disk:

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"

Specifically, these are config.yaml and bootstrap-kubelet.conf.

The installation of kubeadm not only drops this unit file into place, but then restarts the kubelet process. With these files missing, the kubelet will immediately enter the down/inactive state, as it has no "ignore missing files" logic. This is problematic for a few different reasons:

  1. Any static manifests that had been managed by a standalone kubelet would be destroyed simply by installing the kubeadm package.
  2. This prevents a seamless upgrade of any cluster from 1.10 to something newer, as these two files were only introduced in 1.11 and forward. The act of apt install kubeadm / yum install kubeadm will kill all static manifest pods, and in the upgrade case these will be the pods of the active control plane itself.
  3. One of kubeadm's preflight checks inspects the status of the kubelet process, so kubeadm would then need to be run with some form of --ignore-preflight-errors.
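For context, the drop-in shipped by the 1.11-era kubeadm package looks roughly like the following (reproduced from memory as a sketch; check the installed package for the exact contents). Note that the two Environment= lines above are unconditional, while the env files are already optional:

# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (approximate)
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# the leading "-" already makes these two files optional
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS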

I believe that a much cleaner approach here is to move all of the Environment settings into an _optional_ systemd Environment file:

EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env

This would allow the kubelet to continue running whether or not that file is present on disk. It also isolates all changes to the kubelet's operational state in a single place, and vastly simplifies the unit files that are currently being used.
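A minimal sketch of that simplified drop-in; packing all of the flags into a single KUBELET_KUBEADM_ARGS variable is an assumption about how the env file would be laid out:

# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (proposed sketch)
[Service]
# The "-" prefix makes the file optional: the kubelet starts fine before
# kubeadm has written it, and picks up the kubeadm flags once it exists.
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBEADM_ARGS

# /var/lib/kubelet/kubeadm-flags.env, written only at kubeadm init/join (sketch)
KUBELET_KUBEADM_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml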

@craigtracey thanks for digging in. We are aware of the upgrade issue and have it documented. We also documented that we're working on a fix, but I'm not sure that was true. I think you propose a good solution, but I'll need to dig in further.

Earlier, you mentioned the kubeadm packages are breaking several downstream tools. Could you elaborate on that a bit? I'm curious about exactly which tools and in which scenarios you are seeing breaking behavior.

Both kubespray in kubeadm mode and wardroom will fail because of this bug. I do not have current reproductions, but one is easily obtained by running either of these tools and requesting a version >= 1.11.

@craigtracey I'm unable to reproduce this failure. I used a fork of kubespray (https://github.com/kubernetes-incubator/kubespray/pull/3486) to fix a recent issue that cropped up, but after that, 1.12.1 came up with no problems.

Do you have any steps to reproduce the cluster create issue?

@rosskukulinski - were you able to repro?

@rosskukulinski @craigtracey Any updates on this? We are looking to finalize the last few 1.13 tickets as we are now in slush, and will either punt this to 1.14 if it ends up being a feature fix, or keep it open for 1.13 if it's a reproducible bug.

Time has expired and freeze is tomorrow; if we get a repro we can try to backport where necessary. Punting to 1.14.

Going to close this for now, but please reopen with detailed repro instructions once folks can verify.

tldr: Would it make sense for kubeadm to create the 10-kubeadm.conf file at kubeadm init/join time, instead of shipping the drop-in file in the kubeadm package?


It is somewhat unclear to me what was impossible to reproduce. With that said, I think the original issue that was brought up still stands:

Installing the kubelet through packages is a bad experience: we explicitly tell users that the kubelet will be in a failing state after it is installed.

I think the issue actually lies with the kubeadm package, and not necessarily with the kubelet package. If one installs the kubelet package, it is possible to get it up and running after tweaking some config (e.g. cgroup driver and setting the pod manifest path). Once that config is set, the kubelet will happily start and manage static pods.
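As an illustration, a standalone kubelet of that era can be brought up with a small drop-in like the one below; the flag names are real kubelet flags, but the path and values are assumptions about a particular host:

# /etc/systemd/system/kubelet.service.d/05-standalone.conf (illustrative)
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --cgroup-driver=systemd --pod-manifest-path=/etc/kubernetes/manifests

After a systemctl daemon-reload && systemctl restart kubelet, any manifest dropped into /etc/kubernetes/manifests runs as a static pod.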

Once one installs kubeadm, however, the kubelet enters a crash loop and will no longer start static pods that exist in the static pod manifest path. Ideally, installing a package should not break a running process (kubelet in this case).
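A rough reproduction of that breakage, assuming a Debian-based host with the Kubernetes apt repository already configured:

# 1. Install and configure a standalone kubelet; verify a static pod runs.
apt-get install -y kubelet
# (apply a standalone drop-in such as the one sketched above)
systemctl daemon-reload && systemctl restart kubelet
cp my-static-pod.yaml /etc/kubernetes/manifests/   # any static pod manifest

# 2. Install kubeadm. Its postinst restarts the kubelet against the new
#    10-kubeadm.conf drop-in, which references files that do not exist yet.
apt-get install -y kubeadm
systemctl is-active kubelet   # now reports a failed/inactive kubelet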

Proposal
Would it make sense for kubeadm to create (or rename, or move into place, etc) the 10-kubeadm.conf systemd drop-in file at kubeadm init/join time, instead of shipping the drop-in as part of the kubeadm package? I believe the result would be that the installation of the kubeadm package would not break a functioning kubelet.
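One hedged way to implement that idea: ship the drop-in under a name systemd ignores, and have kubeadm move it into place at init/join time (the .disabled suffix and paths are purely illustrative):

# Shipped by the package, inert because systemd only loads *.conf drop-ins:
#   /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.disabled
# At kubeadm init/join time, roughly:
mv /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.disabled \
   /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
systemctl restart kubelet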

Our specific issue

In our case, we have run into an issue while trying to automate the bootstrapping of a cluster with kubeadm.

The use case
We want to create an etcd cluster using kubeadm on a set of nodes, and then bootstrap a kubernetes control plane on the same set of nodes. You could say we are trying to create a stacked master using kubeadm's phases to build the etcd cluster.

The reason for this is that we want our tool (heptiolabs/wardroom) to support both external etcd and stacked master scenarios (ideally through a single code path).

The problem
When we install kubelet and kubeadm, the kubelets enter a crash loop and thus will not start the etcd static pods we produce with the kubeadm phases. We could press on and attempt to initialize the primary control plane node, which would repair the broken kubelet and start the etcd static pod on the first control plane node. However, this fails because the other etcd members are not up and running.

On the other hand, we can get the etcd cluster up if we create a systemd drop-in that has higher precedence than the kubeadm drop-in (as documented here). The problem with this approach is that kubeadm is then unable to configure the kubelet at kubeadm init time.
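That higher-precedence drop-in looks roughly like this; the flag values are assumptions for a kubelet that only manages local etcd static pods:

# /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf (sketch)
# Sorts lexically after 10-kubeadm.conf, so its ExecStart override wins.
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests
Restart=always

The same precedence that makes this work is what later prevents kubeadm init from reconfiguring the kubelet.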

cc @craigtracey @rosskukulinski

xref: https://github.com/heptiolabs/wardroom/pull/182
