Kubeadm: Add a flag to allow check-expiration command in external etcd setups

Created on 22 Oct 2019 · 17 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

If this is a FEATURE REQUEST, please:

Describe in detail the feature/behavior/change you'd like to see.

The kubeadm alpha certs check-expiration command fails when running a k8s cluster with an external etcd cluster, since some of the expected certs don't live on the control plane node. It would be helpful if one of the following two things happened:

1) Add a flag to allow for skipping over files that aren't found instead of failing

2) Have kubeadm check the cluster to autodiscover if an external etcd cluster is being used and automatically skip files that aren't expected to live on a control plane node in that setup

Both might actually be helpful, though. We regularly see external etcd clusters that have been set up using kubeadm for cert/manifest generation, with the kubelet in standalone mode running etcd in containers. The ability to run the check-expiration command with a "skip-not-found" flag would be really helpful.

kind/bug kind/feature priority/backlog


krisdock

/assign @fabriziopandini
cc @dlipovetsky

EDIT: actually, I think I misread this. Sorry.

The kubeadm alpha certs check-expiration command fails when running a k8s cluster with an external etcd cluster since some of the expected certs don't live on the control plane node

It seems absolutely reasonable for the command not to fail; this is a bug.
Probably this means that we need to check the ClusterConfiguration, either from the ConfigMap or from --config.

In terms of extending the command to check expiration of external etcd certs:
I think I'm not in favor of adding this, because the external etcd is/can be managed completely outside of the knowledge of kubeadm.

neolit123 on 22 Oct 2019

because the external etcd is/can be managed completely outside of the knowledge of kubeadm.

As a customer who experienced this issue, I can tell you that we are using kubeadm to set up/manage external etcd, and we find it extremely disappointing that kubeadm alpha certs check-expiration fails miserably on both the etcd nodes and the master nodes.

alex-vmw on 22 Oct 2019

We seem to have a couple of items here:

a bug: the command fails for external etcd; it should just skip the check for etcd certs in that case.
a feature request: enable the command to check external etcd certs. This is not trivial, as the external etcd certs can be managed by third-party tooling.

/kind bug
/kind feature

neolit123 on 22 Oct 2019

I'm +1 to get the bug fixed ASAP (check the ConfigMap, skip in case of external etcd).
WRT supporting external etcd, my proposal is to try to reach a go/no-go agreement tomorrow during office hours.

fabriziopandini on 22 Oct 2019

On second thought, reading from a ConfigMap can lead to a corner case where you need to access the cluster to renew certificates, but your certificates have actually expired.
So, probably some more design is necessary here.

fabriziopandini on 23 Oct 2019

In office hours we decided not to support certificate checks on machines that are not Kubernetes nodes, because there is no convention we can leverage for discovering certificate locations.

fabriziopandini on 23 Oct 2019

@fabriziopandini I do not understand why external etcd nodes created by kubeadm shouldn't get support for the command. If external etcd nodes (not Kubernetes nodes) are deployed via kubeadm, why wouldn't kubeadm be able to discover the certificate locations? As a customer, I should be able to run kubeadm alpha certs check-expiration and, if the certificates are named correctly and are in the standard place where kubeadm creates them, get a complete certificate report. If the certificates are not named correctly or do not exist in the standard location, then the command can error out.

alex-vmw on 23 Oct 2019

@fabriziopandini

So, probably some more design is necessary here.

@alex-vmw raises an interesting point. If the check-expiration command fails to find some certificates, my guess is that it currently errors out.

Should it instead show the full table, but with MISSING next to some certificates, and only show expiration for those that are present on the machine?
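A minimal sketch of the MISSING idea, written in Python rather than kubeadm's Go. The expected file list, `build_report`, and the date format are hypothetical, and reading the real expiry is delegated to an injected `read_expiry` callable. An absent file produces a MISSING row instead of aborting the whole report.

```python
import os
from datetime import datetime, timezone
from typing import Callable

# Hypothetical list of certificate files the command expects, relative to
# the PKI directory; kubeadm keeps its own list internally.
EXPECTED_CERTS = [
    "apiserver.crt",
    "apiserver-kubelet-client.crt",
    "front-proxy-client.crt",
    "etcd/healthcheck-client.crt",
    "etcd/peer.crt",
    "etcd/server.crt",
]


def build_report(
    pki_dir: str,
    read_expiry: Callable[[str], datetime],
    exists: Callable[[str], bool] = os.path.exists,
) -> list[tuple[str, str]]:
    """Return (name, expiry) rows, using MISSING instead of failing on absent files."""
    rows = []
    for rel in EXPECTED_CERTS:
        path = os.path.join(pki_dir, rel)
        # "etcd/peer.crt" -> "etcd-peer", mimicking the table's naming style.
        name = rel.removesuffix(".crt").replace("/", "-")
        if not exists(path):
            rows.append((name, "MISSING"))
            continue
        expires = read_expiry(path).astimezone(timezone.utc)
        rows.append((name, expires.strftime("%b %d, %Y %H:%M UTC")))
    return rows
```

On a control-plane node with external etcd, the three etcd rows would simply read MISSING while the others still show their dates.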

neolit123 on 23 Oct 2019

@neolit123
On an external etcd node (deployed via kubeadm) we get this error:
Error checking external CA condition for ca certificate authority: failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory

On a master node (deployed via kubeadm) we get this error:
failed to load existing certificate etcd/healthcheck-client: open /etc/kubernetes/pki/etcd/healthcheck-client.crt: no such file or directory

Except for the Error(s) above, absolutely no useful certificate expiration information is currently displayed.

alex-vmw on 24 Oct 2019

On an external etcd node (deployed via kubeadm) we get this error:
Error checking external CA condition for ca certificate authority: failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory

This is problematic. Currently the command in question needs to know whether a certificate was created using an external CA. The same logic is used for certificate renewal.

If the /etc/kubernetes/pki/ca.crt file is absent, there are a couple of options:

copy the file to the machines that need it. Having ca.crt on etcd machines is fine from a security perspective.
kubeadm can be extended to report UNKNOWN for "external CA" in the table when only checking expiration, instead of erroring out, but still error out when doing renewal, because renewal needs to know whether an external CA is used.

This special-casing is not ideal.
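A minimal sketch of the second option, assuming (not confirmed in this thread) that a CA is treated as external when ca.crt is present but ca.key is absent. `external_ca_status`, `ExternalCAUnknownError`, and the "check"/"renew" mode strings are hypothetical names. It shows exactly the special-casing in question: a missing ca.crt yields UNKNOWN when only checking, but still raises when renewing.

```python
import os
from typing import Callable


class ExternalCAUnknownError(Exception):
    """Renewal cannot proceed without knowing whether the CA is external."""


def external_ca_status(
    pki_dir: str,
    mode: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Return "yes", "no" or "UNKNOWN" for the CA's externally managed status.

    Assumption: the CA is treated as external when ca.crt is present and
    ca.key is absent. mode is "check" or "renew" (hypothetical values).
    """
    has_crt = exists(os.path.join(pki_dir, "ca.crt"))
    has_key = exists(os.path.join(pki_dir, "ca.key"))
    if not has_crt:
        if mode == "renew":
            # Renewal must know who owns the CA, so it still fails here.
            raise ExternalCAUnknownError(f"{os.path.join(pki_dir, 'ca.crt')} not found")
        # Checking expiration can degrade gracefully instead of erroring out.
        return "UNKNOWN"
    return "no" if has_key else "yes"
```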

On a master node (deployed via kubeadm) we get this error:
failed to load existing certificate etcd/healthcheck-client: open /etc/kubernetes/pki/etcd/healthcheck-client.crt: no such file or directory

This is something we can fix with the MISSING proposal:
https://github.com/kubernetes/kubeadm/issues/1850#issuecomment-545652497

Except for the Error(s) above, absolutely no useful certificate expiration information is currently displayed.

While there is potential to fix some of the mentioned issues, I think the workaround deserves an honorable mention:

openssl x509 -noout -dates -in <filename>
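To apply the openssl workaround to a whole PKI directory, a small wrapper can run the same command on every *.crt it finds and turn the notAfter line into remaining days. This is only a sketch: it assumes `openssl` is on the PATH and that the certificates live under /etc/kubernetes/pki. Files that don't exist are never visited, so nothing fails for being absent. `ssl.cert_time_to_seconds` is a standard-library function that parses the "Nov 10 12:04:49 2020 GMT" format openssl prints.

```python
import glob
import os
import ssl
import subprocess
import time


def parse_dates_output(output: str) -> dict[str, int]:
    """Turn `openssl x509 -noout -dates` output into epoch seconds per key."""
    dates = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            dates[key.strip()] = ssl.cert_time_to_seconds(value.strip())
    return dates


def residual_days(not_after: int, now: float) -> int:
    """Whole days left before expiry (negative once expired)."""
    return int((not_after - now) // 86400)


def report(pki_dir: str = "/etc/kubernetes/pki") -> None:
    """Print remaining days for every *.crt found under pki_dir, recursively."""
    pattern = os.path.join(pki_dir, "**", "*.crt")
    for path in sorted(glob.glob(pattern, recursive=True)):
        out = subprocess.run(
            ["openssl", "x509", "-noout", "-dates", "-in", path],
            capture_output=True, text=True, check=True,
        ).stdout
        not_after = parse_dates_output(out)["notAfter"]
        print(f"{path}: {residual_days(not_after, time.time())}d left")
```

Calling `report()` on a node prints one line per certificate, whether that node is a master or an external etcd node.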

neolit123 on 24 Oct 2019

if the /etc/kubernetes/pki/ca.crt file is absent there are a couple of options:

copy the file to a machine that needs it. having the ca.crt on etcd machines is fine from a security perspective.

On external etcd nodes kubeadm creates the CA cert in etcd subdir (/etc/kubernetes/pki/etcd/ca.crt), so looking for /etc/kubernetes/pki/ca.crt on external etcd nodes is useless. Kubeadm just needs to realize it is running on an external etcd node and look for the file in the correct path.

Here is what the kubeadm-created PKI directory structure looks like on an external etcd node:

# ls -l /etc/kubernetes/pki/
total 12
-rw-r--r-- 1 root root 1090 Oct 16 19:20 apiserver-etcd-client.crt
-rw-r--r-- 1 root root 1675 Oct 16 19:20 apiserver-etcd-client.key
drwxr-xr-x 2 root root 4096 Oct 16 19:20 etcd

# ls -l /etc/kubernetes/pki/etcd/
total 32
-rw-r--r-- 1 root root 1017 Oct 16 19:20 ca.crt
-rw-r--r-- 1 root root 1679 Oct 16 19:20 ca.key
-rw-r--r-- 1 root root 1094 Oct 16 19:20 healthcheck-client.crt
-rw-r--r-- 1 root root 1675 Oct 16 19:20 healthcheck-client.key
-rw-r--r-- 1 root root 1155 Oct 16 19:20 peer.crt
-rw------- 1 root root 1675 Oct 16 19:20 peer.key
-rw-r--r-- 1 root root 1155 Oct 16 19:20 server.crt
-rw------- 1 root root 1675 Oct 16 19:20 server.key
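The layout above suggests a simple heuristic, sketched here under the assumption that file presence alone is a reliable signal: if the root ca.crt exists, the node is treated as a control-plane node; if only etcd/ca.crt exists, it is treated as an external etcd node. `detect_node_role` and `ca_for_role` are hypothetical helpers, not kubeadm functions, and the heuristic cannot tell stacked from non-stacked control-plane nodes.

```python
import os
from typing import Callable, Optional


def detect_node_role(
    pki_dir: str, exists: Callable[[str], bool] = os.path.exists
) -> str:
    """Guess the node role from which CA files are present (heuristic only)."""

    def has(rel: str) -> bool:
        return exists(os.path.join(pki_dir, rel))

    if has("ca.crt"):
        # Stacked and non-stacked control-plane nodes both have the root CA.
        return "control-plane"
    if has(os.path.join("etcd", "ca.crt")):
        # Layout listed above: only the etcd CA, no cluster root CA.
        return "external-etcd"
    return "unknown"


def ca_for_role(pki_dir: str, role: str) -> Optional[str]:
    """Path of the CA certificate a check would load for the given role."""
    if role == "control-plane":
        return os.path.join(pki_dir, "ca.crt")
    if role == "external-etcd":
        return os.path.join(pki_dir, "etcd", "ca.crt")
    return None
```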

alex-vmw on 24 Oct 2019

On external etcd nodes kubeadm creates the CA cert in etcd subdir (/etc/kubernetes/pki/etcd/ca.crt), so looking for /etc/kubernetes/pki/ca.crt on external etcd nodes is useless

kubeadm uses the root CA to determine external CA usage, not the etcd CA.

neolit123 on 24 Oct 2019

@neolit123

kubeadm uses the root CA to determine external CA usage, not the etcd CA.

kubeadm is just looking for the /etc/kubernetes/pki/ca.crt file and has no idea which CA cert is in it (external or self-signed). On an external etcd node created by kubeadm, the /etc/kubernetes/pki/ca.crt file would/should never exist. On a master node, /etc/kubernetes/pki/ca.crt would/should exist and can be either an external root CA or just a self-signed CA created by kubeadm itself.

If kubeadm alpha certs check-expiration is not able to recognize what kind of node it is being executed on (external etcd, non-stacked master, stacked master), it should just report the expiration of the certificates it can find and not report on any files it believes are missing (because it doesn't know whether they are really missing or correctly absent on that type of node).

alex-vmw on 24 Oct 2019

@neolit123 Can we make progress on this ticket in the 1.17 timeframe? Thanks.

alex-vmw on 8 Nov 2019

waiting on feedback from @fabriziopandini

neolit123 on 8 Nov 2019

check expiration on non-stacked master nodes.

This already works if you pass the kubeadm config file, e.g.:

$ kubeadm alpha certs check-expiration --config /kind/kubeadm.conf
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Nov 10, 2020 12:05 UTC   364d            no
apiserver                  Nov 10, 2020 12:04 UTC   364d            no
apiserver-kubelet-client   Nov 10, 2020 12:04 UTC   364d            no
controller-manager.conf    Nov 10, 2020 12:05 UTC   364d            no
front-proxy-client         Nov 10, 2020 12:04 UTC   364d            no
scheduler.conf             Nov 10, 2020 12:05 UTC   364d            no
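The table above contains no etcd rows and no apiserver-etcd-client row. A sketch of how such filtering could be driven by the `etcd.external` field of an already-parsed ClusterConfiguration follows. The row list mirrors the CERTIFICATE column; `rows_to_check` and the exact set of dropped rows are assumptions for illustration, and YAML parsing is left out.

```python
# Row names modeled on the CERTIFICATE column of check-expiration.
ALL_ROWS = [
    "admin.conf", "apiserver", "apiserver-etcd-client",
    "apiserver-kubelet-client", "controller-manager.conf",
    "etcd-healthcheck-client", "etcd-peer", "etcd-server",
    "front-proxy-client", "scheduler.conf",
]
# Assumed: rows tied to etcd that kubeadm does not own when etcd is external.
ETCD_ROWS = {
    "apiserver-etcd-client", "etcd-healthcheck-client", "etcd-peer", "etcd-server",
}


def rows_to_check(cluster_config: dict) -> list[str]:
    """Drop etcd-related rows when the parsed config declares etcd.external."""
    etcd = cluster_config.get("etcd") or {}
    external = "external" in etcd
    return [row for row in ALL_ROWS if not (external and row in ETCD_ROWS)]
```

With an `etcd.external` section present, the result is exactly the six rows shown in the table.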

However, certs renew does not work with external etcd :-(

I'm sending a PR to fix this and to make kubeadm read the config from the cluster, so it will no longer be necessary to pass the config file (with a fallback to the current behavior if the cluster is not available or the read fails).
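The fallback can be sketched as a simple ordering, which also covers the earlier corner case where expired certificates make the cluster unreachable: try the cluster first, then the --config file, then local defaults. The loader callables and the default config shape (local etcd) are hypothetical; in kubeadm the cluster read would go through the kubeadm-config ConfigMap in kube-system.

```python
import logging
from typing import Callable, Optional


def load_cluster_config(
    from_cluster: Callable[[], dict],
    from_file: Optional[Callable[[], dict]] = None,
) -> tuple[dict, str]:
    """Return (config, source), trying the cluster, then --config, then defaults."""
    try:
        return from_cluster(), "cluster"
    except Exception as err:  # e.g. API server down or client certs expired
        logging.warning("could not read config from the cluster: %s", err)
    if from_file is not None:
        return from_file(), "file"
    # Assumed default when nothing else is available: local (stacked) etcd.
    return {"etcd": {"local": {}}}, "defaults"
```

Returning the source alongside the config lets the caller say where its decision about etcd came from.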

fabriziopandini on 11 Nov 2019

is not able to recognize what kind of node it is being executed on (external etcd, non-stacked master, stacked-master) it should just report expiration of the certificates that it can find ...

I consider etcd cert renewal a command that should work on a best-effort basis (do whatever is possible), so I don't see problems in sending such a patch.
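A best-effort renewal loop could look like the following sketch: every certificate is attempted, a missing file is recorded as skipped, and any other failure is recorded rather than aborting the run. `renew_best_effort` and the injected `renew_one` callable are hypothetical stand-ins for kubeadm's per-certificate renewal.

```python
from typing import Callable, Iterable


def renew_best_effort(
    names: Iterable[str], renew_one: Callable[[str], None]
) -> dict[str, str]:
    """Try every certificate; record the outcome instead of stopping early."""
    results = {}
    for name in names:
        try:
            renew_one(name)
            results[name] = "renewed"
        except FileNotFoundError:
            # Expected on nodes that do not hold this certificate.
            results[name] = "skipped (not found)"
        except Exception as err:
            results[name] = f"failed: {err}"
    return results
```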

However, based on my knowledge of the ongoing discussions about etcd management at the SCL (SIG Cluster Lifecycle) level, we should make it clear here (and possibly also on https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/) that the target tool for managing external etcd is etcdadm, not kubeadm.

As a consequence, any expectation of kubeadm managing external etcd nodes is not backed by any action item on the kubeadm roadmap; you can consider this patch a contribution to ease the pain while etcdadm gets ready, but nothing more than that.

fabriziopandini on 11 Nov 2019