FEATURE REQUEST
The kubeadm alpha certs check-expiration command fails when running a k8s cluster with an external etcd cluster since some of the expected certs don't live on the control plane node. It would be helpful to have one of the two scenarios happen:
1) Add a flag to allow for skipping over files that aren't found instead of failing
2) Have kubeadm check the cluster to autodiscover if an external etcd cluster is being used and automatically skip files that aren't expected to live on a control plane node in that setup
Both might actually be helpful, though. We regularly see external etcd clusters that have been set up using kubeadm for cert/manifest generation, using the kubelet in standalone mode to run etcd in containers. The ability to run the check-expiration command with a "skip-not-found" flag would be really helpful.
/assign @fabriziopandini
cc @dlipovetsky
EDIT: actually, i think i misread this. sorry,
The kubeadm alpha certs check-expiration command fails when running a k8s cluster with an external etcd cluster since some of the expected certs don't live on the control plane node
it seems absolutely reasonable to not fail and this is a bug.
probably this means that we either need to check the ClusterConfiguration from CM or --config.
in terms of extending the command to check expiration of external etcd certs:
i think i'm not in favor of adding this, because the external etcd is/can be managed completely outside of the knowledge of kubeadm.
because the external etcd is/can be managed completely outside of the knowledge of kubeadm.
As a customer who experienced this issue I can tell you that we are using kubeadm to set up/manage external etcd and we find it extremely disappointing that kubeadm alpha certs check-expiration fails miserably on the etcd nodes and on the master nodes.
we seem to have a couple items here:
/kind bug
/kind feature
I'm +1 to get the bug fixed asap (check config map, skip in case of external etcd)
WRT supporting external etcd, my proposal is to try to get an agreement on go/no go tomorrow during office hours
On second thought, reading from a config map can lead to a corner case where you need the cluster in order to renew certificates, but your certificates have actually expired
So some more design is probably necessary here
In the office hours we decided not to support certificate checks on machines that are not Kubernetes nodes, because there is no convention we can leverage for discovering certificate locations
@fabriziopandini I do not understand why external etcd nodes created by kubeadm shouldn't get support for the command. If external etcd nodes (not kubernetes nodes) are deployed via kubeadm, why wouldn't kubeadm be able to discover the certificate locations? As a customer, I should be able to run kubeadm alpha certs check-expiration, and if the certificates are named correctly and are in the standard place where kubeadm creates them, then I should get a complete certificates report. If the certificates are not named correctly or do not exist in the standard location, then the command can error out.
@fabriziopandini
So some more design is probably necessary here
@alex-vmw raises an interesting point. if the check-expiration command fails to find some certificates, my guess is that currently it errors out.
should it instead show the full table, but have MISSING next to some certificates and only show expiration for those that are present on the machine?
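A minimal sketch of the MISSING idea, in Python rather than kubeadm's Go, purely for illustration. The function name `check_certs`, the `read_expiry` callback, and the example file list are hypothetical and are not kubeadm's actual code or certificate list; the callback stands in for whatever parses the expiry date (for example a wrapper around openssl).

```python
import os

def check_certs(pki_dir, expected, read_expiry):
    """Build one report row per expected certificate.

    pki_dir     -- directory to look in, e.g. /etc/kubernetes/pki
    expected    -- relative paths of the certificates the report should list
    read_expiry -- hypothetical callback that returns the expiry of an existing file
    """
    rows = []
    for rel_path in expected:
        path = os.path.join(pki_dir, rel_path)
        if not os.path.isfile(path):
            # Absent file: record it and keep going instead of aborting the report.
            rows.append((rel_path, "MISSING"))
            continue
        rows.append((rel_path, read_expiry(path)))
    return rows
```

With this shape, a non-stacked master would still get expiration rows for the certificates it has, and etcd/healthcheck-client.crt would simply show up as MISSING.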
@neolit123
On an external etcd node (deployed via kubeadm) we get this error:
Error checking external CA condition for ca certificate authority: failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
On a master node (deployed via kubeadm) we get this error:
failed to load existing certificate etcd/healthcheck-client: open /etc/kubernetes/pki/etcd/healthcheck-client.crt: no such file or directory
Except for the Error(s) above, absolutely no useful certificate expiration information is currently displayed.
On an external etcd node (deployed via kubeadm) we get this error:
Error checking external CA condition for ca certificate authority: failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
this is problematic. currently the command in question needs to know if a certificate is created using an external CA. the same logic is used for certificate renewal.
if the /etc/kubernetes/pki/ca.crt file is absent there are a couple of options:
this special casing is not ideal.
On a master node (deployed via kubeadm) we get this error:
failed to load existing certificate etcd/healthcheck-client: open /etc/kubernetes/pki/etcd/healthcheck-client.crt: no such file or directory
this is something that we can fix, with the MISSING proposal:
https://github.com/kubernetes/kubeadm/issues/1850#issuecomment-545652497
Except for the Error(s) above, absolutely no useful certificate expiration information is currently displayed.
while there is potential to fix some of the mentioned issues, i think the workaround needs an honorable mention:
openssl x509 -noout -dates -in <filename>
if the /etc/kubernetes/pki/ca.crt file is absent there are a couple of options:
- copy the file to a machine that needs it. having the ca.crt on etcd machines is fine from a security perspective.
On external etcd nodes kubeadm creates the CA cert in etcd subdir (/etc/kubernetes/pki/etcd/ca.crt), so looking for /etc/kubernetes/pki/ca.crt on external etcd nodes is useless. Kubeadm just needs to realize it is running on an external etcd node and look for the file in the correct path.
Here is what the kubeadm-created pki dir structure looks like on an external etcd node:
# ls -l /etc/kubernetes/pki/
total 12
-rw-r--r-- 1 root root 1090 Oct 16 19:20 apiserver-etcd-client.crt
-rw-r--r-- 1 root root 1675 Oct 16 19:20 apiserver-etcd-client.key
drwxr-xr-x 2 root root 4096 Oct 16 19:20 etcd
# ls -l /etc/kubernetes/pki/etcd/
total 32
-rw-r--r-- 1 root root 1017 Oct 16 19:20 ca.crt
-rw-r--r-- 1 root root 1679 Oct 16 19:20 ca.key
-rw-r--r-- 1 root root 1094 Oct 16 19:20 healthcheck-client.crt
-rw-r--r-- 1 root root 1675 Oct 16 19:20 healthcheck-client.key
-rw-r--r-- 1 root root 1155 Oct 16 19:20 peer.crt
-rw------- 1 root root 1675 Oct 16 19:20 peer.key
-rw-r--r-- 1 root root 1155 Oct 16 19:20 server.crt
-rw------- 1 root root 1675 Oct 16 19:20 server.key
On external etcd nodes kubeadm creates the CA cert in etcd subdir (/etc/kubernetes/pki/etcd/ca.crt), so looking for /etc/kubernetes/pki/ca.crt on external etcd nodes is useless
kubeadm uses the root CA to determine external CA usage, not the etcd CA.
@neolit123
kubeadm uses the root CA to determine external CA usage, not the etcd CA.
kubeadm is just looking for /etc/kubernetes/pki/ca.crt file and has no idea what CA cert is in it (external or self-signed). On external etcd node created by kubeadm the /etc/kubernetes/pki/ca.crt file would/should never exist. On a master node /etc/kubernetes/pki/ca.crt would/should exist and can be an external root CA or it can just be self-signed CA created by kubeadm itself.
If kubeadm alpha certs check-expiration is not able to recognize what kind of node it is being executed on (external etcd, non-stacked master, stacked-master), it should just report the expiration of the certificates that it can find and not report on any files it believes are missing (because it doesn't know if they are really missing or correctly do not exist on that type of node).
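For illustration only, here is one way the node type could be guessed from the files described in this thread. This is a heuristic sketch in Python, not something kubeadm does: the function name `guess_node_type` and the choice of ca.crt and etcd/server.crt as markers are assumptions based on the directory listing and error messages above.

```python
import os

def guess_node_type(pki_dir):
    """Guess the node type from which files exist under pki_dir (heuristic, assumed markers)."""
    # Root CA: reported above as present on masters and absent on external etcd nodes.
    has_root_ca = os.path.isfile(os.path.join(pki_dir, "ca.crt"))
    # etcd serving cert: assumed to exist only where etcd itself runs.
    has_etcd_server = os.path.isfile(os.path.join(pki_dir, "etcd", "server.crt"))

    if has_root_ca and has_etcd_server:
        return "stacked-master"
    if has_root_ca:
        return "non-stacked-master"
    if has_etcd_server:
        return "external-etcd"
    # Nothing recognizable: the caller can fall back to reporting only what it finds.
    return "unknown"
```

When the result is "unknown", the behavior requested above still applies: report the certificates that are present and stay silent about the rest.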
@neolit123 Can we make progress on this ticket in the 1.17 timeframe? Thanks.
waiting on feedback from @fabriziopandini
check expiration on non-stacked master nodes.
This works already if passing the kubeadm config file e.g.
$ kubeadm init phase kubeconfig all --config /kind/kubeadm.conf
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Nov 10, 2020 12:05 UTC   364d            no
apiserver                  Nov 10, 2020 12:04 UTC   364d            no
apiserver-kubelet-client   Nov 10, 2020 12:04 UTC   364d            no
controller-manager.conf    Nov 10, 2020 12:05 UTC   364d            no
front-proxy-client         Nov 10, 2020 12:04 UTC   364d            no
scheduler.conf             Nov 10, 2020 12:05 UTC   364d            no
However, certs renew does not work with external etcd :-(
I'm sending a PR for this that makes kubeadm read the config from the cluster, so it will no longer be necessary to pass the kubeconfig file (with a fallback to the current behavior if the cluster is not available or the read fails).
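The fallback described here can be sketched roughly as follows. This is Python for illustration, not the PR's Go code; `load_config`, `read_from_cluster`, and `read_local` are hypothetical names, and treating any `OSError` as "cluster not available" is an assumption of this sketch.

```python
def load_config(read_from_cluster, read_local):
    """Prefer the configuration stored in the cluster; fall back to the local source.

    Both arguments are hypothetical zero-argument callables that return a config
    object. Returns (config, source) so the caller can tell which path was used.
    """
    try:
        return read_from_cluster(), "cluster"
    except OSError:
        # Covers connection failures (ConnectionError is a subclass of OSError),
        # e.g. when the API server is unreachable because certificates expired.
        return read_local(), "local"
```

Returning the source alongside the config makes it possible to warn the user when the cluster could not be read, which matters for the expired-certificate corner case raised earlier in the thread.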
is not able to recognize what kind of node it is being executed on (external etcd, non-stacked master, stacked-master) it should just report expiration of the certificates that it can find ...
I consider etcd cert renew a command that should work on a best-effort basis (do whatever is possible), so I don't see problems in sending such a patch.
However, according to my knowledge about ongoing discussions on etcd management at SCL level, we should make clear here (and possibly also on https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/) that the target tool for managing external etcd is etcdadm, not kubeadm.
As a consequence, any expectation on kubeadm managing external etcd nodes is not backed by any action item in the kubeadm roadmap; you can consider this patch as a contribution to ease the pain while etcdadm gets ready, but nothing more than that.