Kubeadm: Self-hosting pivoting fails when using --store-certs-in-secrets

Created on 27 Nov 2018 · 18 Comments · Source: kubernetes/kubeadm

kubeadm alpha selfhosting pivot (kubeadm v1.13.0-beta.2) fails when invoked with --store-certs-in-secrets with the following error message:

[pivot] pivoting cluster to self-hosted
[self-hosted] Created TLS secret "ca" from ca.crt and ca.key
[self-hosted] Created TLS secret "apiserver" from apiserver.crt and apiserver.key
[self-hosted] Created TLS secret "apiserver-kubelet-client" from apiserver-kubelet-client.crt and apiserver-kubelet-client.key
[self-hosted] Created TLS secret "sa" from sa.pub and sa.key
[self-hosted] Created TLS secret "front-proxy-ca" from front-proxy-ca.crt and front-proxy-ca.key
[self-hosted] Created TLS secret "front-proxy-client" from front-proxy-client.crt and front-proxy-client.key
[self-hosted] Created secret for kubeconfig file "scheduler.conf"
[self-hosted] Created secret for kubeconfig file "controller-manager.conf"
[apiclient] Found 1 Pods for label selector k8s-app=self-hosted-kube-apiserver
timed out waiting for the condition
area/self-hosting help wanted kind/bug lifecycle/active

All 18 comments

I'm fine with waiting till 1.14 on this one.

I thought this feature was being removed? The issue is likely due to https://github.com/kubernetes/kubernetes/issues/61322.

@andrewrynhard thanks for pointing this out!

I thought this feature was being removed?

self-hosting was removed from the kubeadm init and kubeadm upgrade workflows (both were in some way not working properly), but it was agreed to leave an alpha command with the pivoting logic that you can call after init. However, be aware that once the cluster is turned to self-hosting you are on your own (e.g. for solving checkpointing / cold restart).

It looks like the api server can't start because the etcd client certificates are not created/copied:

F1227 16:01:52.237352       1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry [https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt true 0xc000884120 <nil> 5m0s 1m0s}), err (open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory)

What I don't understand is why those certificates are not needed when --store-certs-in-secrets is not used.

@neolit123, @fabriziopandini any ideas?

@bart0sh
That's the bug to be fixed. As far as I know, when TLS was added to etcd, --store-certs-in-secrets was never updated accordingly. There was a PR for this, https://github.com/kubernetes/kubernetes/pull/61323, but as you can see it never landed.

So

  • without store certs in secrets, self-hosting works, because the DaemonSet pods use the existing certificates on disk
  • with store certs in secrets, self-hosting doesn't work, because the DaemonSet pods can't find all the necessary certificates in secrets
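The difference between the two modes can be sketched roughly as follows. This is a hypothetical simplification, not kubeadm's actual code (the real pivot logic builds k8s.io/api/core/v1 volume sources); the `VolumeSource` type and `certVolume` function here are stand-ins for illustration:

```go
package main

import "fmt"

// VolumeSource is a simplified stand-in for the Kubernetes volume source
// used by the self-hosted control-plane DaemonSets (hypothetical type).
type VolumeSource struct {
	HostPath   string // used when certs stay on disk
	SecretName string // used with --store-certs-in-secrets
}

// certVolume sketches how the pivot logic could pick a volume source for a
// cert. Without secrets, the DaemonSet pods mount the certs that already
// exist on disk, so every cert is trivially available; with secrets, every
// cert must have been uploaded as a Secret first — and the etcd client
// certs never were, which is why the api server can't find them.
func certVolume(certName string, storeCertsInSecrets bool) VolumeSource {
	if storeCertsInSecrets {
		return VolumeSource{SecretName: certName}
	}
	return VolumeSource{HostPath: "/etc/kubernetes/pki"}
}

func main() {
	fmt.Println(certVolume("apiserver-etcd-client", false).HostPath)
	fmt.Println(certVolume("apiserver-etcd-client", true).SecretName)
}
```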

@fabriziopandini Thank you for the explanations. I tried to add all etcd certificates, but generating secrets for them fails as their names contain '/', e.g. "etcd/ca". Changing the names didn't work either, as probably some other piece[s] of this puzzle require names with slashes. I'll dig deeper into this. Any ideas on how to better solve this are appreciated.
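The naming failure above follows directly from the Kubernetes API rules: Secret names must be valid DNS-1123 subdomains, which excludes slashes. A minimal check of that rule (the regexp mirrors the documented DNS-1123 subdomain format; this is a sketch, not the apimachinery validation code itself):

```go
package main

import (
	"fmt"
	"regexp"
)

// Secret names must be valid DNS-1123 subdomains: lowercase alphanumerics
// and '-', dot-separated labels, max 253 chars. "etcd/ca" fails this,
// which is why generating secrets for the etcd certs errors out.
var dns1123 = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

func validSecretName(name string) bool {
	return len(name) <= 253 && dns1123.MatchString(name)
}

func main() {
	fmt.Println(validSecretName("etcd/ca")) // slash is not allowed
	fmt.Println(validSecretName("etcd-ca")) // a sanitized name passes
}
```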

@bart0sh I see your problem.
kubeadm creates one secret for each cert, and this requires changing the names of the certs under etcd/.
If I'm right, this should be done here

but this is not enough; it is also necessary to update the corresponding volume projection that places the cert in the expected place. If I'm right, this should be done here

PS. pay attention to external etcd mode vs local etcd mode

I did change it in both places, but that was not enough. The changes you've proposed trigger errors when generating secrets, as secret names must not contain slashes. Changing the name in constants from etcd/ca to etcd-ca made the api server get stuck on start.
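One way to reconcile the two constraints (no slash in the Secret name, but the manifest still expects the cert under /etc/kubernetes/pki/etcd/) is a key-to-path projection: mount the renamed Secret but place its keys at the old relative paths. The sketch below uses a hypothetical `KeyToPath` stand-in for the k8s.io/api/core/v1 projection item, and assumes TLS-type secrets store their material under the `tls.crt`/`tls.key` keys:

```go
package main

import "fmt"

// KeyToPath mirrors the Kubernetes Secret-volume projection item: a key
// inside the Secret can be placed at an arbitrary relative path in the
// mount. (Simplified stand-in for the real k8s.io/api/core/v1 type.)
type KeyToPath struct {
	Key  string
	Path string
}

// etcdCAProjection shows how a Secret renamed to "etcd-ca" (no slash)
// can still land at /etc/kubernetes/pki/etcd/ca.crt, which is where the
// api server manifest looks for it; without this remapping the server
// hangs on startup searching the old path.
func etcdCAProjection() (secretName string, items []KeyToPath) {
	return "etcd-ca", []KeyToPath{
		{Key: "tls.crt", Path: "etcd/ca.crt"},
		{Key: "tls.key", Path: "etcd/ca.key"},
	}
}

func main() {
	name, items := etcdCAProjection()
	for _, it := range items {
		fmt.Printf("secret %s key %s -> /etc/kubernetes/pki/%s\n", name, it.Key, it.Path)
	}
}
```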

@bart0sh if you can share the generated yaml for the kube-apiserver DaemonSet, maybe I can help...

/assign

/lifecycle active

@fabriziopandini I've got api server running with the above fix. Thank you for the help!

next thing: kube-controller-manager is crashlooping, but kubeadm's WaitForPodsWithLabel API manages to catch the short moment when it's in the Running state (not sure how that can happen, though):

[self-hosted] >>> wait for kube-controller-manager to come up, label k8s-app=self-hosted-kube-controller-manager
[apiclient] Found 0 Pods for label selector k8s-app=self-hosted-kube-controller-manager
[apiclient] Found 1 Pods for label selector k8s-app=self-hosted-kube-controller-manager
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Running
[apiclient] The old Pod "kube-controller-manager-ed" is now removed (which is desired)
[apiclient] All control plane components are healthy after 0.000702 seconds
$ kubectl get pods -n kube-system |grep controller
self-hosted-kube-controller-manager-p9tpb   0/1     CrashLoopBackOff   6          8m11s

The reason for the CrashLoop doesn't matter yet. The issue is that at some point the pod status is 'Running'. Do you have any idea why, and how to fix this?
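For illustration only: a crashlooping pod does pass through a Running phase between restarts, so a single Running observation is a race. A stricter wait could require several consecutive Running polls before declaring success. This is a hypothetical sketch (the `poll` callback stands in for an API-server status query; it is not kubeadm's actual wait logic, and the thread below argues the root cause should be fixed rather than papered over):

```go
package main

import "fmt"

// waitStable succeeds only after `need` consecutive polls report Running,
// filtering out the short Running window of a crashlooping container.
// poll is a hypothetical status source standing in for an API query.
func waitStable(poll func() string, maxPolls, need int) bool {
	streak := 0
	for i := 0; i < maxPolls; i++ {
		if poll() == "Running" {
			streak++
			if streak >= need {
				return true
			}
		} else {
			streak = 0 // a restart resets the stability window
		}
	}
	return false
}

func main() {
	// Simulated crashloop: briefly Running, then back to a restart.
	statuses := []string{"Pending", "Running", "CrashLoopBackOff", "Running", "Running"}
	i := 0
	poll := func() string { s := statuses[i%len(statuses)]; i++; return s }
	fmt.Println(waitStable(poll, 5, 3)) // one Running sample is no longer enough
}
```

Checking the pod's Ready condition instead of its phase would be another option, since readiness also drops while a container is backing off.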

/reopen

@bart0sh: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123 @kad @rosti do you have any idea why this could happen?

@bart0sh if I'm not wrong this is fixed now... can you confirm?

@fabriziopandini not yet. I'm working on it.

btw, can you answer the question above, please?

@bart0sh
I'm not sure that we should implement logic that detects if a self-hosting pod runs _and then continues to run_... this could potentially turn into a never-ending story.

Instead, I think we should investigate why kube-controller-manager is crashlooping, and make sure this condition is not generated by the self-hosting pivoting logic.

Once we are sure the pivoting logic doesn't introduce "regressions", then we can discuss if/how to make the whole process more robust (e.g. by implementing preflight checks or rollback logic).
