Kubeadm: Self-hosting pivoting fails when using --store-certs-in-secrets

Created on 27 Nov 2018 · 18 Comments · Source: kubernetes/kubeadm

kubeadm alpha selfhosting pivot (kubeadm v1.13.0-beta.2) fails when invoked with --store-certs-in-secrets with the following error message:

[pivot] pivoting cluster to self-hosted
[self-hosted] Created TLS secret "ca" from ca.crt and ca.key
[self-hosted] Created TLS secret "apiserver" from apiserver.crt and apiserver.key
[self-hosted] Created TLS secret "apiserver-kubelet-client" from apiserver-kubelet-client.crt and apiserver-kubelet-client.key
[self-hosted] Created TLS secret "sa" from sa.pub and sa.key
[self-hosted] Created TLS secret "front-proxy-ca" from front-proxy-ca.crt and front-proxy-ca.key
[self-hosted] Created TLS secret "front-proxy-client" from front-proxy-client.crt and front-proxy-client.key
[self-hosted] Created secret for kubeconfig file "scheduler.conf"
[self-hosted] Created secret for kubeconfig file "controller-manager.conf"
[apiclient] Found 1 Pods for label selector k8s-app=self-hosted-kube-apiserver
timed out waiting for the condition
area/self-hosting help wanted kind/bug lifecycle/active

All 18 comments

I'm fine with waiting till 1.14 on this one.

I thought this feature was being removed? The issue is likely due to https://github.com/kubernetes/kubernetes/issues/61322.

@andrewrynhard thanks for pointing this out!

I thought this feature was being removed?

self-hosting was removed from the kubeadm init and kubeadm upgrade workflows (both were in some way not working properly), but it was agreed to leave an alpha command with the pivoting logic that you can call after init. However, be aware that once the cluster is turned to self-hosting you are on your own (e.g. for solving checkpointing / cold restart).

It looks like the api server can't start because the etcd client certificates are not created/copied:

F1227 16:01:52.237352       1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry [https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt true 0xc000884120 <nil> 5m0s 1m0s}), err (open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory)

What I don't understand is why those certificates are not needed when --store-certs-in-secrets is not used.

@neolit123, @fabriziopandini any ideas?

@bart0sh
That's the bug to be fixed. As far as I know, when TLS was added to etcd, --store-certs-in-secrets was never updated accordingly. There was a PR for this, https://github.com/kubernetes/kubernetes/pull/61323, but as you can see it never landed.

So

  • without store certs in secrets, self-hosting works, because the DaemonSet pods use the existing certificates on disk
  • with store certs in secrets, self-hosting doesn't work, because the DaemonSet pods can't find all the necessary certificates in secrets
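The difference between the two modes can be sketched roughly as follows. This is a hypothetical simplification, not kubeadm's actual code (the real pivot logic builds k8s.io/api/core/v1 volume sources); the `VolumeSource` type and `certVolume` function here are stand-ins for illustration:

```go
package main

import "fmt"

// VolumeSource is a simplified stand-in for the Kubernetes volume source
// used by the self-hosted control-plane DaemonSets (hypothetical type).
type VolumeSource struct {
	HostPath   string // used when certs stay on disk
	SecretName string // used with --store-certs-in-secrets
}

// certVolume sketches how the pivot logic could pick a volume source for a
// cert. Without secrets, the DaemonSet pods mount the certs that already
// exist on disk, so every cert is trivially available; with secrets, every
// cert must have been uploaded as a Secret first — and the etcd client
// certs never were, which is why the api server can't find them.
func certVolume(certName string, storeCertsInSecrets bool) VolumeSource {
	if storeCertsInSecrets {
		return VolumeSource{SecretName: certName}
	}
	return VolumeSource{HostPath: "/etc/kubernetes/pki"}
}

func main() {
	fmt.Println(certVolume("apiserver-etcd-client", false).HostPath)
	fmt.Println(certVolume("apiserver-etcd-client", true).SecretName)
}
```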

@fabriziopandini Thank you for the explanations. I tried to add all etcd certificates, but generating secrets for them fails as their names contain '/', e.g. "etcd/ca". Changing the names didn't work either, as probably some other piece[s] of this puzzle require names with slashes. I'll dig deeper into this. Any ideas on how to better solve this are appreciated.
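The naming failure above follows directly from the Kubernetes API rules: Secret names must be valid DNS-1123 subdomains, which excludes slashes. A minimal check of that rule (the regexp mirrors the documented DNS-1123 subdomain format; this is a sketch, not the apimachinery validation code itself):

```go
package main

import (
	"fmt"
	"regexp"
)

// Secret names must be valid DNS-1123 subdomains: lowercase alphanumerics
// and '-', dot-separated labels, max 253 chars. "etcd/ca" fails this,
// which is why generating secrets for the etcd certs errors out.
var dns1123 = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

func validSecretName(name string) bool {
	return len(name) <= 253 && dns1123.MatchString(name)
}

func main() {
	fmt.Println(validSecretName("etcd/ca")) // slash is not allowed
	fmt.Println(validSecretName("etcd-ca")) // a sanitized name passes
}
```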

@bart0sh I see your problem.
kubeadm creates one secret for each cert, and this requires changing the names of the certs under etcd/.
If I'm right, this should be done here

but this is not enough; it is also necessary to update the corresponding volume projection that places the cert in the expected place. If I'm right, this should be done here

PS. pay attention to external etcd mode vs local etcd mode

I did change it in both places, but that was not enough. The changes you've proposed trigger errors when generating secrets, as secret names must not contain slashes. Changing the name in constants from etcd/ca to etcd-ca made the api server get stuck on start.
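One way to reconcile the two constraints (no slash in the Secret name, but the manifest still expects the cert under /etc/kubernetes/pki/etcd/) is a key-to-path projection: mount the renamed Secret but place its keys at the old relative paths. The sketch below uses a hypothetical `KeyToPath` stand-in for the k8s.io/api/core/v1 projection item, and assumes TLS-type secrets store their material under the `tls.crt`/`tls.key` keys:

```go
package main

import "fmt"

// KeyToPath mirrors the Kubernetes Secret-volume projection item: a key
// inside the Secret can be placed at an arbitrary relative path in the
// mount. (Simplified stand-in for the real k8s.io/api/core/v1 type.)
type KeyToPath struct {
	Key  string
	Path string
}

// etcdCAProjection shows how a Secret renamed to "etcd-ca" (no slash)
// can still land at /etc/kubernetes/pki/etcd/ca.crt, which is where the
// api server manifest looks for it; without this remapping the server
// hangs on startup searching the old path.
func etcdCAProjection() (secretName string, items []KeyToPath) {
	return "etcd-ca", []KeyToPath{
		{Key: "tls.crt", Path: "etcd/ca.crt"},
		{Key: "tls.key", Path: "etcd/ca.key"},
	}
}

func main() {
	name, items := etcdCAProjection()
	for _, it := range items {
		fmt.Printf("secret %s key %s -> /etc/kubernetes/pki/%s\n", name, it.Key, it.Path)
	}
}
```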

@bart0sh if you can share the generated yaml for the kube-apiserver DaemonSet, maybe I can help...

/assign

/lifecycle active

@fabriziopandini I've got api server running with the above fix. Thank you for the help!

next thing: kube-controller-manager is crashlooping, but kubeadm's WaitForPodsWithLabel API manages to catch the short moment when it's in the Running state (not sure how that can happen, though):

[self-hosted] >>> wait for kube-controller-manager to come up, label k8s-app=self-hosted-kube-controller-manager
[apiclient] Found 0 Pods for label selector k8s-app=self-hosted-kube-controller-manager
[apiclient] Found 1 Pods for label selector k8s-app=self-hosted-kube-controller-manager
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Running
[apiclient] The old Pod "kube-controller-manager-ed" is now removed (which is desired)
[apiclient] All control plane components are healthy after 0.000702 seconds
$ kubectl get pods -n kube-system |grep controller
self-hosted-kube-controller-manager-p9tpb   0/1     CrashLoopBackOff   6          8m11s

The reason for the CrashLoop doesn't matter yet. The issue is that at some point the pod status is 'Running'. Do you have any idea why, and how to fix this?
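For illustration only: a crashlooping pod does pass through a Running phase between restarts, so a single Running observation is a race. A stricter wait could require several consecutive Running polls before declaring success. This is a hypothetical sketch (the `poll` callback stands in for an API-server status query; it is not kubeadm's actual wait logic, and the thread below argues the root cause should be fixed rather than papered over):

```go
package main

import "fmt"

// waitStable succeeds only after `need` consecutive polls report Running,
// filtering out the short Running window of a crashlooping container.
// poll is a hypothetical status source standing in for an API query.
func waitStable(poll func() string, maxPolls, need int) bool {
	streak := 0
	for i := 0; i < maxPolls; i++ {
		if poll() == "Running" {
			streak++
			if streak >= need {
				return true
			}
		} else {
			streak = 0 // a restart resets the stability window
		}
	}
	return false
}

func main() {
	// Simulated crashloop: briefly Running, then back to a restart.
	statuses := []string{"Pending", "Running", "CrashLoopBackOff", "Running", "Running"}
	i := 0
	poll := func() string { s := statuses[i%len(statuses)]; i++; return s }
	fmt.Println(waitStable(poll, 5, 3)) // one Running sample is no longer enough
}
```

Checking the pod's Ready condition instead of its phase would be another option, since readiness also drops while a container is backing off.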

/reopen

@bart0sh: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123 @kad @rosti do you have any idea why this could happen?

@bart0sh if I'm not wrong this is fixed now... can you confirm?

@fabriziopandini not yet. I'm working on it.

btw, can you answer the question above, please?

@bart0sh
I'm not sure that we should implement logic that detects if a self-hosting pod runs _and then continues to run_... this could potentially turn into a never-ending story.

Instead, I think we should investigate why kube-controller-manager is crashlooping, and make sure this condition is not generated by the self-hosting pivoting logic.

Once we are sure the pivoting logic doesn't introduce "regressions", then we can discuss if/how to make the whole process more robust (e.g. by implementing preflight checks or rollback logic).
