Running microk8s inspect does not work, and neither does talking to the cluster. The error is: x509: certificate has expired or is not yet valid
How can I renew the root cert?
How can I make it last longer than a month?
I've seen this happen when the machine's date changed. The certs are valid for at least 365 days.
But even after 365 days there needs to be a way to update the root CA, right?
There is a way to force the certs to be regenerated: change csr.conf.template, for example by adding a DNS entry.
I have not tried it personally, but maybe cert-manager can be used here.
Since I currently cannot use my previous installation of microk8s, I reinstalled it with sudo snap install microk8s --classic --channel=1.18/stable and checked the root CA on the new cluster with openssl x509 -enddate -noout -in /var/snap/microk8s/current/certs/ca.crt
The result: notAfter=May 27 12:57:33 2020 GMT
Again, it is only valid for one month.
And changing csr.conf.template only seems to update server.crt, not ca.crt.
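The same openssl call can be turned into a days-remaining check, which makes a 30-day CA easy to spot before it expires. A minimal sketch, assuming GNU date (for date -d) and OpenSSL's usual notAfter output format; the function name cert_days_left is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the whole number of days until a certificate expires.
cert_days_left() {
  local cert=$1 end end_s now_s
  # openssl prints e.g. "notAfter=May 27 12:57:33 2020 GMT"; keep the part after '='.
  end=$(openssl x509 -enddate -noout -in "$cert") || return 1
  end=${end#notAfter=}
  # ASSUMPTION: GNU date, which understands OpenSSL's timestamp format.
  end_s=$(date -d "$end" +%s) || return 1
  now_s=$(date +%s)
  # Shell arithmetic truncates toward zero; the value goes negative after expiry.
  echo $(( (end_s - now_s) / 86400 ))
}
```

Looping it over /var/snap/microk8s/current/certs/*.crt gives a quick overview of every certificate in the snap, including ca.crt and server.crt.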
You're right. I just checked the code: the CA cert is requested without the -days parameter, so OpenSSL falls back to its default validity of 30 days.
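The 30-day default is easy to reproduce outside microk8s. The sketch below creates a throwaway CA with and without -days and compares them with openssl x509 -checkend. It is not the microk8s certificate code; the function names, the CN and the 2048-bit key size are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create ca.key and ca.crt in DIR.
# DAYS is optional; when it is empty, -days is omitted entirely.
make_ca() {
  local dir=$1 days=${2:-}
  openssl genrsa -out "$dir/ca.key" 2048 2>/dev/null || return 1
  if [ -n "$days" ]; then
    openssl req -x509 -new -key "$dir/ca.key" -subj "/CN=example-ca" \
      -days "$days" -out "$dir/ca.crt"
  else
    # No -days: OpenSSL uses its default validity of 30 days.
    openssl req -x509 -new -key "$dir/ca.key" -subj "/CN=example-ca" \
      -out "$dir/ca.crt"
  fi
}

# Hypothetical helper: succeed if the certificate FILE expires within DAYS days.
expires_within() {
  local file=$1 days=$2
  # -checkend exits 0 when the certificate is still valid that many seconds from now.
  if openssl x509 -checkend $(( days * 86400 )) -noout -in "$file" >/dev/null; then
    return 1
  fi
  return 0
}
```

A CA made with make_ca "$dir" (no days) satisfies expires_within "$dir/ca.crt" 31, while one made with make_ca "$dir" 3650 is still valid a year out.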
@ktsakalozos ca.crt and front-proxy-ca.crt are now only valid for 30 days. Is there any particular reason why?
Thanks.
The same happened to my system yesterday. Is there no way to regenerate both ca.crt and server.crt so I can get back up and running?
I have not tried this. 😊
Maybe we can try this:
1. microk8s.stop
2. Remove the .crt and .key files in /var/snap/microk8s/current/certs.
3. Modify csr.conf.template by adding a DNS entry or a comment.
4. microk8s.start
Back up /var/snap/microk8s/current/certs first.
This is an oversight on our part (the missing -days arguments).
Over the past few hours we have patched the affected tracks, tested them and released a new snap, so new deployments should not have this issue.
We will continue working on a fix for existing deployments. The approach @balchua suggests seems promising, but we will also need to recreate the kubeconfigs generated below this line: https://github.com/ubuntu/microk8s/blob/master/snap/hooks/install#L18
Thank you @ThomasSchoenbeck @EzraBrooks for reporting this and @balchua for spotting the issue and offering your help. Apologies for the inconvenience we may have caused.
@ktsakalozos I may be wrong here, but I think the kubeconfigs use tokens rather than certs.
I tried what balchua suggested (adding a DNS entry) but couldn't get it to work: microk8s does not start.
Bummer. It's probably not regenerating the certs.
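Whichever route is taken, the manual steps or a refresh script, the advice to back up /var/snap/microk8s/current/certs first still applies. A minimal sketch of such a backup; the backup_certs name and the timestamped sibling-directory naming are assumptions, and reading the snap's key files will likely require root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a certs directory to a timestamped sibling and print its path.
backup_certs() {
  local src=${1:-/var/snap/microk8s/current/certs}
  src=${src%/}
  local dest
  dest="$src.backup-$(date +%Y%m%d-%H%M%S)"
  # Refuse to overwrite: cp would otherwise nest the copy inside an existing backup.
  if [ -e "$dest" ]; then
    echo "backup $dest already exists" >&2
    return 1
  fi
  # -a keeps permissions and timestamps (and ownership when run as root).
  cp -a "$src" "$dest" || return 1
  printf '%s\n' "$dest"
}
```

Because sudo cannot run a shell function directly, source the function in a root shell (for example after sudo -i) before calling it.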
The script I have for now is here: https://gist.github.com/ktsakalozos/5de8d4c86c976eeef0242cc39fdf82b2
It would be great if anyone would run it and provide feedback.
curl https://gist.githubusercontent.com/ktsakalozos/5de8d4c86c976eeef0242cc39fdf82b2/raw/f29ff555346435154553d35ff64a8282f867011f/refresh-certs.sh -o refresh.sh
chmod +x refresh.sh
sudo ./refresh.sh
After running the script, the pods in the cluster should go into an unknown state and restart after a few seconds.
The intention is to place the above script in a microk8s.refresh-certs command to address this issue in affected deployments.
@balchua the kubeconfig files use tokens, but they also carry the CA certificate (ca.crt); that is why I think they need to be recreated.
@ktsakalozos Ah yes, the CADATA needs to be repopulated. I totally overlooked that.
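To check whether a kubeconfig still carries the old CA after the certs are regenerated, its certificate-authority-data field can be decoded and compared with ca.crt. A minimal sketch, assuming the CA is embedded as a single-line base64 value (as the CADATA placeholder suggests) and GNU base64; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does the CA embedded in a kubeconfig match a given ca.crt?
# Returns 0 if it matches, 1 if it differs (or cannot be compared),
# 2 if no certificate-authority-data was found.
kubeconfig_ca_matches() {
  local kubeconfig=$1 ca=$2 embedded
  # Take the value of the first certificate-authority-data key.
  embedded=$(awk '$1 == "certificate-authority-data:" { print $2; exit }' "$kubeconfig") || return 2
  [ -n "$embedded" ] || return 2
  # Decode the base64 value and compare it byte for byte with the CA file.
  if printf '%s' "$embedded" | base64 -d | cmp -s - "$ca"; then
    return 0
  fi
  return 1
}
```

On a real node the arguments would be one of the snap's kubeconfig files (assumed to live under /var/snap/microk8s/current/credentials/) and /var/snap/microk8s/current/certs/ca.crt; a kubeconfig that was not regenerated after a CA refresh should return 1.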
Sorry for the late response. I tested the script above: the new certs are valid until the year 2030.
All pods went into an unknown state; about 30 seconds later, all of them were back up.
Thanks for the help!
Now that there is a way to refresh the certs, are we good to close this one?
Thanks
Yes, I am closing the ticket! Thanks again for the great help @balchua and @ktsakalozos
Also noting that the script @ktsakalozos provided fixes the issue for me. Thank you!
I was facing the same issue, and the refresh.sh script worked for me. Afterwards, however, I ran into DNS resolution errors: all services would crash with errors similar to
socket.gaierror: [Errno -3] Temporary failure in name resolution
To save others two hours of debugging: make sure that coredns shows 1/1 ready in kubectl -n kube-system get all. Its readiness probe had failed, and its logs showed
E0524 09:50:35.607082 1 reflector.go:125] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:98: Failed to list *v1.Endpoints: Get https://10.152.183.1:443/api/v1/endpoints?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
Deleting the pod (forcing it to restart) solved the issue for me.
I got the idea to check kube-system from here and from https://github.com/ubuntu/microk8s/issues/332#issue-413517185.
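The CoreDNS check and restart described above can be scripted. A minimal sketch, assuming the CoreDNS pods carry the label k8s-app=kube-dns (the usual label for the cluster DNS Deployment, but confirm it with get pods --show-labels on your cluster); the function names and the KUBECTL override are made up for illustration.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (e.g. KUBECTL=kubectl); by default use the snap's bundled client.
# The expansion is left unquoted on purpose so "microk8s kubectl" splits into two words.

# Show the CoreDNS pods; the READY column should read 1/1.
coredns_status() {
  ${KUBECTL:-microk8s kubectl} -n kube-system get pods -l k8s-app=kube-dns
}

# Delete the CoreDNS pods so their Deployment recreates them,
# mirroring the manual pod deletion that resolved the readiness failure above.
restart_coredns() {
  ${KUBECTL:-microk8s kubectl} -n kube-system delete pod -l k8s-app=kube-dns
}
```

Run restart_coredns once after refreshing the certs, then coredns_status until the new pod reports 1/1.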
I hit the same issue just now: I had to run refresh.sh and also give the coredns pod a kick. Thank you @PeterSR for sharing that.
Everything seems to be back in working order; however, I now cannot pull an image from a private repo.
Normal Scheduled 17m default-scheduler Successfully assigned homelab/newimage-66c8d88f65-lhvdz to kube
Normal Pulling 15m (x4 over 17m) kubelet, kube Pulling image "registry.gitlab.com/realg/kube/newimage:20.05"
Warning Failed 15m (x4 over 17m) kubelet, kube Failed to pull image "registry.gitlab.com/realg/kube/newimage:20.05": rpc error: code = Unknown desc = failed to resolve image "registry.gitlab.com/realg/kube/newimage:20.05": no available registry endpoint: failed to fetch anonymous token: unexpected status: 403 Forbidden
Warning Failed 15m (x4 over 17m) kubelet, kube Error: ErrImagePull
Normal BackOff 11m (x21 over 17m) kubelet, kube Back-off pulling image "registry.gitlab.com/realg/kube/newimage:20.05"
Warning Failed 113s (x65 over 17m) kubelet, kube Error: ImagePullBackOff
The image is definitely there: I can pull it with docker from another host using the same dockerconfig.json. I haven't made any other changes to my cluster, which makes me think it's related to refreshing the expired certs.
Has anyone had the same issue?
@realG
I actually had problems pulling images as well for a specific deployment, but then I realized that I had forgotten
imagePullSecrets:
- name: regcred
for that specific one, so it was unrelated to this issue. But I did also recreate the regcred secret from scratch while trying to fix it, so who knows.
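Recreating the regcred secret, as described above, can be done with kubectl create secret docker-registry. A minimal sketch; the recreate_regcred name, its argument order and the KUBECTL override are assumptions, while the secret name regcred matches the imagePullSecrets entry above. The pod spec still needs that imagePullSecrets entry for the secret to be used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete and recreate the "regcred" docker-registry secret.
# Usage: recreate_regcred SERVER USERNAME PASSWORD [NAMESPACE]
recreate_regcred() {
  local server=$1 user=$2 password=$3 namespace=${4:-default}
  local kubectl="${KUBECTL:-microk8s kubectl}"
  # $kubectl is left unquoted so the default "microk8s kubectl" splits into two words.
  # --ignore-not-found makes the delete a no-op when the secret does not exist yet.
  $kubectl -n "$namespace" delete secret regcred --ignore-not-found || return 1
  $kubectl -n "$namespace" create secret docker-registry regcred \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password"
}
```

Pass the deployment's namespace (homelab in the events above), since a pod can only reference image pull secrets from its own namespace.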
@PeterSR thank you for the tip yet again! I was indeed missing imagePullSecrets in the deployment yaml.