Please check https://cert-manager.io/docs/installation/upgrading/upgrading-0.15-0.16/
Describe the bug:
When kubectl performs an apply on the customresourcedefinition.apiextensions.k8s.io resources while the 0.16.0 CRDs are already installed, it gets stuck without any error.
This happens specifically when applying changes to the Challenges, ClusterIssuers and Issuers CRDs.
When testing the legacy manifests (i.e. only v1alpha2) everything seemed to work. When I modified these to only have v1beta1, I also saw success.
Looking into that, the error was not related to the API definition. Is this a length overflow? I created a CRD manifest with just v1alpha2 and v1beta1 together, and that causes issues.
This led me to the openAPIV3Schema part of the CRD. When I create a CRD where openAPIV3Schema is at the root and not per version, it all passes. It seems that having a different validation schema per version causes issues when doing a kubectl apply on those resources. This is not per se a cert-manager bug, but it is something we should work around, as a fix might not reach everyone in time.
(I did all of this in the filter-crd script)
Steps to reproduce the bug:
kind create cluster
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.crds.yaml
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.crds.yaml # again
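Since the second apply hangs without printing anything, it can help to run it under a deadline so the hang shows up as a failure instead of a blocked terminal. A minimal sketch, assuming GNU coreutils `timeout` is available; `run_with_deadline` is a hypothetical helper name and the 60-second limit is arbitrary:

```shell
# Sketch: run a command under a deadline so a hang becomes a visible failure.
# Assumes GNU coreutils `timeout`, which exits with status 124 on timeout.
run_with_deadline() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "command did not finish within ${limit}s: $*" >&2
    fi
    return "$status"
}

# Usage (needs a cluster, so left commented out):
# CRDS=https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.crds.yaml
# run_with_deadline 60 kubectl apply -f "$CRDS"   # first apply
# run_with_deadline 60 kubectl apply -f "$CRDS"   # second apply; status 124 means it hung
```

An exit status of 124 on the second apply would match the behaviour described in this issue. Any other non-zero status is a regular kubectl error.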
This also seems to be present in Kubernetes 1.19-beta.2
/kind bug
Digging deeper: reverting changes to the v1beta1 API to pinpoint which change caused this gave no results.
So I copied the schema from v1alpha2 into v1beta1 and changed one letter (to avoid the "validation needs to be at the root" error), and it caused the same issue.
Why don't we have this on certain resources? And why didn't this happen when we added v1alpha3? The affected resources all have something to do with ACME (orders is a weird one: it is broken in alpha.0 but not alpha.1, so this is interesting), which produces very long CRDs because of its complexity.
Why not in v1alpha3? Well ACME didn't have changes in that one so the validation is on the root: https://github.com/jetstack/cert-manager/pull/3038/files#diff-416ccc7c710ad3c389f9d2b31325f037
It seems the Kubernetes API server has issues processing changes to very large OpenAPI schemas between versions.
If you're hitting this, here is a crappy workaround: (!!! this will delete your issuers!)
kubectl delete crd challenges.acme.cert-manager.io
kubectl delete crd clusterissuers.cert-manager.io
kubectl delete crd issuers.cert-manager.io
kubectl delete crd orders.acme.cert-manager.io
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.crds.yaml
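Because deleting a CRD also deletes every resource of that kind, it is probably worth saving the Issuers and ClusterIssuers first. A rough sketch, not an official procedure: `backup_issuers` and the backup file name are hypothetical, and the saved YAML may need cleanup before it applies cleanly.

```shell
# Sketch: save Issuers and ClusterIssuers to a file before deleting their CRDs.
# backup_issuers is a hypothetical helper name, not part of cert-manager.
backup_issuers() {
    out="$1"
    # Fully qualified resource names avoid clashes with other API groups.
    kubectl get issuers.cert-manager.io,clusterissuers.cert-manager.io \
        --all-namespaces -o yaml > "$out"
}

# Usage (needs a cluster, so left commented out):
# backup_issuers cert-manager-backup.yaml      # before the kubectl delete crd commands
# kubectl apply -f cert-manager-backup.yaml    # after the CRDs have been re-applied
```

Orders and Challenges are left out of the backup on purpose, on the assumption that they are short-lived and do not need to be restored by hand.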
As suggested in https://kubernetes.slack.com/archives/C0EG7JC6T/p1595703000219900?thread_ts=1595702151.219700&cid=C0EG7JC6T it might have been https://github.com/kubernetes/kubernetes/issues/82292
So I created a CRD without any descriptions inside of it, reducing it from 298kb to 91kb. However, the issue is still present.
Digging further, this is caused by https://github.com/kubernetes/kubernetes/issues/91615 and patched in https://github.com/kubernetes/kubernetes/pull/92069, to be backported to all versions starting at 1.16.
We mitigated the situation for now by commenting out the podTemplate section of the Challenges, ClusterIssuers and Issuers CRDs, pending a proper fix patched into k8s.
/area deploy
/priority important-soon
Added a milestone. It is not something we seem to be able to fix in our codebase, but it should be kept in mind for the v1.0 release.
Upstream fixes are merged and new patch releases for 1.16, 1.17 and 1.18 were just published.
I checked out kubectl from all of them and they all work now :)
PR to fix Helm: https://github.com/helm/helm/pull/8595
Helm 3.3.1 is out!
/close
@meyskens: Closing this issue.