User Story
As a developer/user/operator I would like to upgrade to a newer version of cert-manager to stay up to date.
Detailed Description
We are currently using cert-manager 0.11. The current version is 0.13, and it has some internal changes that we'll have to accommodate (there is no longer an apiserver, so how we wait for cert-manager to be up in clusterctl and tilt will have to change). In their upcoming 0.14, they'll also be introducing the v1alpha3 api version, and it's always good to stay on top of things, especially for frequent alpha updates.
/kind feature
/priority important-longterm
/milestone Next
Do we need to also try and push this change into kubebuilder as well?
Sure (this is longterm so I'm just writing it down so we don't forget - no need to spend any energy thinking about it right now 😄)
ok. however, this opens the issue of clusterctl handling the cert-manager upgrade...
Is this still open, or has someone already taken it? Otherwise I would like to try.
xref https://github.com/kubernetes-sigs/cluster-api/issues/2635 for the clusterctl bits
I saw in the code that it checks for the API service, but that does not exist in version 0.13.x. Instead of checking for that, should we wait for the deployment and check if the service is responding?
what are the guidelines from the team?
thx
@munnerz do you have any guidance for how to check for cert-manager readiness in newer releases?
The easiest way is probably as @cpanato describes - checking if the webhook pod is running.
That said, there are still edge-cases (i.e. CAs not being configured properly for whatever reason) that could cause a request to fail - Kubernetes doesn't expose a clear way to check whether all of mutating, validating and conversion webhooks are 'ready'. I think the closest thing to that is performing a kubectl create --dry-run and waiting until you get a valid response.
So.. the quick way is to check the webhook pod. The 'proper' way would be to attempt to make a request to the apiserver to ensure that the validating, mutating and conversion webhooks are operational. In 99% of cases, these will be one and the same (within a few ms at least!). Eventual consistency is hard 🙃
I have used something like: kubectl wait -n cert-manager --for=condition=Available deploy --all in the past, which gives a pretty good indicator :)
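Putting the two suggestions above together, a readiness check might look something like the sketch below. The resource name and timeouts are illustrative, and `--dry-run=server` assumes a reasonably recent kubectl; the Issuer is never persisted, it only exercises the webhook round trip.

```shell
#!/bin/sh
# Quick check: wait for all cert-manager deployments to report Available.
kubectl wait -n cert-manager --for=condition=Available deploy --all --timeout=120s

# 'Proper' check: retry a server-side dry-run create until the
# validating/mutating/conversion webhooks answer successfully.
until kubectl create --dry-run=server -f - <<EOF
apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: readiness-probe   # hypothetical name; never actually created
  namespace: default
spec:
  selfSigned: {}
EOF
do
  sleep 2
done
```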
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
I have used something like:
kubectl wait -n cert-manager --for=condition=Available deploy --all in the past, which gives a pretty good indicator :)
@fabriziopandini or @wfernandes have you had a chance to experiment with this alternative way of waiting for cert-manager?
Would it be possible to add a flag to clusterctl init that opts out of having clusterctl manage cert-manager?
I'm attempting to use features only available in cert-manager 0.13+, and would like the option to install cert-manager myself to work around this issue.
@wmgroot All components shipped with Cluster API and providers include cert-manager certificates that are compatible with 0.11. Having a different version would make the installation of those components fail. We're looking to migrate the cert-manager version in a future release.
@vincepri are we creating anything related to cert-manager that is incompatible with newer versions? I wouldn't think so?
@voor Unfortunately, I haven't had the time to dive into the newer versions of cert-manager and the implications of clusterctl installation and upgrades yet. As @fabriziopandini mentioned, this does get into the realm of clusterctl managing the lifecycle of cert-manager so this work feels substantial.
@ncdc The certificate creation failed using manifests from v1alpha2 with 0.13 controllers, this was a long time ago. Happy to be proved wrong, if that works it might help the transition in the future
I'm interested to see where you saw this failure; we aim to ensure we don't make backward-incompatible changes within an API version in cert-manager.
We have a conversion webhook that manages conversions between API versions, which should be being deployed automatically for you anyway too.
I may have some time to dig into this over the weekend/next week - what's the best way for me to verify that an upgrade keeps working?
Also, does clusterapi upgrade cert-manager components in existing clusters? If so, is there any sort of upgrade process/code-path that needs looking at?
Adding this here for context.
v0.11 -> v0.12 is pretty much the APIService being removed.
v0.12 -> v0.13 nothing
v0.13 -> v0.14 https://cert-manager.io/docs/installation/upgrading/upgrading-0.13-0.14/
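Given the per-version notes above, the v0.11 → v0.13 jump for statically shipped manifests might be sketched as follows. The APIService name and the release-asset URL follow the upstream naming at the time, so treat both as assumptions to verify against the release page.

```shell
#!/bin/sh
# The legacy webhook APIService is gone as of v0.12; remove any leftover
# copy so the apiserver doesn't keep routing to a dead backend.
kubectl delete apiservice v1beta1.webhook.cert-manager.io --ignore-not-found

# Apply the upstream v0.13 manifests (CRDs are bundled in the same file).
# --validate=false works around older client-side schema validation issues.
kubectl apply --validate=false \
  -f https://github.com/jetstack/cert-manager/releases/download/v0.13.1/cert-manager.yaml
```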
Also, does clusterapi upgrade cert-manager components in existing clusters? If so, is there any sort of upgrade process/code-path that needs looking at?
Not currently. That's the main reason I filed this issue 😄
I'm interested to see where you saw this failure; we aim to ensure we don't make backward-incompatible changes within an API version in cert-manager.
We have a conversion webhook that manages conversions between API versions, which should be being deployed automatically for you anyway too.
This might have been the issue, the webhook wasn't responding but that might have been an isolated issue. We need to do a little more validation.
I may have some time to dig into this over the weekend/next week - what's the best way for me to verify that an upgrade keeps working?
If we provide an option to disable installing cert-manager w/ clusterctl during init, you should be able to create a kind cluster, install 0.13, and run clusterctl init.
Also, does clusterapi upgrade cert-manager components in existing clusters? If so, is there any sort of upgrade process/code-path that needs looking at?
Today, we only install cert-manager and don't handle its lifecycle, there is some work planned for clusterctl (or a different controller) to do so #2635
Also, does clusterapi upgrade cert-manager components in existing clusters? If so, is there any sort of upgrade process/code-path that needs looking at?
@munnerz Currently clusterctl only installs cert-manager v0.11 (with some specific updates). It does not handle any upgrade for cert-manager components. See https://github.com/kubernetes-sigs/cluster-api/issues/2635 for more info.
Jinx to all of us 😄.
I think adding the ability to opt out of cert-manager in clusterctl might be ok, but we will need to be very explicit about which cert-manager versions we've tested against, and anything else is YMMV.
I'm working on this at the moment.
Initially, I'm intending to upgrade cert-manager to v0.15.x (or perhaps v0.16.0 depending on timing of the next CAPI release).
Rough plan for this first phase:
1) Upgrade the embedded manifests to use the regular upstream manifests taken from the cert-manager release page.
2) Update the "wait for cert-manager readiness" logic to take a more 'blackbox' approach (i.e. not relying on how cert-manager is deployed in order to check whether the API is ready, and instead following the "Verifying the installation" instructions published on our website). This should not require changes, as it is a true 'round trip' of our Kubernetes API.
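The 'blackbox' verification in step 2 could look roughly like this sketch: create a throwaway Issuer and Certificate, wait for the Certificate to go Ready, then clean up. Names, the namespace, and the `v1alpha2` API version are assumptions for illustration; a real implementation would match the installed cert-manager version.

```shell
#!/bin/sh
# Round-trip the cert-manager API: if the Certificate reaches Ready,
# the controllers and all three webhooks are demonstrably working.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager-test
---
apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: test-selfsigned
  namespace: cert-manager-test
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: selfsigned-cert
  namespace: cert-manager-test
spec:
  dnsNames: ["example.com"]
  secretName: selfsigned-cert-tls
  issuerRef:
    name: test-selfsigned
EOF

kubectl wait -n cert-manager-test --for=condition=Ready certificate/selfsigned-cert --timeout=120s

# Tear everything down again.
kubectl delete namespace cert-manager-test
```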
In terms of managing upgrades, are there any expectations here? I understand it is "not supported" right now, but if a capi user runs clusterctl init against an existing management cluster with this new version, what should happen? Would we expect cert-manager to be left as-is if already installed? Upgraded?
I suspect at some point, if we start using newer versions of the types, we'll want to also upgrade cert-manager for existing management clusters as part of the clusterctl upgrade process.
What is the general recommendation for cert-manager there? Are in-place kubectl update of the manifests considered the appropriate way?
It might be good to double-check that we don't orphan any stored resources at an unsupported API version over the course of multiple upgrades. I'm not sure if there is anything cert-manager does or recommends to ensure that, after an upgrade, existing resources are updated to force an update of the storage version for the resource.
What is the general recommendation for cert-manager there? Are in-place kubectl update of the manifests considered the appropriate way?
Typically, yes. Historically we've had a few additional steps that have been needed. We are aiming to get to a world where that's not the case, and from my point of view I don't see where we'll need to make breaking changes in future.
That said, it's not unheard of, so perhaps it does need some special handling. We've looked at creating an operator for cert-manager in the past to help work through this, but it becomes an operator-operator kind of problem then...
kubectl apply does/should work, although it doesn't handle the case where we have deleted a resource, which in the case of the APIService, can cause big issues. We now avoid APIService resources, however Validating & Mutating webhooks could cause similar problems. That said, we're not likely to be removing these resources...
It might be good to double check that we don't orphan any stored resources at an unsupported api version over the course of multiple upgrades
👍 this is an issue our community is yet to run into, as (aside from the 'awkward' v1alpha1) we still support all API versions through conversion webhooks. We have not spent too much time specifically tackling this area for cert-manager, however, as there _is_ a KEP in place to address this universally: https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/0030-storage-migration.md (and corresponding implementation: https://github.com/kubernetes-sigs/kube-storage-version-migrator).
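Until storage-version migration lands universally, one cheap audit is to inspect `status.storedVersions` on the cert-manager CRDs: any version listed there still has objects stored at it (or had, until migrated) and can't simply be dropped from the CRD. A sketch:

```shell
#!/bin/sh
# Print which API versions etcd has recorded as stored for each
# cert-manager CRD; a version must be migrated away before removal.
for crd in $(kubectl get crd -o name | grep cert-manager.io); do
  echo "$crd: $(kubectl get "$crd" -o jsonpath='{.status.storedVersions}')"
done
```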
We've not really discussed timelines for removing these alpha versions altogether yet, although this _is_ an important discussion to have. Needless to say, we'll not make drastic or sudden changes, but I imagine at _some point_ we'll want to remove all but our v1 version, in a similar manner to kubernetes/kubernetes.
The only other thing to note here, is that cert-manager technically comes in two 'flavours' - the 'legacy' variants are intended to be used on Kubernetes 1.14 and below, whereas the 'regular' manifests are intended for newer versions. This is caused by the strict requirements around CRDs, and the tight validation that's been added to various fields on CRD objects as they have matured to v1.
Is there a specific list of supported k8s versions provided?
Minimum version for the management cluster is v1.16
For management clusters (where we deploy cert-manager), the minimum requirement is Kubernetes v1.16 for CRD v1