Cert-manager: Can't delete challenge

Created on 24 Apr 2019 · 16 comments · Source: jetstack/cert-manager

Describe the bug:
I can't delete a (pending) challenge.

Expected behaviour:
I should be able to delete it without having to remove the finalizers.

Steps to reproduce the bug:
Not sure it's reproducible. In my case a namespace got created and soon afterwards I deleted all resources one by one (not by deleting the namespace itself), so maybe that got it into this state.

Environment details:

  • Kubernetes version (e.g. v1.10.2): 1.11.2
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): AWS/custom
  • cert-manager version (e.g. v0.4.0): 0.7.0
  • Install method (e.g. helm or static manifests): static manifest

/kind bug


All 16 comments

馃 are you able to put together a reproduction? Cert-manager should remove the finalizer automatically after attempting to clean up the challenge resource.

Is cert-manager running? Can you provide logs from around the time you deleted the resource?

This might help:
E0424 15:40:20.423510 1 controller.go:125] Error getting Order "xxx-1251750797" referenced by resource "xxx-1251750797-0"

I probably deleted the order before I deleted the challenge. Deleting the order should have updated the challenge references though, right?

I can confirm that this is still a problem with v0.7.2. I probably should just delete the namespace and leave it to k8s to clean up the resources.

I am hitting the same issue. Is there a way to force delete this resource? kubectl won't let me get rid of it.

If someone can put together a repro for this, or a dump of logs and all the Certificate/Order/Challenge/Issuer resources, then I can dig into what's going on. Otherwise it's very difficult to get started working out where the issues may lie 😄

The problem is that when it happens you have already deleted everything but the challenge.
What would help most is documentation about how to remove cert-manager. Personally I did this:

  • k delete namespace cert-manager
  • k delete clusterissuer <cluster issuer>
  • k delete order <order>
  • k delete challenge <challenge>
  • k delete certificate <certificate>

and the Challenge remains, and by the way so do the custom resource definitions.
Could be because it is a ClusterIssuer, or because of the deletion order.

I was able to delete it after editing the challenge and removing the finalizer.
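
For reference, a minimal sketch of that workaround as a shell function. The function name, the `KUBECTL` override, and the argument names are illustrative assumptions and not part of cert-manager. It clears the finalizers on one Challenge resource rather than on the CRD, so cert-manager's own cleanup for that challenge is skipped.

```shell
#!/bin/sh
# Sketch: clear the finalizers on a single stuck Challenge so the API server
# can finish deleting it. Namespace and challenge names are placeholders.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
clear_challenge_finalizers() {
  namespace="$1"
  challenge="$2"
  if [ -z "$namespace" ] || [ -z "$challenge" ]; then
    echo "usage: clear_challenge_finalizers <namespace> <challenge>" >&2
    return 1
  fi
  # JSON merge patch that sets metadata.finalizers to an empty list.
  "${KUBECTL:-kubectl}" patch challenge "$challenge" -n "$namespace" \
    --type=merge -p '{"metadata":{"finalizers":[]}}'
}
```

Running it with `KUBECTL=echo` first prints the command instead of executing it, which is a cheap way to check the arguments before touching the cluster.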

That just happened again but this time I only deleted the namespace and had k8s clean up resources. So now the namespace is stuck deleting, the only resource available being the challenge. Deleting that doesn't work either.
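
To see what is still holding a terminating namespace open, something like the following sketch can help. It assumes `kubectl` access to the cluster; the function name and the `KUBECTL` override are hypothetical. It asks the API server for every listable namespaced resource type and prints whatever instances remain in the namespace.

```shell
#!/bin/sh
# Sketch: list every object still present in a namespace, across all
# namespaced resource types that support "list". The namespace name is a
# placeholder supplied by the caller.
list_remaining_resources() {
  namespace="$1"
  if [ -z "$namespace" ]; then
    echo "usage: list_remaining_resources <namespace>" >&2
    return 1
  fi
  "${KUBECTL:-kubectl}" api-resources --verbs=list --namespaced -o name |
    while read -r resource; do
      # --ignore-not-found keeps the output quiet for empty resource types.
      "${KUBECTL:-kubectl}" get "$resource" -n "$namespace" \
        --ignore-not-found -o name
    done
}
```

In the situation described above, the expectation would be that only the challenge shows up in the output.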

@tellisnz-shift Removing the finalizer isn't a solution though since it means whatever it's really stuck on deleting will just be orphaned and left behind.

We've now added uninstallation documentation here: https://docs.cert-manager.io/en/latest/tasks/uninstall/kubernetes.html

If you follow these instructions and make sure that no cert-manager resources exist before uninstalling cert-manager, you won't run into issues with finalization 😄

@munnerz This isn't about uninstalling cert-manager (@jdelafon just somewhat hijacked the issue), this is about deleting a namespace that has certificates (with challenges and orders), where the challenges (and hence the namespace) get stuck in terminating.

@tellisnz-shift, thanks!

kubectl patch crd challenges.certmanager.k8s.io -p '{"metadata":{"finalizers": []}}' --type=merge

or

kubectl patch crd challenges.acme.cert-manager.io -p '{"metadata":{"finalizers": []}}' --type=merge

works fine.

I just ran into this while upgrading to v0.11.0.
The easier route (for me) was to:

  1. Reinstall your OLD version of the CRDs and the matching cert-manager.
  2. (Optional) Manually delete the Challenge(s).
  3. Follow cert-manager's uninstall instructions (linked above), in order, from the beginning. This should remove all resources based on cert-manager's CRDs, including all Challenges.

Do any of you actually read the description? It's quite annoying tbh.

  • This isn't about uninstalling the cert-manager with challenges left behind etc
  • Removing the finalizer means you're just ignoring the underlying problem

@discordianfish

In the majority of cases I've seen, it is a symptom of not following the install/upgrade instructions, which at this point REQUIRE uninstalling cert-manager + all CRDs and CRs. The only other case I know of is the lack of a firewall rule allowing communication from the master IP range to the webhook port on the nodes (GKE-specific issue). I could also believe a mismatch between the CRD and cert-manager versions might cause something like this. Currently, there is nothing in the description or comments of this issue to indicate it is actually a software bug, as opposed to a network or cluster misconfiguration. If you have any more info to add to differentiate this issue, then maybe we could be of more help. But, at the moment, there isn't any way for us to eliminate configuration issues in your cluster or network as a possible cause. And following the uninstall/upgrade instructions will ensure the correct config/state for a number of components.

Have you looked through your k8s logs for references to the finalizer failing? This should always be the first step in diagnosing a CR that hangs on deletion.
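
Alongside the logs, it can help to confirm that the resource really is waiting on a finalizer. A minimal sketch, assuming `kubectl` access; the function name, the column headers, and the `KUBECTL` override are made up for illustration. A set deletion timestamp together with a non-empty finalizer list means the delete was accepted and is waiting on whichever controller owns that finalizer.

```shell
#!/bin/sh
# Sketch: show the deletion timestamp and finalizers of one Challenge.
# Namespace and challenge names are placeholders supplied by the caller.
show_deletion_state() {
  namespace="$1"
  challenge="$2"
  if [ -z "$namespace" ] || [ -z "$challenge" ]; then
    echo "usage: show_deletion_state <namespace> <challenge>" >&2
    return 1
  fi
  "${KUBECTL:-kubectl}" get challenge "$challenge" -n "$namespace" \
    -o custom-columns=NAME:.metadata.name,DELETING_SINCE:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers
}
```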

Are you still running cert-manager v0.7.0 (which is blocked by letsencrypt)? If so, have you tried upgrading the CRDs + cert-manager to v0.8.0+?

Again, note that upgrading to v0.11.0 (the current version as of this writing) requires a full uninstall of cert-manager, updating the cert-manager API name on all (Cluster)Issuers and Certificates, and updating the annotations on any Ingress definitions you might have.
Please read the following pages:
https://docs.cert-manager.io/en/latest/tasks/upgrading/upgrading-0.7-0.8.html
https://docs.cert-manager.io/en/latest/tasks/upgrading/upgrading-0.8-0.9.html
https://docs.cert-manager.io/en/latest/tasks/upgrading/upgrading-0.9-0.10.html
https://docs.cert-manager.io/en/latest/tasks/upgrading/upgrading-0.10-0.11.html

For v1.0.0+:

kubectl patch crd challenges.acme.cert-manager.io -p '{"metadata":{"finalizers": []}}' --type=merge

This saved my butt, going to reference here: https://github.com/kubernetes/kubernetes/issues/60538
