Cert-manager: Missing nodeSelector on challenge pods

Created on 22 Jan 2021  ·  5 comments  ·  Source: jetstack/cert-manager

I'm trying to get cert-manager working with Traefik in an AKS cluster running in Azure. I suppose this will work fine in an ordinary cluster - but our cluster has Windows nodes as well as Linux nodes.

And this doesn't work very well as Kubernetes tries to schedule cert-manager pods on a Windows host.

I have installed cert-manager using the current Helm chart. At first installation failed - but when I added three node selectors for the three pods, installation worked fine.

A node selector like

nodeSelector:
  beta.kubernetes.io/os: linux

is needed to ensure that pods are scheduled on a Linux node, not a Windows node.
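For reference, a minimal sketch of what the "three node selectors" could look like as Helm values. It assumes the three pods are the controller, the webhook, and the cainjector, and that the chart exposes a `nodeSelector` key for each of them; check the key names against the `values.yaml` of the chart version you install. The `kubernetes.io/os` label is used here in place of the deprecated `beta.kubernetes.io/os`.

```yaml
# values.yaml (sketch; key names assumed from the cert-manager Helm chart)

# Controller pod
nodeSelector:
  kubernetes.io/os: linux

# Webhook pod
webhook:
  nodeSelector:
    kubernetes.io/os: linux

# CA injector pod
cainjector:
  nodeSelector:
    kubernetes.io/os: linux
```

These values only affect the pods created by the chart itself.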

But even though I've added this selector everywhere I can, there seems to be no way to add the same selector to the ACME challenge pods that cert-manager creates. They are created without a node selector, so my Kubernetes cluster tries to schedule them on an idle Windows node.

Please add the capability to include a nodeSelector for the challenge pods as well.

/kind bug

Labels: kind/bug, triage/support

All 5 comments

To emphasize the problem: I've created a workaround for now by tainting all Windows nodes.

This works - but it's not the way to go. The proper solution would still be to add the right nodeSelector. I actually believe this can be hardcoded - cert-manager will always need a Linux node, won't it?

I'm facing the exact same issue. I installed Helm chart v1.1.0 and set nodeSelector to kubernetes.io/os: linux. However, this node selector is not applied to the solver pods that are created for ACME:

โฏ kubectl get pods
NAME                                   READY   STATUS             RESTARTS   AGE
cm-acme-http-solver-kltk6              0/1     ImagePullBackOff   0          4m33s
โฏ kubectl describe pod cm-acme-http-solver-kltk6
Name:         cm-acme-http-solver-kltk6
...
Node-Selectors:  <none>
...
Events:
  Type     Reason          Age                     From               Message
  ----     ------          ----                    ----               -------
  Normal   Scheduled       7m44s                   default-scheduler  Successfully assigned kubernetes-dashboard/cm-acme-http-solver-kltk6 to akswipool000000
  Normal   SandboxChanged  7m24s (x4 over 7m40s)   kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling         6m54s (x3 over 7m41s)   kubelet            Pulling image "quay.io/jetstack/cert-manager-acmesolver:v1.1.0"
  Warning  Failed          6m53s (x3 over 7m40s)   kubelet            Failed to pull image "quay.io/jetstack/cert-manager-acmesolver:v1.1.0": rpc error: code = Unknown desc = no matching manifest for windows/amd64 10.0.17763 in the manifest list entries
  Warning  Failed          2m40s (x24 over 7m36s)  kubelet            Error: ImagePullBackOff

So I get "no matching manifest for windows/amd64" because the pod was assigned to a Windows node.

The same issue existed in the chart for ingress-nginx, but there the node selector now defaults to Linux because the controller can only run on Linux. I would agree that the same could be done here, unless cert-manager is planning to support running on Windows nodes.

EDIT: To be clear, the dynamically created solver pods need to have the node selector applied to them somehow. Setting the node selector to Linux by default in the Helm chart alone will not solve the issue :-).

While I agree this should be fixed by default, there is a way of setting this in the current release.

The spec.acme.solvers[].http01.ingress.podTemplate.spec.nodeSelector field can be set in the Issuer / ClusterIssuer object to add this nodeSelector. The helm chart isn't doing this because it doesn't create any issuers, that's left to the user. See https://cert-manager.io/docs/configuration/acme/http01/#podtemplate

Hopefully that provides a better way of working around this right now, than setting up taints etc.
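As an illustration of the field path mentioned above, here is a minimal sketch of a ClusterIssuer whose HTTP-01 solver pods are pinned to Linux nodes. The issuer name, e-mail address, secret name, and ingress class are placeholders that do not come from this thread, and the Let's Encrypt staging endpoint is used only as an example ACME server.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging-example      # hypothetical name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com             # placeholder address
    privateKeySecretRef:
      name: letsencrypt-staging-example-account-key   # hypothetical secret name
    solvers:
      - http01:
          ingress:
            class: traefik               # assumption: adjust to your ingress class
            podTemplate:
              spec:
                # Applied to the cm-acme-http-solver-* pods created for each challenge
                nodeSelector:
                  kubernetes.io/os: linux
```

With a selector like this in place, `kubectl describe pod` on a solver pod should list it under Node-Selectors instead of `<none>`.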

Sorry I overlooked that config. I'll test it as quickly as possible.

Nevertheless, I think all of us agree that this should be set by default?

Hi!

The nodeSelector field in the Helm chart seems to apply only to the cert-manager deployment itself (as opposed to the challenge pods, which are created dynamically from the Issuer/ClusterIssuer configuration).

Might be worth adding this bit of information to the nodeSelector field in the helm chart's README.md, e.g.:

diff --git a/deploy/charts/cert-manager/README.template.md b/deploy/charts/cert-manager/README.template.md
index a6515327d..77c739af8 100644
--- a/deploy/charts/cert-manager/README.template.md
+++ b/deploy/charts/cert-manager/README.template.md
@@ -101,7 +101,7 @@ The following table lists the configurable parameters of the cert-manager chart
 | `securityContext` | Optional security context. The yaml block should adhere to the [SecurityContext spec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#securitycontext-v1-core) | `{}` |
 | `securityContext.enabled` | Deprecated (use `securityContext`) - Enable security context | `false` |
 | `containerSecurityContext` | Security context to be set on the controller component container | `{}` |
-| `nodeSelector` | Node labels for pod assignment | `{}` |
+| `nodeSelector` | Node labels for the assignment of the cert-manager controller. This field does not influence the assignment of ACME challenge pods; you can set the nodeSelector for the ACME challenge pods in the Issuer/ClusterIssuer object's `spec.acme.solvers[].http01.ingress.podTemplate.spec.nodeSelector` field. | `{}` |
 | `affinity` | Node affinity for pod assignment | `{}` |
 | `tolerations` | Node tolerations for pod assignment | `[]` |
 | `ingressShim.defaultIssuerName` | Optional default issuer to use for ingress resources |  |

What do you think?

/triage support
