I'm trying to get cert-manager working with Traefik in an AKS cluster in Azure. I suppose this works fine in an ordinary cluster, but our cluster has Windows nodes as well as Linux nodes.
That doesn't work well, because Kubernetes tries to schedule cert-manager pods on a Windows host.
I have installed cert-manager using the current Helm chart. At first the installation failed, but once I added node selectors for all three pods, it worked fine.
A node selector like

```yaml
nodeSelector:
  beta.kubernetes.io/os: linux
```

is needed to ensure that pods are scheduled on a Linux node, not a Windows node.
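For reference, a values.yaml sketch that pins the three pods the chart deploys (controller, webhook and cainjector) to Linux. It assumes the `nodeSelector`, `webhook.nodeSelector` and `cainjector.nodeSelector` keys of the cert-manager Helm chart; check the values.yaml of your chart version to confirm. It uses `kubernetes.io/os`, the non-beta form of the label above. These values only affect the pods the chart itself deploys.

```yaml
# values.yaml (sketch): pin every component the chart deploys to Linux nodes.
# Key names assume the cert-manager Helm chart's values layout (v1.x).
nodeSelector:              # cert-manager controller
  kubernetes.io/os: linux
webhook:
  nodeSelector:            # cert-manager webhook
    kubernetes.io/os: linux
cainjector:
  nodeSelector:            # cert-manager cainjector
    kubernetes.io/os: linux
```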
But even though I've added this selector everywhere I can, there seems to be no way to add the same selector to the ACME challenge pods that cert-manager creates. They are created without a selector, and my Kubernetes cluster then tries to schedule them on an idle Windows node.
Please add the capability to include a nodeSelector for the challenge pods as well.
/kind bug
To emphasize the problem: I've created a workaround for now by tainting all Windows nodes.
This works - but it's not the way to go. The proper solution would still be to add the right nodeSelector. I actually believe this can be hardcoded - cert-manager will always need a Linux node, won't it?
I'm facing the exact same issue. I installed with Helm chart v1.1.0 and set `nodeSelector` to `kubernetes.io/os: linux`; however, this nodeSelector is not applied to the solver pods that are created for ACME:
```
❯ kubectl get pods
NAME                        READY   STATUS             RESTARTS   AGE
cm-acme-http-solver-kltk6   0/1     ImagePullBackOff   0          4m33s
❯ kubectl describe pod cm-acme-http-solver-kltk6
Name:            cm-acme-http-solver-kltk6
...
Node-Selectors:  <none>
...
Events:
  Type     Reason          Age                     From               Message
  ----     ------          ----                    ----               -------
  Normal   Scheduled       7m44s                   default-scheduler  Successfully assigned kubernetes-dashboard/cm-acme-http-solver-kltk6 to akswipool000000
  Normal   SandboxChanged  7m24s (x4 over 7m40s)   kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling         6m54s (x3 over 7m41s)   kubelet            Pulling image "quay.io/jetstack/cert-manager-acmesolver:v1.1.0"
  Warning  Failed          6m53s (x3 over 7m40s)   kubelet            Failed to pull image "quay.io/jetstack/cert-manager-acmesolver:v1.1.0": rpc error: code = Unknown desc = no matching manifest for windows/amd64 10.0.17763 in the manifest list entries
  Warning  Failed          2m40s (x24 over 7m36s)  kubelet            Error: ImagePullBackOff
```
So I get `no matching manifest for windows/amd64` because the pod was assigned to a Windows node.
The same issue existed in the ingress-nginx chart, but they have now set the default to Linux because it can only run on Linux. I agree it could be the same here too, unless cert-manager plans to support running on Windows nodes.
EDIT: To be clear, the dynamically created solver pods need to have the nodeSelector applied to them somehow. Setting the node selector to Linux by default in the Helm chart alone will not solve the issue :-).
While I agree this should be fixed by default, there is a way of setting this in the current release.
The spec.acme.solvers[].http01.ingress.podTemplate.spec.nodeSelector field can be set in the Issuer / ClusterIssuer object to add this nodeSelector. The helm chart isn't doing this because it doesn't create any issuers, that's left to the user. See https://cert-manager.io/docs/configuration/acme/http01/#podtemplate
Hopefully that provides a better way of working around this right now, than setting up taints etc.
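The `podTemplate` approach described above could look roughly like this. This is a minimal sketch: the issuer name, email address, account-key secret name and the `traefik` ingress class are placeholders, and the Let's Encrypt staging endpoint is used so that testing does not hit production rate limits.

```yaml
# Sketch of a ClusterIssuer whose HTTP-01 solver pods are pinned to Linux nodes.
# Name, email, secret name and ingress class are placeholders; adjust as needed.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-example
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-example-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
            podTemplate:
              spec:
                # Applied to the cm-acme-http-solver-* pods cert-manager creates
                nodeSelector:
                  kubernetes.io/os: linux
```

The same `solvers` block works in a namespaced `Issuer`; only `kind` and the scope change.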
Sorry, I overlooked that config. I'll test it as soon as possible.
Nevertheless, I think all of us agree that this should be set by default?
Hi!
The `nodeSelector` field in the Helm chart seems to apply only to cert-manager's own deployment (as opposed to the challenge pods, which are created dynamically from the Issuer/ClusterIssuer configuration).
It might be worth adding this information to the description of the `nodeSelector` field in the Helm chart's README.md, e.g.:
```diff
diff --git a/deploy/charts/cert-manager/README.template.md b/deploy/charts/cert-manager/README.template.md
index a6515327d..77c739af8 100644
--- a/deploy/charts/cert-manager/README.template.md
+++ b/deploy/charts/cert-manager/README.template.md
@@ -101,7 +101,7 @@ The following table lists the configurable parameters of the cert-manager chart
 | `securityContext` | Optional security context. The yaml block should adhere to the [SecurityContext spec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#securitycontext-v1-core) | `{}` |
 | `securityContext.enabled` | Deprecated (use `securityContext`) - Enable security context | `false` |
 | `containerSecurityContext` | Security context to be set on the controller component container | `{}` |
-| `nodeSelector` | Node labels for pod assignment | `{}` |
+| `nodeSelector` | Node labels for the assignment of the cert-manager controller. This field does not influence the assignment of ACME challenge pods; you can set the nodeSelector for the ACME challenge pods in the Issuer/ClusterIssuer object's `spec.acme.solvers[].http01.ingress.podTemplate.spec.nodeSelector` field. | `{}` |
 | `affinity` | Node affinity for pod assignment | `{}` |
 | `tolerations` | Node tolerations for pod assignment | `[]` |
 | `ingressShim.defaultIssuerName` | Optional default issuer to use for ingress resources | |
```
What do you think?
/triage support