Describe the bug:
After upgrading a namespace-scoped cert-manager (i.e. one that uses Issuers, not ClusterIssuers) from v0.7.2 to v0.8.1, the challenges controller now tries to access cluster-scoped secrets. Our RBAC policy forbids this, which is how we caught the change in cert-manager's behavior (the change actually happened between v0.7.2 and v0.8.0-alpha.0):
E0629 21:36:22.953032 1 reflector.go:131] vendor/k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:cert-manager:cert-manager" cannot list resource "secrets" in API group "" at the cluster scope
Does the cert-manager challenges controller have new or changed code that doesn't respect the namespace-scope configuration? Is there a new configuration parameter we now need to keep it namespace-scoped?
Expected behaviour:
cert-manager is specifically configured to run only a few controllers and use only a specific namespace. It should not access anything at cluster scope in this configuration (repro example below).
Steps to reproduce the bug:
Here is a minimal reproducible example that runs only the challenges controller (not very practical, but it shows the problem):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  replicas: 1
  selector:
    matchLabels:
      name: cert-manager
  template:
    metadata:
      labels:
        name: cert-manager
    spec:
      serviceAccountName: cert-manager
      containers:
      - name: cert-manager
        image: quay.io/jetstack/cert-manager-controller:v0.8.0-alpha.0
        args:
        - --namespace=$(POD_NAMESPACE)
        - --leader-election-namespace=$(POD_NAMESPACE)
        - --cluster-resource-namespace=$(POD_NAMESPACE)
        - --controllers=challenges
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
Pods from this Deployment will repeatedly log the RBAC error shown above (to reproduce, you can use a service account with no permissions). Change the image version back to v0.7.2 and the challenges controller starts.
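As an illustration of the kind of namespace-only policy that catches this, the sketch below grants the `cert-manager` service account read access to Secrets only inside the `cert-manager` namespace. The Role and RoleBinding names are hypothetical, only read verbs are shown, and the rules the controller needs for its own resources are omitted. Under a grant like this, a namespaced list of Secrets is allowed, but a cluster-wide list is rejected with the "at the cluster scope" error above.

```yaml
# Hypothetical namespace-scoped grant; other rules the controller needs are omitted.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cert-manager-secrets   # hypothetical name
  namespace: cert-manager
rules:
- apiGroups: [""]              # core API group, where Secrets live
  resources: ["secrets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cert-manager-secrets   # hypothetical name
  namespace: cert-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cert-manager-secrets
subjects:
- kind: ServiceAccount
  name: cert-manager
  namespace: cert-manager
```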
Anything else we need to know?:
Typically we run only the issuers, certificates, orders, and challenges controllers, and we do not use the webhook component. This should narrow the search space for the regression.
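For that typical setup, the container args from the repro Deployment would look roughly like this (a sketch assuming the same flags as the repro; only the `--controllers` value changes):

```yaml
# Same namespace-scoping flags as the repro; only the controller list differs.
args:
- --namespace=$(POD_NAMESPACE)
- --leader-election-namespace=$(POD_NAMESPACE)
- --cluster-resource-namespace=$(POD_NAMESPACE)
- --controllers=issuers,certificates,orders,challenges
```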
git log --oneline v0.7.2..v0.8.0-alpha.0 -- pkg/controller/acmechallenges
bbf4012e Handle expired challenge responses in acmechallenges controller
57075123 Merge pull request #1585 from munnerz/validate-caa-feature-gate
49f587c8 Set Reason field on ACME challenges during Present/CleanUp
9906c0d9 Add feature gate for ValidateCAA functionality and default it to off
af9bce72 Add 'webhook' DNS01 provider type
871ed428 Allow controller constructors to return errors
eaeefdf5 Update acmechallenges controller
/kind bug
From some bisect tests of the images cert-manager publishes between releases, it seems the new DNS solver feature may access cluster-level secrets: https://github.com/jetstack/cert-manager/compare/076ecb4e...f3910e0d. Also, I guess not many users lock cert-manager down to a namespace, or if they do, they haven't upgraded to v0.8.x yet.
Thanks for the in-depth description and analysis of this problem. I think you are correct in suggesting it's due to the way the new webhook providers work.
Specifically, you can see here where the problematic informer is instantiated.
We will need to work out how best to plumb the namespace parameter through to the DNS solver's Initialize function so that the informer can be filtered to that namespace.
/milestone v0.9
/priority important-soon
/area acme
I've opened #1849, which should fix this 😄
Thanks for your work and the fix!