Describe the bug:
Not a single certificate gets issued when DNS01 is used. The cert-manager Pod never gets past the propagation check:
$ kubectl logs -f -n cert-manager cert-manager-6f68b58796-z29bq --since 1s
I0401 17:57:46.366191 1 controller.go:206] challenges controller: syncing item 'namespace/example-com-4104471611-0'
I0401 17:57:46.366468 1 logger.go:103] Calling Discover
I0401 17:57:46.370909 1 dns.go:101] Checking DNS propagation for "example.com" using name servers: [8.8.8.8:53]
I0401 17:57:46.502108 1 sync.go:173] propagation check failed: DNS record for "example.com" not yet propagated
I0401 17:57:46.502588 1 controller.go:212] challenges controller: Finished processing work item "namespace/example-com-4104471611-0"
Expected behaviour:
At least it should identify the record and move on...
The record is always successfully created, as shown below:
$ aws route53 list-resource-record-sets --hosted-zone-id XXXXXXXXXXXXX
{
    "ResourceRecordSets": [
        // [...]
        {
            "Name": "example.com.",
            "Type": "A",
            "TTL": 300,
            "ResourceRecords": [
                {
                    "Value": "XXX.XXX.XXX.XXX"
                }
            ]
        },
        {
            "Name": "_acme-challenge.example.com.",
            "Type": "TXT",
            "TTL": 10,
            "ResourceRecords": [
                {
                    "Value": "\"ZWPpajuQACXP4m7giks2S8fe9KSz1TrzLUn_T5HfSwg\""
                }
            ]
        }
    ]
}
Nevertheless, cert-manager has been in a loop for 2 hours supposedly trying to get this information.
Steps to reproduce the bug:
Follow the tutorial:
cert-manager-v0.7.0
ClusterIssuer
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-stage
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-stage
    http01: {}
    dns01:
      providers:
      - name: route53-dns
        cnameStrategy: "Follow"
        route53:
          accessKeyID: XXXXXXXXXXXXXXXXXXXX
          region: sa-east-1
          hostedZoneID: XXXXXXXXXXXXX
          secretAccessKeySecretRef:
            name: route53-credentials-secret
            key: secret-access-key
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: example-com
  namespace: namespace
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-stage
    kind: ClusterIssuer
  commonName: example.com
  dnsNames:
  - example.com
  acme:
    config:
    - dns01:
        provider: route53-dns
      domains:
      - example.com
Anything else we need to know?:
ACME HTTP01 works fine, but I need a wildcard certificate. The example above yields the same results I get whenever I try to add '*.example.com' to the Certificate API object, hence the simplified version I've shown.
This has been happening since at least v0.5.0; it took me a while to report...
Environment details:
/kind bug
Could you try hopping onto the machine that runs the cert-manager pod and see if it can reach the _acme-challenge.example.com record? Propagation checks in cert-manager are done via DNS, and split-horizon setups can cause this check to fail.
Apparently, the Pod uses Busybox's nslookup, which doesn't accept arguments to query TXT records...
Anyhow:
$ nslookup example.com
nslookup: can't resolve '(null)': Name does not resolve
Name: example.com
Address 1: AAA.BBB.CCC.DDD DDD.CCC.BBB.AAA.bc.googleusercontent.com
I have no idea how to interpret these results... Not sure where DDD.CCC.BBB.AAA.bc.googleusercontent.com came from, or why it did not resolve...
By the way, this IP is the exact reverse of the A record for example.com. in Route53.
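Since Busybox's nslookup cannot query TXT records, one way to run the check suggested above is with dig from a machine that has it installed. This is only a rough sketch: the helper names acme_challenge_name and check_challenge_txt are made up for illustration, and 8.8.8.8 is simply the name server that appears in the cert-manager log above.

```shell
# Sketch only: both helper names are hypothetical.

# Build the DNS01 challenge record name for a domain.
# A wildcard such as *.example.com is validated through the base
# domain's _acme-challenge record, so the "*." label is dropped.
acme_challenge_name() {
  domain=${1#\*.}     # drop a leading "*." if present
  domain=${domain%.}  # drop a trailing dot if present
  printf '_acme-challenge.%s\n' "$domain"
}

# Ask one resolver for the challenge TXT record.
# $1 = domain, $2 = resolver IP (defaults to 8.8.8.8, as seen in the log).
check_challenge_txt() {
  dig +short TXT "$(acme_challenge_name "$1")" "@${2:-8.8.8.8}"
}
```

Running check_challenge_txt example.com should print the quoted token if the resolver can see the record, and nothing if it cannot. Comparing the result from 8.8.8.8 with the result from inside the cluster may help spot a split-horizon setup.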
I have the same setup (Route53 and GKE) and have exactly the same issue:
# On the node and the cert-manager pod
$ nslookup _acme-challenge.example.com
nslookup: can't resolve '(null)': Name does not resolve
nslookup: can't resolve '_acme-challenge.example.com': Name does not resolve
I have other wildcard certs already set up with a version prior to 0.5 and they still work.
I killed the cert-manager pod and then it worked again.
I had another issue and in the end I reinstalled cert-manager version 0.6.2 completely. But I think it worked already after killing the pod.
@peterfication did you use stable or jetstack when you installed 0.6.2?
Also, did you use Helm?
I used helm with the jetstack repo version 0.6.0.
I have a similar problem: the TXT record for the DNS challenge is created in Route53, but with a short TTL of 10 seconds.
I am running cert-manager with --dns01-recursive-nameservers-only=true so that the DNS propagation check does not go to the authoritative nameservers directly, since our firewall restricts access to port 53.
A similar setup worked with Google CloudDNS, which creates the DNS challenge TXT record with a 60-second TTL.
cert-manager v0.7.0
external-dns v0.5
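When relying on --dns01-recursive-nameservers-only, it may be worth confirming that the flag actually reached the running container. A minimal sketch, assuming the Deployment is named cert-manager in the cert-manager namespace (consistent with the pod name in the original report, but still an assumption); has_arg is a hypothetical helper, not part of kubectl or cert-manager.

```shell
# Sketch only: has_arg is a hypothetical helper.

# Succeeds if the flag $1 appears in the argument list read from stdin,
# either bare (--flag) or with a value (--flag=true).
# Brackets, quotes, commas and spaces are all treated as separators so
# that both JSON-style and Go-style list output are handled.
has_arg() {
  tr -s ' ,"[]' '\n' | grep -q -E -- "^$1(=.*)?\$"
}

# Usage (Deployment name and namespace are assumptions):
#   kubectl -n cert-manager get deployment cert-manager \
#     -o jsonpath='{.spec.template.spec.containers[0].args}' \
#     | has_arg --dns01-recursive-nameservers-only && echo "flag is set"
```

If nothing is printed, the flag is probably not set on the first container, and the propagation check may still be querying the authoritative nameservers.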
Looks like --dns01-recursive-nameservers-only solved the issue here.
I did the installation as follows:
kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml
kubectl create namespace cert-manager
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation="true"
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
--name cert-manager \
--namespace cert-manager \
--version v0.7.2 \
--set "podDnsPolicy"="None" \
--set "podDnsConfig.nameservers[0]"="8.8.8.8" \
--set extraArgs='{--dns01-recursive-nameservers-only}' \
jetstack/cert-manager \
--tls
Nevertheless, I had to manually intervene over in Route53 to correct the TXT record cert-manager creates. It lacked the prefix _acme-challenge so I had to edit it in order for the Pod's validation to succeed.
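To spot a challenge record that was created without the _acme-challenge prefix, the TXT record names in the hosted zone can be listed and filtered. This is a rough sketch rather than a recommended procedure: missing_acme_prefix is a hypothetical helper, and the hosted zone ID is the placeholder from the original report.

```shell
# Sketch only: missing_acme_prefix is a hypothetical helper.

# Reads record names (one per line) on stdin and prints those that do
# NOT start with the _acme-challenge label.
missing_acme_prefix() {
  grep -v '^_acme-challenge\.' || true
}

# Usage (zone ID is a placeholder):
#   aws route53 list-resource-record-sets --hosted-zone-id XXXXXXXXXXXXX \
#     --query "ResourceRecordSets[?Type=='TXT'].Name" --output text \
#     | tr '\t' '\n' | missing_acme_prefix
```

Note that a zone can legitimately hold other TXT records (SPF entries, for example), so anything this prints is only a candidate to inspect by hand, not proof of a misnamed challenge record.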