/kind bug
cert-manager-v0.3.1
The Ingress isn't modified, and thus the challenge fails when it's routed to the proxy behind the Ingress.
Expected: the Ingress to be modified so that the challenge request is intercepted.
How to reproduce it (as minimally and precisely as possible):
My stack is ingress->service->nginx-proxy
The Ingress is as follows; it exists and is bound to a global static IP, and the DNS entry resolves.
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kibana
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: ${STATIC_IP_NAME}
    certmanager.k8s.io/acme-http01-edit-in-place: "true"
    certmanager.k8s.io/cluster-issuer: "letsencrypt-issuer"
    certmanager.k8s.io/acme-challenge-type: "http01"
  labels:
    app: kibana
spec:
  tls:
  - hosts:
    - ${DOMAIN_NAME}
    secretName: demo-elastic-co
  backend:
    serviceName: nginx-service
    servicePort: 80
  rules:
  - host: ${DOMAIN_NAME}
Service:
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
spec:
  type: NodePort
  ports:
  - port: 80
    name: kibana
  selector:
    app: nginx
The site is available over http and resolves.
Issuer
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-issuer
  namespace: default
spec:
  acme:
    # The ACME server URL
    server: "https://acme-staging-v02.api.letsencrypt.org/directory"
    # Email address used for ACME registration
    email: "${EMAIL}"
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-key
    # Enable the HTTP-01 challenge provider
    http01: {}
Cert
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: demo-elastic-co
  namespace: default
spec:
  secretName: demo-elastic-co
  issuerRef:
    name: letsencrypt-issuer
  commonName: ${DOMAIN_NAME}
  dnsNames:
  - ${DOMAIN_NAME}
  acme:
    config:
    - http01:
        ingress: kibana
      domains:
      - ${DOMAIN_NAME}
Creating the Issuer and then the Certificate (logs redacted):
I0601 17:01:14.034442 1 controller.go:177] certificates controller: syncing item 'default/test-domain-co'
I0601 17:01:14.034567 1 sync.go:239] Preparing certificate default/test-domain-co with issuer
I0601 17:01:14.034582 1 acme.go:159] getting private key (letsencrypt-key->tls.key) for acme issuer default/letsencrypt-issuer
I0601 17:01:14.034925 1 logger.go:27] Calling GetOrder
I0601 17:01:14.187634 1 logger.go:52] Calling GetAuthorization
I0601 17:01:14.259376 1 logger.go:72] Calling HTTP01ChallengeResponse
I0601 17:01:14.259425 1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/test-domain-co
I0601 17:01:14.259447 1 logger.go:47] Calling GetChallenge
I0601 17:01:14.416321 1 helpers.go:162] Found status change for Certificate "test-domain-co" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-01 17:01:14.416307907 +0000 UTC m=+7713.231874001
I0601 17:01:14.416360 1 sync.go:241] Error preparing issuer for certificate default/test-domain-co: http-01 self check failed for domain "test-domain.example.co"
E0601 17:01:14.423710 1 sync.go:168] [default/test-domain-co] Error getting certificate 'test-domain-co': secret "test-domain-co" not found
E0601 17:01:14.423762 1 controller.go:186] certificates controller: Re-queuing item "default/test-domain-co" due to error processing: http-01 self check failed for domain "test-domain.example.co"
I0601 17:02:14.424648 1 controller.go:177] certificates controller: syncing item 'default/test-domain-co'
I0601 17:02:14.425671 1 sync.go:239] Preparing certificate default/test-domain-co with issuer
I0601 17:02:14.425781 1 acme.go:159] getting private key (letsencrypt-key->tls.key) for acme issuer default/letsencrypt-issuer
I0601 17:02:14.427512 1 logger.go:27] Calling GetOrder
I0601 17:02:14.610652 1 logger.go:52] Calling GetAuthorization
I0601 17:02:14.713524 1 logger.go:72] Calling HTTP01ChallengeResponse
I0601 17:02:14.713564 1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/test-domain-co
I0601 17:02:14.713582 1 logger.go:47] Calling GetChallenge
I0601 17:02:14.817513 1 helpers.go:162] Found status change for Certificate "test-domain-co" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-01 17:02:14.817499697 +0000 UTC m=+7773.633065800
I0601 17:02:14.817550 1 sync.go:241] Error preparing issuer for certificate default/test-domain-co: http-01 self check failed for domain "test-domain.example.co"
E0601 17:02:14.823613 1 sync.go:168] [default/test-domain-co] Error getting certificate 'test-domain-co': secret "test-domain-co" not found
E0601 17:02:14.823647 1 controller.go:186] certificates controller: Re-queuing item "default/test-domain-co" due to error processing: http-01 self check failed for domain "test-domain.example.co"
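The repeated `http-01 self check failed` errors come from cert-manager fetching the challenge URL itself before asking the ACME server to validate. The sketch below shows roughly what such a check compares, assuming it expects an HTTP 200 whose body is exactly the ACME key authorization (`token + "." + base64url(JWK thumbprint)`, per the ACME and JWK thumbprint RFCs). It is an illustration, not cert-manager's code; the helper names are hypothetical.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    # base64url without padding, as used throughout ACME/JOSE
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    # JWK thumbprint: SHA-256 over the required members only,
    # keys sorted lexicographically, no whitespace.
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in required}, separators=(",", ":"), sort_keys=True)
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    # What the solver must serve at the challenge path
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def challenge_url(domain: str, token: str) -> str:
    # HTTP-01 is checked over plain HTTP
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def self_check_passes(status: int, body: bytes, expected_key_auth: str) -> bool:
    # Assumption: the final response must be a 200 whose body matches exactly
    return status == 200 and body.decode("utf-8", "replace") == expected_key_auth
```

Under this model, the nginx proxy answering the challenge path with its own content, instead of the solver's key authorization, makes the check fail whatever status code the proxy is tweaked to return.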
In the nginx logs for the app I see the challenge request; tweaking the proxy to return 404, 200 or 301 makes no difference.
35.202.242.207, 35.201.81.158 - - [01/Jun/2018:17:01:14 +0000] "GET /.well-known/acme-challenge/5f_2k1u87-xJ1h4xMjNZN7q9nPlVVSfHVwKH9M58UCw HTTP/1.1" 301 610 "" "Go-http-client/1.1"
35.202.242.207, 35.201.81.158 - - [01/Jun/2018:17:02:14 +0000] "GET /.well-known/acme-challenge/5f_2k1u87-xJ1h4xMjNZN7q9nPlVVSfHVwKH9M58UCw HTTP/1.1" 301 610 "" "Go-http-client/1.1"
Anything else we need to know?:
GCE:
Kubernetes version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-05-12T04:12:12Z", GoVersion:"go1.9.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.2-gke.3", GitCommit:"d2c7a2bd41036f9474287579a725dc54c904e92d", GitTreeState:"clean", BuildDate:"2018-05-23T00:19:39Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
I should add that I see no modifications applied to the Ingress at any point.
I should also add that I have the following role for Tiller:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller-default
  namespace: default
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: tiller-cluster-admin-binding
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
and installed cert-manager with
helm install --name cert-manager --namespace default stable/cert-manager
Also, despite the annotations on the Ingress, cert-manager seems to spawn an HTTP solver: another pod, which I assume is meant to intercept the calls, rather than modifying the Ingress:
cm-acme-http-solver-jn2xh 1/1 Running 0 11s
If I install via
helm install --name cert-manager --namespace default stable/cert-manager --set ingressShim.defaultIssuerName=letsencrypt-issuer --set ingressShim.defaultIssuerKind=ClusterIssuer
and deploy just the Issuer, I get:
I0601 19:35:07.909076 1 controller.go:177] certificates controller: syncing item 'default/demo-elastic-co'
I0601 19:35:07.909183 1 sync.go:239] Preparing certificate default/demo-elastic-co with issuer
I0601 19:35:07.909199 1 acme.go:159] getting private key (letsencrypt-key->tls.key) for acme issuer default/letsencrypt-issuer
I0601 19:35:07.909600 1 logger.go:27] Calling GetOrder
I0601 19:35:08.076256 1 logger.go:52] Calling GetAuthorization
I0601 19:35:08.126714 1 logger.go:72] Calling HTTP01ChallengeResponse
I0601 19:35:08.126773 1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/demo-elastic-co
I0601 19:35:08.126797 1 logger.go:47] Calling GetChallenge
I0601 19:35:08.195021 1 helpers.go:162] Found status change for Certificate "demo-elastic-co" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-01 19:35:08.195006168 +0000 UTC m=+109.606525986
I0601 19:35:08.195060 1 sync.go:241] Error preparing issuer for certificate default/demo-elastic-co: http-01 self check failed for domain "demo.elastic.co"
I0601 19:35:08.199715 1 controller.go:152] ingress-shim controller: syncing item 'default/kibana'
I0601 19:35:08.199753 1 sync.go:123] Certificate "demo-elastic-co" for ingress "kibana" already exists
I0601 19:35:08.199765 1 sync.go:126] Certificate "demo-elastic-co" for ingress "kibana" is up to date
I0601 19:35:08.199788 1 controller.go:166] ingress-shim controller: Finished processing work item "default/kibana"
E0601 19:35:08.200398 1 sync.go:168] [default/demo-elastic-co] Error getting certificate 'demo-elastic-co': secret "demo-elastic-co" not found
E0601 19:35:08.200431 1 controller.go:186] certificates controller: Re-queuing item "default/demo-elastic-co" due to error processing: http-01 self check failed for domain "demo.elastic.co"
I suspect it's caused by a recent change in k8s:
https://github.com/kubernetes/ingress-gce/pull/112/commits/d2559d25b09e24d00fda94acb1955c0c440c813e?utf8=%E2%9C%93&diff=split#diff-3c862eb54a8e0e161b534e0c67e5379eR414
Previously, the loadbalancer-controller logged the non-existent secret and then proceeded to create the /.well-known/acme-challenge/... path rule in the GCLB. Once this rule was created, the self check would pass, and cert-manager would then obtain the new cert and create the secret.
But now the loadbalancer-controller just gets stuck waiting for the secret to be created, without creating the path rules in the GCLB.
The workaround is to manually create the secret first (it must be in a valid format; we just copied the secret from our staging cluster). After that, cert-manager goes through the normal ACME workflow and updates the k8s secret and the GCLB.
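The chicken-and-egg problem described above can be shown with a toy model: the load balancer only routes the challenge path once the Secret exists, and cert-manager only creates the Secret once the challenge path is routed. Everything here (`Cluster`, `lb_controller_sync`, `cert_manager_sync`, `run`) is hypothetical and simplified; in particular, it ignores the renewal window that decides whether cert-manager replaces a pre-created secret.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    secret_exists: bool            # does the TLS Secret referenced by the Ingress exist?
    challenge_routed: bool = False # is /.well-known/acme-challenge/... routed in the LB?
    cert_issued: bool = False


def lb_controller_sync(c: Cluster, require_secret: bool) -> None:
    # require_secret=True models the newer behaviour: wait for the Secret first
    if require_secret and not c.secret_exists:
        return  # stuck: no path rules are programmed
    c.challenge_routed = True


def cert_manager_sync(c: Cluster) -> None:
    # The self check can only pass once the challenge path reaches the solver
    if c.challenge_routed:
        c.cert_issued = True
        c.secret_exists = True


def run(secret_precreated: bool, require_secret: bool, rounds: int = 5) -> bool:
    c = Cluster(secret_exists=secret_precreated)
    for _ in range(rounds):
        lb_controller_sync(c, require_secret)
        cert_manager_sync(c)
    return c.cert_issued
```

With `require_secret=True` and no pre-created secret, `run` never issues a certificate; pre-creating the secret breaks the cycle, which is what the workaround does.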
Also running into this :(
@chengji77 thanks very much for digging that out - I had not seen this change 😬
I have made a comment here: https://github.com/kubernetes/ingress-gce/pull/112#issuecomment-394784727
It's a real shame to see yet another divergence in how ingress controllers behave 😢. This will indeed break both kube-lego and cert-manager when used with GCLB, unless some form of certificate already exists (it must be expired or nearing expiry for cert-manager to trigger renewal).
I followed the workaround mentioned above and while my ingress is able to pull the manually created (and bogus) secret, I am still getting failed self checks immediately following the pull of the secret.
It's also worth noting that the ACME server will refuse to validate domain ownership over HTTPS (regardless of whether the certificate is valid or invalid). You must make sure the challenge endpoint is accessible over HTTP on port 80 (you can test this using curl -vv http://challenge-endpoint......).
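A rough Python equivalent of that `curl -vv` test, as a sketch: it sends one plain-HTTP GET and reports the status and any `Location` header without following redirects (curl also doesn't follow them unless given `-L`). The function name `probe_challenge` is hypothetical; pass it the real challenge URL for your domain.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def probe_challenge(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one GET over plain HTTP and return (status, Location header).

    Redirects are NOT followed, mirroring `curl -vv` without `-L`.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        # HTTP-01 validation starts on plain http:// (port 80 by default)
        raise ValueError(f"expected an http:// URL, got {url!r}")
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

A `(200, None)` result with the solver reachable is what you want to see; a `301` pointing at `https://...`, like the one in the nginx logs earlier in this thread, means the request never reached the solver on port 80.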
So what I am hearing is that the ingress must come completely online before the self check will stop failing (and the secret gets re-generated)?
Yes, it will take 10 minutes or so for any changes to your load balancers to be performed anyway.
You will need to supply a certificate that is nearing expiry (within 30 days of expiry) the first time you issue a certificate, from what I understand of the problem. It doesn't matter which CA signed the cert, and it can be self-signed too.
This will then:
a) cause ingress-gce to serve with the provided, nearing expiry cert.
b) so long as you have not disabled HTTP traffic on your ingress, also cause ingress-gce to serve over port 80
c) cert-manager will see a certificate referencing that secret, and determine it needs to be renewed as it is nearing expiry, and trigger HTTP01 validation
d) cert-manager will edit the ingress resource to include the challenge path
e) ingress-gce will update the LB accordingly
f) after ~10m, the change will be reflected in the GCLB and the self check should pass (as well as the LE validation attempt)
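Step c) hinges on the renewal window. A minimal sketch of that decision, assuming a fixed 30-day "renew before" window as described in this thread (the constant and function names are illustrative, not read from cert-manager's source):

```python
from datetime import datetime, timedelta, timezone

# Assumption: renewal starts 30 days before notAfter, per the comments above.
RENEW_BEFORE = timedelta(days=30)


def renewal_due_in(not_after: datetime, now: datetime,
                   renew_before: timedelta = RENEW_BEFORE) -> timedelta:
    """Time until renewal should start; zero or negative means renew now."""
    return (not_after - renew_before) - now


def needs_renewal(not_after: datetime, now: datetime,
                  renew_before: timedelta = RENEW_BEFORE) -> bool:
    return renewal_due_in(not_after, now, renew_before) <= timedelta(0)
```

Under these assumptions a 365-day placeholder cert is not renewed, while a 1-day or already-expired one is. A cert expiring 24 hours from now gives `renewal_due_in` of -696 hours, which happens to match the "scheduled for renewal in -696 hours" log line posted later in this thread.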
Must the ingress controller be completely online before this will pass? I completely nuked the deployment and reapplied (cert, issuer, ingress, etc.: everything except the manually created cert noted as a workaround) and I am still getting a failed self check.
@thebigredgeek you need to be very patient when using GCLB's - they are extremely slow to update.
Ok, so if the cert is set to expire in 365 days, this won't work. So it sounds like I need to create a closer-to-expiry cert.
Correct - FWIW, this is not how we expect users to use cert-manager with GCLB ingresses, and is a regression caused by https://github.com/kubernetes/ingress-gce/pull/112 that we need to fix 😄
Yeah I saw that. No worries, and thanks for your help so far!
If I create a cert that is already expired, that should work too, yeah?
@thebigredgeek What I did was to use a cert from our staging environment, which doesn't contain the domain for prod. This also triggered cert-manager to renew the cert.
@munnerz seems to still be happening 30 minutes later :(. It shouldn't take this long to self correct, should it?
@thebigredgeek nope - are you on Kubernetes slack? Can you send over your full cert-manager logs, as well as the output of kubectl describe issuer,clusterissuer,certificate --all-namespaces?
We can then update this issue if we come to a resolution that's relevant 😄
Sure, I’ll hop on tomorrow (I’m US pacific)
Just slacked you on k8s
Any traction here? Still trying to figure out how to make this work
Same problem here; we are using the manually created TLS secret for now.
Same problem! Working with a 1.9.7 GKE cluster.
Are there any updates here?
Got the same problem! We're using GKE 1.10.2-gke.3.
I had this same issue, or it at least behaved the same - secret was not being created.
What ended up working for me was the same as mentioned above. I generated a self-signed certificate expiring in one day and manually created the secret.
openssl req -x509 -nodes -days 1 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=subdomain.example.com"
kubectl create secret tls my-secret --key /tmp/tls.key --cert /tmp/tls.crt
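For reference, the Secret created by that `kubectl create secret tls` command is a `kubernetes.io/tls` Secret holding base64-encoded `tls.crt` and `tls.key` entries. A sketch that builds an equivalent manifest in Python, for example to pipe as JSON into `kubectl apply -f -`; the helper name `tls_secret_manifest` and the `default` namespace are assumptions for illustration:

```python
import base64
import json


def tls_secret_manifest(name: str, cert_pem: bytes, key_pem: bytes,
                        namespace: str = "default") -> dict:
    """Build a kubernetes.io/tls Secret, similar to `kubectl create secret tls`."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/tls",
        # Values under `data` must be base64-encoded
        "data": {
            "tls.crt": base64.b64encode(cert_pem).decode("ascii"),
            "tls.key": base64.b64encode(key_pem).decode("ascii"),
        },
    }
```

Reading `/tmp/tls.crt` and `/tmp/tls.key` (from the openssl command above) and printing `json.dumps(tls_secret_manifest("my-secret", crt, key))` would give a manifest `kubectl apply -f -` accepts.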
After this, cert-manager successfully issued a new certificate.
We're having the same issue on a GKE cluster at 1.10.4-gke.2. Generating the certificate as above worked, although it's far from ideal for obvious reasons.
Having these problems too.
I have opened https://github.com/kubernetes/ingress-gce/pull/388 which will fix this issue.
As another alternative, for now ingress-gce users can manually specify a Certificate resource. You will need to exclude the TLS section from your Ingress whilst this is provisioning, but once done, you should be good to add it back in, referencing the newly created Secret.
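A sketch of the two phases of that alternative, reading "the TLS section" as the Ingress's `spec.tls` (an interpretation): apply the Ingress without `tls` while the Certificate is provisioning, then re-apply it with `tls` once the Secret exists. The helper is hypothetical; the names reuse the example manifests from the top of this issue.

```python
from typing import Optional


def ingress_manifest(name: str, host: str, service: str, port: int,
                     tls_secret: Optional[str] = None) -> dict:
    """extensions/v1beta1 Ingress; spec.tls is included only when a Secret is given."""
    spec = {
        "backend": {"serviceName": service, "servicePort": port},
        "rules": [{"host": host}],
    }
    if tls_secret is not None:
        # Phase 2: reference the Secret that cert-manager created
        spec["tls"] = [{"hosts": [host], "secretName": tls_secret}]
    return {
        "apiVersion": "extensions/v1beta1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": spec,
    }


# Phase 1: no TLS section while the Certificate is provisioning
provisioning = ingress_manifest("kibana", "demo.example.com", "nginx-service", 80)
# Phase 2: add TLS back once the Secret exists
final = ingress_manifest("kibana", "demo.example.com", "nginx-service", 80, "demo-elastic-co")
```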
@munnerz
I just encountered this issue using the latest stable/ingress-nginx and stable/cert-manager charts as I write this.
I had to pre-create the cert, just like for GCE above, to make it work with ingress-nginx; otherwise nginx spent all day complaining that the secret didn't exist, which prevented cert-manager from doing its job.
Did ingress-nginx copy GCE?
W0714 01:16:02.171068 5 controller.go:1020] ssl certificate "dev/web-api-tls-secret" does not exist in local store
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:16:09 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 404 62 "-" "Go-http-client/1.1" 176 0.001 [dev-web-api-service-80] 10.1.0.26:80 31 0.000 404 c0fe1ea8344e36c93c89e6a23cdf9f30
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:16:13 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 404 62 "-" "Go-http-client/1.1" 176 0.001 [dev-web-api-service-80] 10.1.0.25:80 31 0.000 404 f9b367bdfa4df7aa4a2f5f6876f924df
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:16:30 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 404 62 "-" "Go-http-client/1.1" 176 0.001 [dev-web-api-service-80] 10.1.0.7:80 31 0.004 404 51223ce2ad35371671d30a247602259
The moment I created the secret:
I0714 01:17:56.903217 5 store.go:348] secret dev/web-api-tls-secret was added and it is used in ingress annotations. Parsing...
I0714 01:17:56.904870 5 backend_ssl.go:69] adding secret dev/web-api-tls-secret to the local store
I0714 01:17:57.226319 5 controller.go:177] ingress backend successfully reloaded...
I0714 01:17:58.864434 5 backend_ssl.go:181] updating local copy of ssl certificate dev/web-api-tls-secret with missing intermediate CA certs
I0714 01:18:00.192565 5 controller.go:168] backend reload required
I0714 01:18:00.302296 5 controller.go:177] ingress backend successfully reloaded...
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:18:14 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 200 87 "-" "Go-http-client/1.1" 176 0.026 [dev-cm-acme-http-solver-h49ts-8089] 10.1.0.15:8089 87 0.024 200 489ee18372ea0fe577641c8ace44565b
ingress-nginx was, however, generating a self-signed cert; it just seems to have completely dropped routing of the ACME challenge while it was busy throwing a fit over the missing secret.
I tried creating the secret manually as mentioned above, but it still doesn't work for me.
All I'm getting is:
I0824 15:17:47.531456 1 controller.go:181] certificates controller: syncing item 'default/domain-production-tls-ipv4'
I0824 15:17:47.531976 1 sync.go:242] Preparing certificate default/domain-production-tls-ipv4 with issuer
I0824 15:17:47.532002 1 acme.go:169] getting private key (letsencrypt-prod->tls.key) for acme issuer kube-system/letsencrypt-prod
I0824 15:17:47.532607 1 logger.go:27] Calling GetOrder
I0824 15:17:47.777548 1 logger.go:57] Calling GetAuthorization
I0824 15:17:47.953649 1 logger.go:77] Calling HTTP01ChallengeResponse
I0824 15:17:47.953793 1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/domain-production-tls-ipv4
I0824 15:17:47.953839 1 logger.go:52] Calling GetChallenge
I0824 15:17:48.148411 1 helpers.go:188] Found status change for Certificate "domain-production-tls-ipv4" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-24 15:17:48.148395361 +0000 UTC m=+1936.846873368
I0824 15:17:48.148908 1 sync.go:244] Error preparing issuer for certificate default/domain-production-tls-ipv4: http-01 self check failed for domain "domain.com"
I0824 15:17:48.149264 1 sync.go:174] Certificate default/domain-production-tls-ipv4 scheduled for renewal in -696 hours
E0824 15:17:48.156534 1 controller.go:190] certificates controller: Re-queuing item "default/domain-production-tls-ipv4" due to error processing: http-01 self check failed for domain "domain.com"
EDIT:
After letting it run for a few hours I received the certificate. Phew, thank god. I almost gave up.
This is working now on 1.10.7-gke.2. You'll now see a warning in your service:
Could not find TLS certificates. Continuing setup for the load balancer to serve HTTP. Note: this behavior is deprecated and will be removed in a future version of ingress-gce
Awesome, thanks for confirming it has rolled out!
I'm going to close this issue now then as the issue is resolved.
We'll soon be in a better position to work around this limitation from our end too, to avoid the deprecated behaviour warning.
/close
@munnerz: Closing this issue.
I am seeing the warning as well, is there any open issue for this that I can subscribe for updates on it? :smile: