Go: x/build/devapp: dev.golang.org cert has expired

Created on 27 May 2019  路  21Comments  路  Source: golang/go

Issued On Tuesday, 26 February 2019 at 17:05:27
Expires On Monday, 27 May 2019 at 18:05:27

Probably needs a certbot renew run.

@dmitshur @bradfitz

Builders FrozenDueToAge NeedsFix

All 21 comments

We don't use certbot. We use Go's acme autocert package which should handle renewal automatically.

Ah I see. Well, it's something else then.

The certificate of https://dev-staging.golang.org expires on Tuesday, May 28, 2019 at 6:20:45 AM Eastern Daylight Time, which is just 1~ day away.

dev.golang.org is powered by golang.org/x/build/app/appengine, which is an App Engine app. I don't think it uses autocert itself directly.

Edit: It's actually powered by golang.org/x/build/devapp, which does use autocert directly. I mixed them up. :(

I've looked at devapp source code and I see it has some complex logic for redirecting HTTP to HTTPS. I saw some "acme/autocert: missing server name" errors in the logs. There's a chance the problem is in the code, perhaps related to HTTP handler redirection and the autocert package.

I thought simply redeploying would resolve the immediate problem of expired certificate, so I tried that on the staging instance first. The newly deployed version is currently in a crash-loop. Its logs say:

2019/05/27 21:40:18 Loading data from log *maintner.netMutSource ...
2019/05/27 21:40:18 Unknown error type x509.UnknownAuthorityError: x509.UnknownAuthorityError{Cert:(*x509.Certificate)(0xc0004e0580), hintErr:error(nil), hintCert:(*x509.Certificate)(nil)}
2019/05/27 21:40:18 Corpus GetMutations: Get https://maintner.golang.org/logs: x509: certificate signed by unknown authority
2019/05/27 21:40:18 Could not init corpus: godata.Get: Get https://maintner.golang.org/logs: x509: certificate signed by unknown authority

I got worried thinking https://maintner.golang.org/logs is also expired, but when I checked it, it seems to be using a healthy valid HTTPS cert that expires on July 19, 2019. So I can't explain that devapp error right now.

FWIW "missing server name" seems normal. It just means the client didn't provide a name in the SNI in ClientHelloInfo. Intentionally or otherwise.

I was able to deploy the staging instance of devapp using the parent commit of CL 176257, and it resolved the unknown authority error when fetching https://maintner.golang.org/logs. That CL removed the step that copied ca-certificates.crt into /etc/ssl/certs/, maybe that's related?

However, the certificate at https://dev-staging.golang.org hasn't been updated, so that still needs investigation and fixing.

Found it. It's fallout from my gitlock removal CLs. We're missing ca-certificates. That explains both errors.

Change https://golang.org/cl/179077 mentions this issue: devapp: add ca-certificates to Dockerfile

Nevermind. The CL I sent only explains the latter error and not the LetsEncrypt cert issue. I had the timeline wrong.

In the logs I see:

2019/05/27 22:36:52 http: TLS handshake error from 10.240.0.156:55541: acme/autocert: unable to authorize "dev.golang.org"; challenge "tls-alpn-01" failed with error: acme: authorization error for dev.golang.org: 403 urn:acme:error:unauthorized: Cannot negotiate ALPN protocol "acme-tls/1" for tls-alpn-01 challenge

The LetsEncrypt problem is because we have 2 replicas now.

/cc @andybons, who increased it from 1 to 2 in https://golang.org/cl/127036

That means renewals only have a 50% chance of working, as challenge info is kept in memory with x/crypto/acme/autocert.

As a temporary fix I've reduced it from 2 to 1.

There are several better fixes that will take more time.

Also, the readiness probe added in CL 127036 doesn't work, because we're redirecting it to https.

kubectl describe says:

Readiness probe failed: Get https://10.200.1.8:80/healthz: http: server gave HTTP response to HTTPS client

Actually I don't know what's up with that /healthz https error. The http-to-https redirect handler does appear to special case (allow) that path over http.

But I don't understand this output. It's like we're running some code that's not in the repo and I can't find....

bradfitz@go:~/go/src$ k logs devapp-deployment-798769dd55-rvqgp
2019/05/27 22:46:50 Updating dashboard data...
2019/05/27 22:46:50 Listening on [::]:80
2019/05/27 22:46:50 Serving TLS at [::]:443
2019/05/27 22:49:24 Fetched 494 open CLs in 15.951 seconds
2019/05/27 22:50:34 Fetched 4464 open issues in 70.638 seconds
2019/05/27 22:50:36 Fetched 99 closed issues in 1.499 seconds
2019/05/27 22:50:36 Cache "gzdata" update finished; writing 332116 bytes
2019/05/27 22:50:36 Sent 453 requests to GitHub

Sending requests to GitHub!? gzdata cache? What is all that?

I don't know what code logs that and I can't find it.

A "make deploy-prod" got the right code running again.

When I changed its config from 2 to 1 replicas and re-applied the deployment-prod.yaml file, it picked up the latest from image: gcr.io/symbolic-datum-552/devapp:latest... but we don't use latest anymore. (The make push-prod rule doesn't push the MUTABLE_VERSION (latest)). So that's why it was temporarily running ancient code.

That's another thing to fix here.

Well, it's running now and logs look happy, but I can't connect to it.

I have to run, though, so no time to debug further. Either @dmitshur can take over or we can wait until tomorrow.

GetCertificate doc says:

// If GetCertificate is used directly, instead of via Manager.TLSConfig, package users will
// also have to add acme.ALPNProto to NextProtos for tls-alpn-01, or use HTTPHandler
// for http-01. (The tls-sni-* challenges have been deprecated by popular ACME providers
// due to security issues in the ecosystem.)

It looks like we are using GetCertificate and not adding acme.ALPNProto:

https://github.com/golang/build/blob/78beebf19480669724269a7b3ee6c2eed3d2c64b/devapp/devapp.go#L137

If there's some HTTPS redirection bug preventing the "http-01" challenge from working via HTTPHandler, that could explain why it's not able to renew the LE cert. Edit: I noticed we're not using HTTPHandler anywhere, so I don't think "http-01" challenge was even enabled.

I can try adding acme.ALPNProto.

Good find. I bet that's it. We should audit all our services for that.

Looks like it was, I tried on https://dev-staging.golang.org/ and its cert has now been updated to expire on August 25, 2019 (previously, it was May 28, 2019).

I'll apply the same quick fix to dev.golang.org and leave this issue open for all the followup cleanup work here.

Change https://golang.org/cl/179097 mentions this issue: devapp: add acme.ALPNProto to NextProtos

https://dev.golang.org/ is working now. Thanks @bradfitz.


Although it didn't work right away. Specifically, I saw the following in logs:

http: TLS handshake error from 10.240.0.7:52609: refusing to serve autocert on provided domain

The "refusing to serve autocert on provided domain" error is from our HostPolicy function:

https://github.com/golang/build/blob/78beebf19480669724269a7b3ee6c2eed3d2c64b/devapp/devapp.go#L128-L130

Then later a bunch of:

http: TLS handshake error from 10.240.0.7:47971: acme/autocert: missing certificate
http: TLS handshake error from 10.240.0.48:57269: acme/autocert: missing certificate
http: TLS handshake error from 10.240.0.96:50177: acme/autocert: missing certificate
http: TLS handshake error from 10.240.0.48:57127: acme/autocert: missing certificate

Then a rate limit error from LE:

http: TLS handshake error from 10.240.0.48:52209: 429 urn:acme:error:rateLimited: Error creating new authz :: too many failed authorizations recently: see https://letsencrypt.org/docs/rate-limits/

Then more EOFs:

http: TLS handshake error from 10.200.10.1:46531: EOF
http: TLS handshake error from 10.240.0.96:56001: EOF
http: TLS handshake error from 10.200.10.1:49963: EOF

And after a while it did renew the cert successfully. I don't know about the details of how Let's Encrypt works to explain this off the top of my head.

Old.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

stub42 picture stub42  路  3Comments

mingrammer picture mingrammer  路  3Comments

ajstarks picture ajstarks  路  3Comments

natefinch picture natefinch  路  3Comments

michaelsafyan picture michaelsafyan  路  3Comments