Lego: ACME v2: use of prod `kid` with staging env

Created on 6 Jun 2018 · 8 Comments · Source: go-acme/lego

:wave: Hi folks,

You may have seen my announcement of stricter JWS KID header processing for Let's Encrypt's ACME v2 servers. This change is active in staging and coming soon (tomorrow!) to the production env.

I wanted to reach out because Go-http-client/1.1 (linux; amd64) xenolf-acme is one of the user-agents I'm seeing receive the "KeyID header contained an invalid account URL" malformed response in the staging logs.

In the past 24 hours there were ~6 distinct client IPs that produced ~16 malformed JWS kid errors. Certainly not a flood of errors but the staging environment receives less traffic overall and the Lego V2 support is quite new.

In all of the cases observed in the past 24hr period the client was sending a "kid" header value with the prefix "https://acme-v02.api.letsencrypt.org/acme/acct/". This is the production account URL prefix, so I suspect there may be a common misconfiguration or a bug in Lego that results in reuse of a production account with the staging environment. Another potentially helpful tidbit: all of the requests were for the newOrder endpoint.

Do you folks have any insights as to the root cause here? For what it's worth, you are not alone: I'm trying to triage a similar issue in another ACME client: https://github.com/jetstack/cert-manager/issues/601

Let me know if I can provide any other helpful information. Thanks!

All 8 comments

Hey @cpu!

Lego sets the kid at two points in the code.

  • Once when a new account is registered; the URL is taken directly from the server's response.
  • Again when a client is instantiated for a user who is already registered; this uses the saved URL for the account being used.

As far as the lego CLI is concerned, we save the registration resource with its URL after registration and that is what's used in the second case above.

To me this seems to be a configuration error where, for example, the staging endpoint is being called with an account that was created on the production endpoint.

> To me this seems to be a configuration error where the staging endpoint is being called with an account that was created on the production endpoint for example.

That sounds like a likely explanation. Maybe there's a usability argument to be made that NewClient should error if the FQDN of reg.URI doesn't match the FQDN of caDirURL?

That would work fine with Let's Encrypt, but I suppose it's _possible_ another ACME server could have a directory URL like "cpus.house.of.certificates.pki/directory" but return accounts with URLs in a different namespace, such as "cpus.house.of.acme.accounts.pki/acccount/1244". There's nothing in the spec that would forbid that.

Another more robust idea: The persisted account objects could store the directory they were registered with and NewClient could error if reg.directoryURL != caDirURL?
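Both ideas can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `storedAccount`, its `DirectoryURL` field, `checkHostMatch`, and `checkDirectoryMatch` are hypothetical names and not part of Lego's API, and the account number is made up. The host comparison would wrongly reject a server that serves accounts from a different namespace than its directory, which is why the stored-directory comparison is the more robust of the two.

```go
package main

import (
	"fmt"
	"net/url"
)

// storedAccount is a hypothetical persisted account record. Lego's own
// registration resource is not assumed to carry a DirectoryURL field.
type storedAccount struct {
	URI          string // account URL returned by the ACME server
	DirectoryURL string // directory URL in use when the account was registered
}

// checkHostMatch returns an error when the account URL and the directory URL
// have different hosts. Servers with split namespaces would fail this check.
func checkHostMatch(acct storedAccount, caDirURL string) error {
	acctURL, err := url.Parse(acct.URI)
	if err != nil {
		return fmt.Errorf("parsing account URL: %w", err)
	}
	dirURL, err := url.Parse(caDirURL)
	if err != nil {
		return fmt.Errorf("parsing directory URL: %w", err)
	}
	if acctURL.Host != dirURL.Host {
		return fmt.Errorf("account host %q does not match directory host %q",
			acctURL.Host, dirURL.Host)
	}
	return nil
}

// checkDirectoryMatch returns an error when the account was registered with a
// different directory than the one the client is about to use.
func checkDirectoryMatch(acct storedAccount, caDirURL string) error {
	if acct.DirectoryURL != caDirURL {
		return fmt.Errorf("account was registered with %q, not %q",
			acct.DirectoryURL, caDirURL)
	}
	return nil
}

func main() {
	// A production account (illustrative account number) used against staging.
	acct := storedAccount{
		URI:          "https://acme-v02.api.letsencrypt.org/acme/acct/1",
		DirectoryURL: "https://acme-v02.api.letsencrypt.org/directory",
	}
	staging := "https://acme-staging-v02.api.letsencrypt.org/directory"

	fmt.Println("host check:", checkHostMatch(acct, staging))
	fmt.Println("directory check:", checkDirectoryMatch(acct, staging))
}
```

Run against the staging directory, both checks return a non-nil error for the production account; with the production directory URL both return nil.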

There is already a separation in place per endpoint in the CLI.
The directory layout the CLI will generate looks like this:

├───.lego
│   └───accounts
│       └───acme-staging-v02.api.letsencrypt.org
│           └───[email protected]
│               └───keys

So if the user was in fact using the CLI, I can only assume that they edited their folders. On the other hand if it was a user of the library, there are no restrictions in place.

> So if the user was in fact using the CLI

There appear to be multiple users running into this so whatever the root cause it seems to be tripping up a few folks.

> On the other hand if it was a user of the library, there are no restrictions in place.

Do you have any theories on how a library user could end up providing a User to NewClient that returns registration resources from the wrong environment?

Looking quickly it seems like the GetRegistration function of the User interface doesn't provide any way for the implementation to know the directory that is being used with the returned RegistrationResource. Should GetRegistration perhaps accept a directoryURL string argument so the implementation could return the correct registration for the directory in question?

I'm admittedly unfamiliar with library usage of Lego. If you think this is strictly mis-configuration I'm happy to close this issue and folks that run into the error can chase down the root cause themselves :-)

> Looking quickly it seems like the GetRegistration function of the User interface doesn't provide any way for the implementation to know the directory that is being used with the returned RegistrationResource. Should GetRegistration perhaps accept a directoryURL string argument so the implementation could return the correct registration for the directory in question?

I suppose that might be silly since both the directory URL and the User implementation are provided by the caller to NewClient. The caller should be providing the right User for the right directory in a perfect world.

> Do you have any theories on how someone would end up providing a User to NewClient that returns registration resources from the wrong environment?

Yeah, I could think of a few actually. Most prominently: updating their current code to work with V2 and testing it against the staging endpoint from their development environment (with old users maybe).

> Looking quickly it seems like the GetRegistration function of the User interface doesn't provide any way for the implementation to know the directory that is being used with the returned RegistrationResource.

That is true. In context, though, they have to know, since they instantiate the client with the directory URL and should determine the user beforehand; lego only operates on a single user per client instance. How someone using the library manages their users is up to them; lego does not impose anything on them other than the User interface.

@xenolf Makes sense. You've convinced me there probably isn't a good way to add more guard rails for library users in this case. Mistakes can and will happen but the error message from the ACME server is probably sufficient to have a sense of how to fix the problem. I'll close this issue - it doesn't appear to be a Lego bug and I think there are enough breadcrumbs in our conversation for affected users to find their way.

Thanks for spit-balling it with me!

> Thanks for spit-balling it with me!

Thanks for checking in :)
I don't want to completely rule out a bug in the CLI of course, but I can't think of a way of that happening without edits to the code or the directory structure.
If people come here that are experiencing this in the CLI, please open a new issue and provide as much information as you can so we can look into it.
