:wave: Hi folks,
You may have seen my announcement of stricter JWS KID header processing for Let's Encrypt's ACME v2 servers. This change is active in staging and coming soon (tomorrow!) to the production env.
I wanted to reach out because `Go-http-client/1.1 (linux; amd64) xenolf-acme` is one of the user-agents I'm seeing receive the "KeyID header contained an invalid account URL" malformed response in the staging logs.
In the past 24 hours there were ~6 distinct client IPs that produced ~16 malformed JWS `kid` errors. Certainly not a flood of errors, but the staging environment receives less traffic overall and the Lego V2 support is quite new.
In all of the cases observed in the past 24 hours the client was sending a `kid` header value with the prefix "https://acme-v02.api.letsencrypt.org/acme/acct/". This is the production account URL prefix, so I suspect there may be a common misconfiguration or a bug in Lego that results in reuse of a production account with the staging environment. Another potentially helpful tidbit: all of the requests were for the `newOrder` endpoint.
Do you folks have any insights as to the root cause here? For what it's worth you are not alone and I'm trying to triage a similar issue in another ACME client: https://github.com/jetstack/cert-manager/issues/601
Let me know if I can provide any other helpful information. Thanks!
Hey @cpu!
Lego sets the `kid` at two points in the code.
As far as the lego CLI is concerned, we save the registration resource with its URL after registration and that is what's used in the second case above.
To me this seems to be a configuration error where the staging endpoint is being called with an account that was created on the production endpoint for example.
> To me this seems to be a configuration error where the staging endpoint is being called with an account that was created on the production endpoint for example.
That sounds like a likely explanation. Maybe there's a usability argument to be made that `NewClient` should error if the FQDN of `reg.URI` doesn't match the FQDN of `caDirURL`?
That would work fine with Let's Encrypt but I suppose it's _possible_ another ACME server could have a directory URL like "cpus.house.of.certificates.pki/directory" but return accounts with URLs in a different namespace "cpus.house.of.acme.accounts.pki/acccount/1244". There's nothing in the spec that would forbid that.
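For illustration, the host comparison floated above could look roughly like the sketch below. `sameHost` is a hypothetical helper and not part of lego; the account number in the example URL is made up. As noted, a check like this would reject a spec-compliant server that hands out account URLs on a different host than its directory.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sameHost reports whether two URLs share the same hostname, ignoring case.
// Hypothetical helper for illustration only; lego does not expose it.
func sameHost(a, b string) (bool, error) {
	ua, err := url.Parse(a)
	if err != nil {
		return false, err
	}
	ub, err := url.Parse(b)
	if err != nil {
		return false, err
	}
	return strings.EqualFold(ua.Hostname(), ub.Hostname()), nil
}

func main() {
	// Staging directory paired with a production account URL (account ID is made up).
	ok, err := sameHost(
		"https://acme-staging-v02.api.letsencrypt.org/directory",
		"https://acme-v02.api.letsencrypt.org/acme/acct/1244",
	)
	fmt.Println(ok, err) // prints: false <nil>
}
```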
Another more robust idea: the persisted account objects could store the directory they were registered with and `NewClient` could error if `reg.directoryURL != caDirURL`?
There is already a separation in place per endpoint in the CLI.
The directory layout the CLI will generate looks like this:
    .lego
    └── accounts
        └── acme-staging-v02.api.letsencrypt.org
            └── [email protected]
                └── keys
So if the user was in fact using the CLI, I can only assume that they edited their folders. On the other hand if it was a user of the library, there are no restrictions in place.
> So if the user was in fact using the CLI
There appear to be multiple users running into this so whatever the root cause it seems to be tripping up a few folks.
> On the other hand if it was a user of the library, there are no restrictions in place.
Do you have any theories on how a library user could end up providing a `User` to `NewClient` that returns registration resources from the wrong environment?
Looking quickly it seems like the `GetRegistration` function of the `User` interface doesn't provide any way for the implementation to know the directory that is being used with the returned `RegistrationResource`. Should `GetRegistration` perhaps accept a `directoryURL string` argument so the implementation could return the correct registration for the directory in question?
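To make the question concrete, here is a sketch of what a directory-aware lookup might look like. Everything in it is hypothetical: `registration` stands in for lego's `RegistrationResource`, and `directoryAwareUser` is an imagined variant of the `User` interface, not lego's actual API.

```go
package main

import "fmt"

// registration stands in for lego's RegistrationResource; only the URI matters here.
type registration struct {
	URI string
}

// directoryAwareUser is a hypothetical variant of the User interface whose
// lookup is keyed by the ACME directory URL.
type directoryAwareUser interface {
	GetRegistration(directoryURL string) (*registration, bool)
}

// multiEnvUser keeps one registration per directory URL.
type multiEnvUser struct {
	byDirectory map[string]*registration
}

func (u multiEnvUser) GetRegistration(directoryURL string) (*registration, bool) {
	r, ok := u.byDirectory[directoryURL]
	return r, ok
}

func main() {
	const prodDir = "https://acme-v02.api.letsencrypt.org/directory"
	const stagingDir = "https://acme-staging-v02.api.letsencrypt.org/directory"

	var u directoryAwareUser = multiEnvUser{byDirectory: map[string]*registration{
		// Made-up account ID for illustration.
		prodDir: {URI: "https://acme-v02.api.letsencrypt.org/acme/acct/1244"},
	}}

	// No staging registration exists, so the caller would register a new
	// account instead of reusing the production one.
	_, ok := u.GetRegistration(stagingDir)
	fmt.Println(ok) // prints: false
}
```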
I'm admittedly unfamiliar with library usage of Lego. If you think this is strictly misconfiguration I'm happy to close this issue and folks who run into the error can chase down the root cause themselves :-)
> Looking quickly it seems like the GetRegistration function of the User interface doesn't provide any way for the implementation to know the directory that is being used with the returned RegistrationResource. Should GetRegistration perhaps accept a directoryURL string argument so the implementation could return the correct registration for the directory in question?
I suppose that might be silly since both the directory URL and the `User` implementation are provided by the caller to `NewClient`. The caller should be providing the right `User` for the right directory in a perfect world.
> Do you have any theories on how someone would end up providing a User to NewClient that returns registration resources from the wrong environment?
Yeah I could think of a few actually. Most prominently while updating their current code to work with V2 while using their development environment (with old users maybe) to test against the staging endpoint.
> Looking quickly it seems like the GetRegistration function of the User interface doesn't provide any way for the implementation to know the directory that is being used with the returned RegistrationResource.
That is true. In context they have to know, though, as they instantiate the client with the directory URL and should determine the user beforehand. Lego only operates on a single user per client instance. How someone using the library manages their users is up to them; lego does not impose anything on them other than the `User` interface.
@xenolf Makes sense. You've convinced me there probably isn't a good way to add more guard rails for library users in this case. Mistakes can and will happen but the error message from the ACME server is probably sufficient to have a sense of how to fix the problem. I'll close this issue. It doesn't appear to be a Lego bug and I think there are enough breadcrumbs in our conversation for affected users to find their way.
Thanks for spit-balling it with me!
> Thanks for spit-balling it with me!
Thanks for checking in :)
I don't want to completely rule out a bug in the CLI of course, but I can't think of a way of that happening without edits to the code or the directory structure.
If people come here that are experiencing this in the CLI, please open a new issue and provide as much information as you can so we can look into it.