Nixpkgs: ACME fails with JWS verification error

Created on 23 Oct 2020 · 15 Comments · Source: NixOS/nixpkgs

Describe the bug

After upgrading from NixOS 20.03 to 20.09, ACME certificate renewal started failing for 5 of my domains (but not all of them).

To Reproduce

  1. systemctl start acme-foo.example.com.service
acme-foo.example.com-start[23185]: 2020/10/23 08:32:54 [INFO] [foo.example.com] acme: Obtaining bundled SAN certificate
acme-foo.example.com-start[23185]: 2020/10/23 08:32:54 Could not obtain certificates:
acme-foo.example.com-start[23185]:         acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:malformed :: JWS verification error, url:
systemd[1]: acme-foo.example.com.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: acme-foo.example.com.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Renew ACME certificate for foo.example.com.
systemd[1]: acme-foo.example.com.service: Consumed 371ms CPU time, received 5.8K IP traffic, sent 2.2K IP traffic.

Expected behavior
No failure

Notify maintainers
cc @NixOS/acme

Metadata

  • system: "x86_64-linux"
  • host os: Linux 5.8.14, NixOS, 20.09beta1083.51aaa3fa1b6 (Nightingale)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.3.7
  • channels(root): "nixos-20.09beta1346.05334ad7852, nixos-unstable-21.03pre246543.24c9b05ac53"
  • channels(symphorien): "home-manager-20.09"
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module: acme
bug regression nixos

All 15 comments

I hit the same bug as well and fixed it on my machines with the following workaround: https://github.com/NixOS/nixpkgs/pull/91121#issuecomment-692180005

Thanks for reposting that, @Ma27. I think I will extend #100356 with some info on how to deal with this error.

I wiped /var/lib/acme/.lego/accounts and now all certificate services fail:

acme-foo.example.com-start[1329]: 2020/10/23 19:17:28 No key found for account [email protected]. Generating a 2048 key.
acme-foo.example.com-start[1329]: 2020/10/23 19:17:29 Saved key to accounts/acme-v02.api.letsencrypt.org/[email protected]/keys/[email protected]
acme-foo.example.com-start[1329]: 2020/10/23 19:17:32 Account [email protected] is not registered. Use 'run' to register a new account.
systemd[1]: acme-foo.example.com.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: acme-foo.example.com.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Renew ACME certificate for foo.example.com.

Right - the tests for whether to run lego renew or lego run don't check the accounts directory. I will fix that now. That said, I don't know if ACME fares well if you delete your account details and keep your certificates. I'm pretty sure you need to use the same account to issue a renew request for existing certificates.

Do you mean that I should nuke all of /var/lib/acme ?

Pretty much, yeah. Personally I would move it somewhere else instead of flat-out deleting it ;) Then run systemd-tmpfiles --create.
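
A minimal sketch of the move-aside approach suggested above, assuming a POSIX shell. The helper name backup_acme_state and the .bak-<timestamp> suffix are illustrative only and are not part of NixOS or lego.

```shell
# Hypothetical helper: move a state directory aside instead of deleting it.
# Prints the path of the backup on success.
backup_acme_state() {
  state_dir="$1"
  if [ ! -d "$state_dir" ]; then
    echo "no such directory: $state_dir" >&2
    return 1
  fi
  # Timestamped name so repeated runs do not overwrite an earlier backup.
  backup_dir="${state_dir}.bak-$(date +%Y%m%d%H%M%S)"
  mv -- "$state_dir" "$backup_dir" && printf '%s\n' "$backup_dir"
}
```

On an affected machine this would be run as root as `backup_acme_state /var/lib/acme`, followed by `systemd-tmpfiles --create` and a restart of the failing acme-*.service unit.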

It works.

Awesome. I have opened a PR there which will resolve this issue. In particular, I made it check if the accounts directory is empty, which hopefully negates the need to wipe out your entire /var/lib/acme directory. I also added some documentation on the process.
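
The check described in this comment can be sketched roughly as follows. This is not the code from the PR; the function name choose_lego_action and its two path arguments are assumptions made for illustration.

```shell
# Hypothetical sketch: decide between "lego run" (register and issue) and
# "lego renew" (reuse the existing account and certificate).
# $1: accounts directory, $2: path to the existing certificate file.
choose_lego_action() {
  accounts_dir="$1"
  cert_file="$2"
  # Renew only if a certificate exists AND the accounts directory is non-empty.
  if [ -e "$cert_file" ] && [ -d "$accounts_dir" ] \
     && [ -n "$(ls -A "$accounts_dir" 2>/dev/null)" ]; then
    echo renew
  else
    echo run
  fi
}
```

With a check like this, a wiped accounts directory falls back to `run`, which avoids the "Account ... is not registered" failure shown earlier in the thread.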

I am also running into this problem. The workaround works, but the account key was generated just yesterday and nothing looks corrupted (JSON and EC key). Today, I was trying to enroll a new domain using the same account. There may be something more fishy around this. Unsure how to debug this.

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/release-notes-20-03-20-09-instructions-may-need-an-update-acme/9787/2

#102387 prevents renewal services from running at the same time. I'm unsure how to test whether it fixes the issue.

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/lets-encrypt-on-20-09/9950/2

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/lets-encrypt-on-20-09/9950/3

@symphorien FWIW I don't think that renewing more than one cert at a time is a factor in this bug. I only had a single cert fail when upgrading to NixOS 20.09, and running the single systemd service by itself still failed. It appears this will be fixed by #102862.

That said, linking all of the renewal services together would be nice so that if one fails the others won't be tried. If you have 5 or more failures you basically lock yourself out of LE for an hour.

This PR does not do that. If a certificate fails, the next ones will be tried as well.
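
For anyone who wants the fail-fast behaviour in the meantime, a manual loop along these lines may help. It is only a sketch: start_unit and renew_fail_fast are hypothetical names, and the acme-<domain>.service unit naming is taken from the logs in this issue.

```shell
# Thin wrapper around systemctl so the loop below is easy to adapt or stub.
start_unit() {
  systemctl start "$1"
}

# Start one renewal unit per domain, stopping at the first failure so the
# remaining domains do not add to the failure count at the CA.
renew_fail_fast() {
  for domain in "$@"; do
    if ! start_unit "acme-${domain}.service"; then
      echo "renewal failed for ${domain}; skipping the remaining certificates" >&2
      return 1
    fi
  done
}
```

It would be invoked as `renew_fail_fast foo.example.com bar.example.com`, where the domain names are placeholders.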
