Nixpkgs: nixos/acme: 20.03 -> 20.09 regression with opensmtpd

Created on 22 Oct 2020 · 7 Comments · Source: NixOS/nixpkgs

Describe the bug
When upgrading from 20.03 to 20.09, opensmtpd stopped working (apparently) due to ownership changes in #91121:

Oct 22 14:40:01 hostname systemd[1]: Started opensmtpd.service.
Oct 22 14:40:03 hostname smtpd[758]: warn:  /var/lib/acme/mx.example.com/fullchain.pem: not owned by uid 0
Oct 22 14:40:03 hostname smtpd[758]: smtpd: load_pki_tree: failed to load certificate file
Oct 22 14:40:03 hostname systemd[1]: opensmtpd.service: Main process exited, code=exited, status=1/FAILURE
Oct 22 14:40:03 hostname systemd[1]: opensmtpd.service: Failed with result 'exit-code'.

As you can see, opensmtpd seems to require the certificates to be owned by root.
I tried adding `chown root fullchain.pem key.pem` to `security.acme.certs.<name>.postRun`, but the ownership seems to reset to acme after a reboot.

To Reproduce

  1. Configure acme to generate a certificate
  2. Configure opensmtpd to use the certificate

Expected behavior
opensmtpd should start without errors.

Notify maintainers
@m1cr0man @lheckemann @mweinelt @flokli

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 5.4.72, NixOS, 20.09.git.4d3d4221cad (Nightingale)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.7`

All 7 comments

Merely a workaround but you could probably copy them to another directory after each run. This isn't pretty but would at least unblock you - if this is a blocker at all.

In the long run we might want to consider supporting this, and maybe OpenSMTPD can be taught not to enforce the uid 0 rule. Removing the code that does that specific check seems rather trivial: https://github.com/OpenSMTPD/OpenSMTPD/blob/a6a39cb0cb6ee72d194f84d6763997df7e0c0bf3/usr.sbin/smtpd/ssl.c#L117-L121. I wonder what their threat model looked like. Is their assumption that files owned by another uid are less (or more) safe?

> Merely a workaround but you could probably copy them to another directory after each run. This isn't pretty but would at least unblock you - if this is a blocker at all.

Thanks, I will try this!

> I wonder what their threat model looked like. Is their assumption that files owned by another uid are less (or more) safe?

Presumably, having a file as another uid is less safe, otherwise it wouldn't make sense to enforce that rule. One reason might be that if a file has 600 or 400 permissions and is owned by root, then it can only be read by root. If it is owned by another uid, then it can be read by root and that uid. But I don't know if this is the actual reason for the rule.

> Presumably, having a file as another uid is less safe, otherwise it wouldn't make sense to enforce that rule. One reason might be that if a file has 600 or 400 permissions and is owned by root, then it can only be read by root. If it is owned by another uid, then it can be read by root _and_ that uid. But I don't know if this is the actual reason for the rule.

The reason for enforcing the acme user is twofold:

  1. We didn't want the ACME services to run as root as there was no technical reason to do so and it posed a security risk (you have a program contacting some HTTP service and writing files to disk, it should be minimally privileged)
  2. In order to ensure that all certs could be created/updated by the ACME services it was necessary to enforce the acme user everywhere.

Copying them to another directory would be the best option here. The real thing you're fighting with regards to adding chown to postRun is the acme-fixperms.service, but even if that worked the 750 permissions would break the renew service entirely (as the acme user wouldn't be able to write new certs). I think this is a particularly unique issue, given that they are enforcing UID 0 for certs (to what end?), and having thought it through, allowing the user to be changed would be more cumbersome than simply copying the certs in the postRun script.


It might be possible to do some adaptation in the service scripts so that all the chown commands use $USER instead of acme. That would mean you could then override all the systemd service users by hand (via systemd.services.<name>.serviceConfig.User) and get what you need. However I don't see this solving any issues that a copy command in postRun doesn't solve, and is also more complex to use.
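The "copy the certs to another directory" workaround suggested above could be sketched roughly as follows. This is a standalone Python illustration under stated assumptions, not code taken from the acme module or from anyone in this thread: the function name `copy_certs`, its parameters, and the default mode are all hypothetical. In practice something equivalent would be run from `postRun` or from a pre-start script of the consuming service.

```python
import os
import shutil
from pathlib import Path


def copy_certs(src_dir, dst_dir, names=("fullchain.pem", "key.pem"),
               mode=0o600, owner=None):
    """Copy certificate files into dst_dir with restrictive permissions.

    owner is an optional (uid, gid) pair. Changing ownership normally
    requires root, so the chown step is skipped when owner is None.
    All names and defaults here are illustrative assumptions.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    copied = []
    for name in names:
        target = dst_dir / name
        tmp = dst_dir / (name + ".tmp")
        # Create the temporary file with a restrictive mode from the
        # start, so the key is never briefly world-readable.
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
        with os.fdopen(fd, "wb") as out, open(src_dir / name, "rb") as src:
            shutil.copyfileobj(src, out)
        # Set the mode explicitly: the umask, or a leftover temporary
        # file, may have produced different permission bits.
        os.chmod(tmp, mode)
        if owner is not None:
            os.chown(tmp, *owner)
        # Rename into place so the consuming service never sees a
        # half-written file.
        os.replace(tmp, target)
        copied.append(target)
    return copied
```

Run as root, a call such as `copy_certs("/var/lib/acme/mx.example.com", "/some/other/dir", owner=(0, 0))` would leave root-owned copies outside the acme directory (the destination path is a placeholder). The originals stay owned by acme, so renewals keep working.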

I'd also propose copying them to another directory for now, outside of the acme module.

Once systemd 247 has landed, this should probably be revisited. It adds LoadCredential, which can be used to pass around credentials to processes without them having access to the bare files themselves.

Then, files will be made readable by the process that runs the service:

> The data is accessible from the unit's processes via the file system, at a read-only location that (if possible and permitted) is backed by non-swappable memory. The data is only accessible to the user associated with the unit, via the User=/DynamicUser= settings (as well as the superuser). […]

This will probably be root, at least currently, since we run opensmtpd.service as root and let opensmtpd itself take care of changing to a less privileged user.

In the longer term, it might be a good idea to discuss with upstream running opensmtpd as a less-privileged user and granting it the capabilities needed to bind to low ports, like we do with nginx.

I can confirm that copying to another directory seems to work for me. Thanks!

I'm running into the same issue with PostgreSQL (sanitized):

Oct 26 00:58:36 myhost systemd[1]: Starting PostgreSQL Server...
Oct 26 00:58:37 myhost postgres[19852]: [19852] FATAL:  private key file "/var/lib/acme/myhost.example.com/key.pem" must be owned by the database user or root
Oct 26 00:58:37 myhost postgres[19852]: [19852] LOG:  database system is shut down
Oct 26 00:58:37 myhost systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
Oct 26 00:58:37 myhost systemd[1]: postgresql.service: Failed with result 'exit-code'.
Oct 26 00:58:37 myhost systemd[1]: Failed to start PostgreSQL Server.

We do already run PostgreSQL as the user postgres, and it's unclear from the systemd docs whether LoadCredential will work for it. Looking at the implementation, it seems to try to use ACLs and otherwise fall back to file system ownership. We might just have to try it once we get systemd 247.

I added some pre-start code to do the "copy certs to a new directory" and it looks like PostgreSQL also cares about the permissions:

Oct 26 01:24:14 myhost systemd[1]: Starting PostgreSQL Server...
Oct 26 01:24:14 myhost postgres[20921]: [20921] FATAL:  private key file "/run/postgresql/certs/key.pem" has group or world access
Oct 26 01:24:14 myhost postgres[20921]: [20921] DETAIL:  File must have permissions u=rw (0600) or less if owned by the database user, or permissions u=rw,g=r (0640) or less if owned by root.
Oct 26 01:24:14 myhost postgres[20921]: [20921] LOG:  database system is shut down
Oct 26 01:24:14 myhost systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE

I'm handling it in my pre-start code, but it's something to keep in mind when developing a better solution.
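For anyone writing similar pre-start code, the rule stated in the two PostgreSQL error messages above can be expressed as a small check. This is a sketch derived only from the wording of those log lines, not from the PostgreSQL source, so the real check may differ in details; the function name `key_permissions_ok` and its parameters are made up for illustration.

```python
import stat


def key_permissions_ok(owner_uid, mode, db_uid):
    """Return True if a key file would satisfy the rule in the log message.

    Per the message: the file must be owned by the database user or root,
    with permissions 0600 or less if owned by the database user, or
    0640 or less if owned by root.
    """
    perms = stat.S_IMODE(mode)
    if owner_uid == db_uid:
        allowed = 0o600
    elif owner_uid == 0:
        allowed = 0o640
    else:
        # Owned by neither the database user nor root.
        return False
    # "or less" means no permission bit outside the allowed set.
    return perms & ~allowed == 0
```

With the values from `os.stat()` on the copied key, this would have flagged both failures shown above: the acme-owned original (wrong owner) and the copy with group or world access.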
