Nixpkgs: Make a libpq package

Created on 16 May 2019  ·  16Comments  ·  Source: NixOS/nixpkgs

Right now nixpkgs only has postgresql, which contains much more than libpq.

By offering a libpq package like other distros, the closure size of many packages could be reduced significantly.

Even more so because postgresql currently depends on systemd. And that also means that no application or library using libpq can be built against musl, as systemd relies on glibc-only features and thus does not build with musl.

Tasks:

  • [ ] Make postgresql -> systemd dependency configurable
  • [ ] Add libpq package containing only libpq
  • [ ] Change Haskell packages so that they depend on libpq instead of all of postgresql
  • [ ] Change other packages that can benefit from this (help wanted for the other programming languages!)
closure-size

Most helpful comment

still important to me

All 16 comments

CC @matthewbauer @peti for Haskell.

CC @FRidh for Python, would Python benefit from this?

I thought the postgresql.lib output would do this for us?

@matthewbauer yes for runtime closure size, no for systemd

FWIW, regarding closure size, I considered splitting this up before when I did significant Postgres work last year but ultimately went against it because multi-output .lib is supported for all Postgres expressions -- which does reduce the size a bit and worked well enough for dependent expressions.

That said something containing "only" libpq.so (for example, via an override to create a new expression/derivation) would be much smaller, certainly, but it's not clear that's the best use of work or time -- right now .lib for postgresql.lib (9.6) is barely 6mb and that's mostly due to it containing most of the default server extensions that are shipped with the build system (libpq.so is barely 200KB). Moving these out would probably reduce this quite a lot -- for example, they could just be part of .bin instead.

As for dependencies, well, they're pretty slim too:

a@link> nix path-info -rS ./result-lib
/nix/store/57i8nqp2vljg5h61gvk4p1ar9vzhxayj-libossp-uuid-1.6.2             29565976
/nix/store/5p8qcz6i13ymgf80kimhx2g9pb5r5nia-bash-4.4-p23                   29463504
/nix/store/lhngda6cn951z68v5rlh4dni6bb5sjw1-zlib-1.2.11                    28279616
/nix/store/s5sgkvhy2wdgn7fmsvq47srxjhfp9ld9-libxml2-2.9.9                  29985080
/nix/store/w7d4sfgxjpalphg3h9bink47yzhf4f96-postgresql-9.6.12-lib          41456208
/nix/store/xqs95fqkjb1kd102yjv5h5q57gcsafb3-glibc-2.27                     28147008
/nix/store/yglqxz2as4ygai26d2hbfxqpf0shdh4l-openssl-1.0.2r                 31880056

But why do we need them?

a@link> nix path-info -r ./result-lib | grep -v postgresql | while read l; do nix why-depends ./result-lib "$l"; done

/nix/store/w7d4sfgxjpalphg3h9bink47yzhf4f96-postgresql-9.6.12-lib
╚═══lib/uuid-ossp.so: …LIBC_2.2.5.GLIBC_2.4./nix/store/57i8nqp2vljg5h61gvk4p1ar9vzhxayj-libossp-uuid-1.6.2/lib:/nix/sto…
    => /nix/store/57i8nqp2vljg5h61gvk4p1ar9vzhxayj-libossp-uuid-1.6.2
/nix/store/w7d4sfgxjpalphg3h9bink47yzhf4f96-postgresql-9.6.12-lib
╚═══lib/uuid-ossp.so: …LIBC_2.2.5.GLIBC_2.4./nix/store/57i8nqp2vljg5h61gvk4p1ar9vzhxayj-libossp-uuid-1.6.2/lib:/nix/sto…
    => /nix/store/57i8nqp2vljg5h61gvk4p1ar9vzhxayj-libossp-uuid-1.6.2
    ╚═══bin/uuid-config: …#!/nix/store/5p8qcz6i13ymgf80kimhx2g9pb5r5nia-bash-4.4-p23/bin/sh.##.##  OSSP…
        => /nix/store/5p8qcz6i13ymgf80kimhx2g9pb5r5nia-bash-4.4-p23
/nix/store/w7d4sfgxjpalphg3h9bink47yzhf4f96-postgresql-9.6.12-lib
╚═══lib/pgcrypto.so: …LIBC_2.4.GLIBC_2.2.5./nix/store/lhngda6cn951z68v5rlh4dni6bb5sjw1-zlib-1.2.11/lib:/nix/store/yglq…
    => /nix/store/lhngda6cn951z68v5rlh4dni6bb5sjw1-zlib-1.2.11
/nix/store/w7d4sfgxjpalphg3h9bink47yzhf4f96-postgresql-9.6.12-lib
╚═══lib/pgxml.so: …2.2.5.LIBXML2_2.4.30./nix/store/s5sgkvhy2wdgn7fmsvq47srxjhfp9ld9-libxml2-2.9.9/lib:/nix/store/xq…
    => /nix/store/s5sgkvhy2wdgn7fmsvq47srxjhfp9ld9-libxml2-2.9.9
/nix/store/w7d4sfgxjpalphg3h9bink47yzhf4f96-postgresql-9.6.12-lib
╚═══lib/_int.so: …LIBC_2.4.GLIBC_2.2.5./nix/store/xqs95fqkjb1kd102yjv5h5q57gcsafb3-glibc-2.27/lib.XXXXXXXXXXXXXXXX…
    => /nix/store/xqs95fqkjb1kd102yjv5h5q57gcsafb3-glibc-2.27
/nix/store/w7d4sfgxjpalphg3h9bink47yzhf4f96-postgresql-9.6.12-lib
╚═══lib/libpq.so.5.9: …LIBC_2.3.4.GLIBC_2.3./nix/store/yglqxz2as4ygai26d2hbfxqpf0shdh4l-openssl-1.0.2r/lib:/nix/store/x…
    => /nix/store/yglqxz2as4ygai26d2hbfxqpf0shdh4l-openssl-1.0.2r

bash, libuuid, zlib, and libxml are needed for auxiliary extensions. So we could actually achieve significant reductions here by just moving the server extensions out of the .lib output, which would yield nothing except glibc and openssl, I believe -- but these dependencies should be considered "practically free" since you always need them for anything at all. As a result you're looking at a realistic closure size of < 1mb with that change.

Therefore, I think a much better use of time would be to just slim down postgresql.lib and call it a day. And if that doesn't fix the itch, we can always alias libpq = postgresql.lib somewhere in the top-level and say "we have libpq"

@nh2 Does overriding systemd argument to null resolve the first point?

@nh2 Does overriding systemd argument to null resolve the first point?

I don't know, but I'd rather make this an explicit flag as done in #61581 than using null.

@thoughtpolice Thanks for your analysis, that is good to know.

I think a much better use of time would be to just slim down postgresql.lib and call it a day.

That is not enough for musl support though because with the current setup where we build all of postgresql with --with-systemd, and then throw away everything that's not lib, it it fails to compile already before we can throw away anything.

So in my proposal, a libpq package would be built with enableSystemd = false.

It seems a good idea to me to give that "postgres without systemd build dependency" thing a name, and I found libpq the easiest solution because it solves both problems (buldability and client apps not depending on any postgresql server bits) in one go.

I don't follow the logic here. You want a systemd-less version of postgresql so you can use Musl for your stdenv and built packages. libpq actually does not need systemd, only the .bin outputs do, but they do not work with Musl. So if this was turned off or libpq was recompiled separately, you could avoid it. Therefore, libpq would override enableSystemd = false at all times and everything would link against it, but:

  • This implies another rebuild of postgresql for the separate derivation, which is annoying and tiresome
  • It requires everything depending on postgresql to be refactored to libpq in order to take any effect. Because everything currently implicitly relies on .lib to provide libpq, you must change it -- you can't simply override the postgresql.lib output to point to a completely different expression -- unless you want to rebuild every version of postgresql twice just so every postgresqlXX.lib output can be a completely different, recompiled piece of code.
  • In fact you probably do want to recompile every server version twice anyway with this scheme, because people will probably want a libpq that is consistent with the version of postgresql they depend on. Something smells very fishy about having something like buildInputs = [ libpq postgresql_10 ] where libpq = postgresql96.lib or whatever. I'm sure libpq has also had several API/feature changes between versions, probably making this mandatory.
  • And it doesn't really matter anyway because, in order to use Musl as the stdenv, you have to re-compile everything with musl anyway. So what did the libpq derivation get you, exactly? You might as well just override postgresql to enableSystemd = false and be done with it in this same case.

In other words: why should upstream glibc users, Hydra, Borg, etc always compile a second postgresql derivation with this feature disabled, and a substantial amount of changes to all postgres-dependent extensions? When the musl use case requires you recompiling it anyway.

Put another way -- what use cases does this particular scheme make easier/possible that aren't currently possible, and why is this the way it must be done?

Keep in mind that third parties outside use postgresql.lib as a buildInput as well, so this effectively creates a schism between upstream and any third party uses (now you have to override them awkwardly if you want upstream/a third party to behave the same, etc). Unless you just completely remove postgresql.lib and break downstreams, I guess. (It would actually be worse than that, I think -- and would result in confusing build-time errors, because mkDerivation doesn't consider the lack of a .lib to be a problem, it just assumes the files must be in .out instead, AKA it's not multi-output -- but those files won't be there!)


I'm not convinced this is a good plan yet. IMO it would be strictly better to simply disable systemd support unilaterally if stdenv.hostPlatforms.isMusl == true and simply put that in the default generic expression, and also clean up .lib as I explained earlier. Then:

  • Everyone gets a smaller .lib closure, no matter what
  • No major refactorings are needed across the codebase or to make postgres.lib dependent expressions depend on a new expression, or involve excessive recompilations. Everyone keeps using postgresql.lib
  • Musl users automatically get a version that does not depend on systemd at all, transparently to them.

I may be misunderstanding something, but this seems somewhat fishy as an approach, to me. Also, the whole phrasing of "introduce libpq so I can use musl" seems like a classic X-is-Y problem... The actual problem is not that libpq must be separate -- it's that postgresql currently has dependent inputs that are incompatible for you. That's a much narrower problem to solve with a fix that is substantially easier, and much less invasive.

IMO it would be strictly better to simply disable systemd support unilaterally if stdenv.hostPlatforms.isMusl == true and simply put that in the default generic expression, and also clean up .lib as I explained earlier.

@thoughtpolice What you propose sounds like a good plan to me.

One question though:

Are there situations where some programs may want to use only the postgresql server _libraries_?

E.g. should there be postgresql.lib (being essentially libpq, as you suggest), and e.g. postgresql.server-lib (for programs that use server libs but don't need the binaries)? Or is that an uncommon use case that can be handled later, if somebody needs it in the future?

I am not sure about that in the case of first-party things shipped with PostgreSQL. I do not think there's any reasonable way for someone to interface with/link against those meaningfully. They can only be used by CREATE EXTENSION in the server process.

For certain third party extensions, I know they do ship libraries and headers that can be consumed by other extensions. But that case is handled fine: because those expressions will act normally, have .lib and .dev outputs, and be part of your buildInputs if you need them.

I don't think there are any realistic use cases where someone would want all the server libs, but not the binaries in .bin. And if you did -- you'd probably want the include files anyway, but there's another catch here which is that they are inside .bin too, not in a .dev output. So having a .server-libs output is a net loss: you still need .bin anyway for the header files.


So, if we cleaned up things as I mentioned earlier -- by putting server extensions in .bin -- and someone did want them and did want to use them in some theoretical scenario, we can just say "use the .bin output in your buildInputs and it will work". No need for a separate .server-libs.

But I think that's a particular, rare-or-impossible enough bridge that we can cross that one when (if) we get to it.

I don't think this is a good idea. systemd.lib is only 1.8M and seems like a reasonable thing to have a runtime dependency on:

$ du -sch $(nix-store -qR /nix/store/558zw2lrdiqwgy0i5vq779j700iph882-postgresql-9.6.12-lib)
152K    /nix/store/r4bbg2bjmsvyachy2mn97xri6bcib4lj-zlib-1.2.11
160K    /nix/store/kq8jb3g4xzfbc3d7m818ri3293j2nsy3-libossp-uuid-1.6.2
1.4M    /nix/store/2ch3pwfv40ls6qhdmzwsz9wky9pq6cyl-bash-4.4-p23
1.7M    /nix/store/hsh8afbalh1blsj2s4n8g2dvaj2j5aai-libxml2-2.9.9
3.6M    /nix/store/2wj5dw9sslnqxkznyzp8jhic8ywybs72-openssl-1.0.2r
6.2M    /nix/store/558zw2lrdiqwgy0i5vq779j700iph882-postgresql-9.6.12-lib
30M /nix/store/siks2gcfwx6qwh27m7c5r5lixcr621bd-glibc-2.27
43M total

Splitting packages into multiple parts makes things much more confusing to reason about. Just because other distros do it doesn't mean it's a good idea. Multiple outputs are there to avoid this mess. We should use package splitting as a last result!

On the other hand, disabling systemd support when you are building with Musl seems reasonable. But it would be much better to just fix musl support in systemd.

@matthewbauer FWIW, see #61581 for my proposed changes to @nh2's proposed changes -- which effectively do what you ask (systemd is always available, unless stdenv.hostPlatform.isMusl == true || stdenv.isDarwin).

I also do not think systemd upstream has any interest in building with musl; some googling leads to various patches which seem to be rather invasive, and I think it would be better to avoid cluttering our own systemd patchset with a ~dozen more. It's possible to get a subset of systemd's library code working with musl (a lot of the most useful stuff like socket activation helper code, sd_log etc are all in just a single header file), but anything that truly needs libudev or libsystemd is going to be tricky.

I can think of another good reason of splitting postgresql and postgresql_lib. Currently we have postgres = postgresql96, and we can't bump it because it will bump server version for those, who use postgresql as server. Well... we can bump it, but that will be breaking change for many (last time I've checked there were tens of direct pg server usages through postgresql)...

If we could have postgresql_lib as separate top-level attribute, we could bump it's major version every year, as libpq has backwards compat. Is there any need for libpq9.6 when we already have libpq11?

I wrote "splitting" but in fact I just want a separate top-level attribute for libpq. Which should be libpq = postgresql_latest.lib;. So I'm all for multiple outputs, just libpq deserves a separate reference.

Thank you for your contributions.

This has been automatically marked as stale because it has had no activity for 180 days.

If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.

Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

still important to me

Was this page helpful?
0 / 5 - 0 ratings

Related issues

lverns picture lverns  ·  3Comments

spacekitteh picture spacekitteh  ·  3Comments

ghost picture ghost  ·  3Comments

retrry picture retrry  ·  3Comments

langston-barrett picture langston-barrett  ·  3Comments