Nixpkgs: Generate Hackage/LTS nix expressions on the fly rather than have them in the repo

Created on 10 Jun 2016 · 24 comments · Source: NixOS/nixpkgs

@peti brought up an idea earlier on the IRC channel: if hackage-packages.nix is automatically generated, why does it need to be in the nixpkgs repo? Could we instead generate it as a Nix derivation that would contain all versions of all packages on Hackage?

I hacked something together that's very crude, incomplete, and needs a lot of work before it becomes even remotely serviceable, but it illustrates the concept: https://gist.github.com/obadz/347fcf3ef0a86a9dbccbb7b04a80793b

You can inspect the generated nix expressions with:

$ nix-build ./autoHackage.nix -A autoHackageSrc

and you can see that at least one package builds by trying:

$ nix-build ./autoHackage.nix -A test-abstract-deque

Maybe we can move hackage-packages.nix out of nixpkgs. We can probably do something similar for LTS releases (also @peti's idea).

Notes:

  1. There's a guard in there so that only packages whose names start with "ab" are generated. That's just to get rapid feedback while iterating.
  2. If you generate the full Hackage package set, it's about 60 MB of Nix expressions uncompressed.
  3. I used cabal2nix, but I suspect there are more appropriate tools for this that I'm just not familiar with.
  4. If something like this became haskellPackages, we'd probably need to have enough packages in "static nix expressions" to build cabal2nix or any required tooling written in haskell

cc // @acowley @ttuegel @ocharles

Labels: enhancement, work-in-progress, haskell


The main thing I'd want to test is whether your autoHackageSrc gets a "binary" cache download. @shlevy did some work on that back in https://github.com/NixOS/nix/pull/52 but it's since rotted.

In principle I'm super duper in favor, and this is the thing that I've been wanting to see for over a year.

A few thoughts:

  1. Does nix-env (when selecting packages by name rather than by attribute path) force this to be generated/downloaded at evaluation time?
  2. To minimize the size of evaluation-time downloads, is cabal2nix tuned to generate as little code as possible for each package? It seems like ultimately there should be a fairly simple "data model" for Haskell packages that would fit into a giant attrset generated by something like cabal2nix, that then gets consumed by support code that isn't dynamically generated. Basically I'd aim to minimize the amount of repetitive code that is autogenerated in a system like this, by factoring out all repetition to library code.
  3. Nix might want to get more explicit support for staging to make the UX more pleasant here: make it clear that before we can proceed with evaluation, we need to download X, then make it clear that we're now resuming evaluation after X has been downloaded, and then display the usual list of drvs and outputs.

cc @edolstra since this is the (kind of) thing I've been bugging him about for a while now, especially around https://github.com/NixOS/nix/pull/52. I'm pretty sure that something resembling this will have to be in a future nixpkgs if we want to keep growing past a certain point.

In my own experiments adapting this method to our Qt and KDE packages, I have discovered a possible fatal flaw. Hydra evaluates Nixpkgs in restricted mode, which prevents fetching during evaluation and prevents evaluating Nix expressions from the store. In other words, Hydra can never evaluate a package with an expression generated this way.

Duh. :disappointed:

Oh no! That is a fatal flaw indeed :-(

So that means Hydra cannot build packages generated with #16005 either.

It seems like the notion of restricted mode is overly strict. I do think this is the right way to do things, but we'll need some improvements in Nix itself before it's fully usable. Ping @edolstra and @shlevy who both know a lot about it.

Would that mean changing this line would "fix" this issue? I guess it is there for a reason, but maybe it is time to reconsider it (@edolstra).

I don't think this needs improvements in nix exactly. The question is, do we want hydra to be downloading from arbitrary places on the web or not? If so, we should allow downloads during restricted mode. If not, not.

@garbas restricted mode also restricts arbitrary filesystem access, which we definitely want on hydra (don't want someone to be able to upload a nixexpr that dumps /etc/passwd on some hydra box for example)

Alternatively, some mechanism to specify a blessed list of downloads, possibly transformed into Hydra build inputs. But then that list would need to be protected more than general nixpkgs access, or else we might as well just enable downloads in restricted mode again.

The Nix manual states:

Nix has a new option restrict-eval that allows limiting what paths the Nix evaluator has access to. By passing --option restrict-eval true to Nix, the evaluator will throw an exception if an attempt is made to access any file outside of the Nix search path. This is primarily intended for Hydra to ensure that a Hydra jobset only refers to its declared inputs (and is therefore reproducible).
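The rule in that excerpt boils down to a containment check on resolved paths. Below is a minimal Python sketch of it, a toy model rather than Nix's actual C++ implementation, assuming the search path is a plain list of directories; `is_path_allowed`, `check_path`, `RestrictedEvalError` and the `/src/nixpkgs` root are hypothetical names chosen for illustration.

```python
import os
from typing import Sequence


class RestrictedEvalError(Exception):
    """Hypothetical stand-in for the error the evaluator throws."""


def is_path_allowed(path: str, search_path: Sequence[str]) -> bool:
    # Resolve symlinks and ".." first, so "/src/nixpkgs/../../etc/passwd" cannot slip through.
    real = os.path.realpath(path)
    for root in search_path:
        root_real = os.path.realpath(root)
        # commonpath compares whole components, so "/src/nixpkgs-evil" is not inside "/src/nixpkgs".
        if os.path.commonpath([real, root_real]) == root_real:
            return True
    return False


def check_path(path: str, search_path: Sequence[str]) -> str:
    # Refuse any access outside the search path, as the manual describes.
    if not is_path_allowed(path, search_path):
        raise RestrictedEvalError(f"access to path '{path}' is forbidden in restricted mode")
    return path
```

With `search_path = ["/src/nixpkgs"]`, `check_path("/etc/passwd", search_path)` raises, while any file under the checkout passes. Allowing the Nix store would simply mean adding the store directory as another root.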

If I am correct, making an exception for just the Nix store would be sufficient.

No, it's not about accessing store paths in this case, it's about downloads (which aren't documented as part of restricted mode it seems)

Aren't builds sandboxed from accessing local paths anyway?

Sure, but without restricted mode evals aren't, and evals copy paths to the store. So I could do echo ${/etc/passwd} > $out/look-here-is-hydras-etc-passwd in my build command just fine without restricted mode.

Sounds like we need a "network-only" mode

Well, restricted mode exists basically only for hydra, so if we want to allow evals to do arbitrary networking we should just remove networking from being gated by restricted mode.

I guess it really depends on what the original motivation for restricting networking was. I obviously would prefer it to be allowed to enable exactly this sort of use case, but who knows what I'm missing!

Import-from-derivation is not forbidden on Hydra. E.g. the Debian and RPM functions use it. However, it's a bad idea because it can cause a significant amount of building at evaluation time. (E.g. if there is a stdenv change and Hydra evaluates a call to rpmClosureGenerator, then hydra-evaluator may spend a few hours building stdenv locally.)

Import-from-derivation is however forbidden in read-only mode, so it would break nix-env -qa.

There is an orthogonal issue of whether a call to the builtin fetchTarball function should be allowed in restricted mode. Currently it isn't, but it was my intent to allow it when fetchTarball has a hash argument (i.e. fetchTarball { url = http://bla; sha256 = "..."; } would always be allowed, but fetchTarball http://bla wouldn't be).
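The pinned-versus-unpinned distinction can be written as a small policy check. The Python sketch below is a toy model, not Nix's code: `FetchRequest`, `fetch_allowed`, `fetch` and `RestrictedModeError` are hypothetical names, and for simplicity it hashes the raw downloaded bytes as hex SHA-256, whereas the `sha256` given to `fetchTarball` is a hash of the unpacked tree's NAR serialisation, usually written in Nix's base-32 encoding.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


class RestrictedModeError(Exception):
    """Hypothetical error for a fetch that restricted mode refuses."""


@dataclass(frozen=True)
class FetchRequest:
    url: str
    sha256: Optional[str] = None  # expected hex digest; None means the fetch is unpinned


def fetch_allowed(req: FetchRequest, restricted: bool) -> bool:
    # Unrestricted evaluation may fetch anything; restricted mode only pinned URLs.
    return not restricted or req.sha256 is not None


def fetch(req: FetchRequest, restricted: bool, download: Callable[[str], bytes]) -> bytes:
    # Refuse before touching the network, so an unpinned URL is never contacted.
    if not fetch_allowed(req, restricted):
        raise RestrictedModeError(f"unpinned fetch of {req.url} refused in restricted mode")
    data = download(req.url)
    if req.sha256 is not None:
        actual = hashlib.sha256(data).hexdigest()
        if actual != req.sha256:
            # Content that no longer matches the pin is rejected, keeping evaluation reproducible.
            raise ValueError(f"hash mismatch for {req.url}: expected {req.sha256}, got {actual}")
    return data
```

Passing `download` in as a callable keeps the sketch testable without network access; a pinned request succeeds only if the downloaded content still matches its hash.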

Here's a transcript from a follow-up convo I had with @edolstra on IRC:

[08:57:43]  <copumpkin> niksnut: thanks. It doesn't seem fundamental that builds during evaluation happen locally though, right?
[08:59:38]  <niksnut>   copumpkin: no, actually they may be distributed, but either way is bad
[09:00:20]  <copumpkin> niksnut: to me the ideal in that situation is that the evaluator plans this sort of thing ahead of time (at least with my limited understanding of the matter)
[09:01:01]  <copumpkin> niksnut: i.e., it doesn't just blindly go evaluate rpmClosureGenerator, but evaluates a bunch of things and understands that rpmClosureGenerator will be evaluatable after stdenv is built, so it pauses and does other stuff until stdenv is built
[09:01:22]  <copumpkin> or does that sound ridiculous?
[09:01:48]  <copumpkin> if you just nix-build -A rpmClosureGenerator you get a stdenv build during evaluation, of course
[09:01:55]  <copumpkin> but in isolation that doesn't seem like a bad outcome
[09:02:02]  <niksnut>   I don't see an easy way to accomplish that
[09:02:18]  <niksnut>   except by getting rid of the separate evaluator altogether
[09:02:26]  <copumpkin> in what sense?
[09:02:36]  <niksnut>   moving hydra-evaluator into hydra-queue-runner
[09:02:43]  <copumpkin> oh
[09:02:50]  <copumpkin> is that bad for other reasons?
[09:02:56]  <copumpkin> (I'm clueless if you couldn't tell)
[09:03:24]  <niksnut>   it's a lot of work, and would require a major change to the hydra schema
[09:03:37]  <copumpkin> ah
[09:03:46]  <niksnut>   for example, it would require having build steps that are not part of a build
[09:04:45]  <copumpkin> niksnut: do you at least buy the motivation for this change, if not the practicalities? the way I see it, this would allow us to start partitioning nixpkgs, stop growing the repo uncontrollably, and actually doing autogenerated package ecosystems properly (i.e., they'd still be locked down for a given commit, but we wouldn't have to have all the data ahead of time)
[09:05:35]  <niksnut>   copumpkin: yes
[09:05:58]  <niksnut>   note that implementing hash checking in fetchTarball would be trivial to do
[09:06:01]  <copumpkin> niksnut: my fear with fetchtarball is that it's a small bandaid. We might still need arbitrary tooling to autogenerate the nix expressions for the next stage of evaluation, if not to download the sources
[09:06:26]  <copumpkin> niksnut: this is sort of speaking to my earlier rambling about "stages of evaluation/building"
[09:06:48]  <copumpkin> basically stratification of the evaluation process into distinct steps, possibly with builds interleaved between them
[09:09:51]  <copumpkin> niksnut: or do you think adding the hash check to fetchTarball (and the exception to restricted mode) is enough to kickstart this sort of thing?
[09:14:28]  <niksnut>   copumpkin: either way, there is the downside that it makes nixpkgs no longer self-contained
[09:14:56]  <niksnut>   for example, it might become impossible to evaluate a nixpkgs version in the future if some of the referenced external files disappear
[09:15:13]  <copumpkin> niksnut: as long as the external references are hash-locked, that doesn't seem that bad? it's already impossible to do anything with an evaluated nixpkgs in future if the referenced files disappear
[09:15:45]  <copumpkin> I see some appeal to keeping it self-contained, but to me at least the cons are starting to outweigh the pros
[09:16:51]  <gchristensen>  it also makes it much much more difficult to reason about the repository
[09:17:18]  <gchristensen>  because now your diff is -SHA +SHA which could be the difference of a couple lines, or a mass rebuilding of the entire python infrastructure
[09:17:47]  <copumpkin> sure
[09:17:53]  <copumpkin> but do you think the current thing is sustainable?
[09:18:00]  <copumpkin> like I'd love to add a real java ecosystem
[09:18:10]  <copumpkin> but there's no chance in hell that git would survive a haskellPackages-like approach to that
[09:18:25]  <copumpkin> in fact, we basically can't afford another haskellPackages
[09:18:34]  <gchristensen>  yes I agree with you, copumpkin
[09:18:44]  <copumpkin> even though it's sort of the ideal scenario for an individual package ecosystem
[09:19:22]  <gchristensen>  copumpkin: actually, why can't it survive? there are many stories of monster monster git repositories
[09:20:07]  <gchristensen>  copumpkin: ps: I'd love to hear more about your java ecosystem ... I've been trying to package a gradle program and it has been horrible frustrating
[09:20:08]  <copumpkin> gchristensen: autogenerated code in repositories has lots of downsides, and nobody likes monster git repositories. We also increase the channel size and hinder federated ecosystem builds
[09:20:24]  <copumpkin> gchristensen: i.e., peti has to chew people out periodically for touching his autogenerated code
[09:20:33]  <copumpkin> it makes diffs unusable, etc.
[09:20:39]  <copumpkin> all the usual reasons people hate autogenerated code in VCS
[09:20:48]  <gchristensen>  copumpkin: fair enough
[09:20:57]  <FRidh> niksnut, copumpkin: I think it's therefore important that we discuss now exactly what we do and do not keep in external repo's, and that we agree we keep the external repo's in the NixOS org. That should prevent us from 'losing' any.
[09:21:04]  <copumpkin> Nix also just has such a nice way to manage build artifacts, except when it wants to evaluate its own build artifacts :P
[09:21:12]  <copumpkin> FRidh: sure
[09:21:28]  <gchristensen>  copumpkin: ok I agree with you again
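copumpkin's "stratification of the evaluation process into distinct steps, possibly with builds interleaved between them" can be illustrated with a toy scheduler. This is a hypothetical Python model, not how hydra-evaluator works (Eelco notes above that he sees no easy way to do it there): each job lists, in order, the derivations whose outputs its evaluation imports, and a job whose next import is unbuilt is suspended rather than built inline. All names (`staged_evaluate`, `rpmClosure`, `cabal2nix-foo`) are illustrative.

```python
from typing import Callable, Dict, List, Set


def staged_evaluate(
    imports: Dict[str, List[str]],
    build: Callable[[Set[str]], None],
) -> List[List[str]]:
    """Toy model of staged evaluation; returns the jobs finished in each stage."""
    built: Set[str] = set()
    progress = {job: 0 for job in imports}  # index of each job's next unresolved import
    stages: List[List[str]] = []
    while progress:
        finished: List[str] = []
        blockers: Set[str] = set()
        for job, i in list(progress.items()):
            # Evaluate as far as the already-built outputs allow.
            while i < len(imports[job]) and imports[job][i] in built:
                i += 1
            if i == len(imports[job]):
                finished.append(job)
                del progress[job]
            else:
                progress[job] = i  # suspend the job at its first unbuilt import
                blockers.add(imports[job][i])
        stages.append(finished)
        if blockers:
            build(blockers)  # one batch of builds between evaluation stages
            built |= blockers
    return stages
```

With `hello` needing nothing, `rpmClosure` importing from `stdenv`, and `haskell-foo` importing from `stdenv` then `cabal2nix-foo`, the jobs finish in three stages, with `stdenv` and then `cabal2nix-foo` built in the gaps. Each stage builds every current blocker, so a suspended job always advances and the loop terminates.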

Regarding import-from-derivation, we could allow substitution (but not building) in read-only mode. So if nix-env -qa encounters import autoHackageSrc, and the output of autoHackageSrc exists in the binary cache, it would be downloaded. Otherwise it would throw an exception (which nix-env should handle in some graceful way). That should cover most users.
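The proposed rule ("substitute but never build in read-only mode") can be sketched as follows. This is a toy Python model, not Nix's implementation: the store and binary cache are plain sets, each attribute is assumed to import exactly one generated store path, the `/nix/store/aaa-…` style paths are made up, and `realise_for_import`, `query_available` and `ReadOnlyImportError` are hypothetical names. `query_available` shows one possible "graceful" handling: skipping the attribute instead of aborting.

```python
from typing import Dict, List, Set


class ReadOnlyImportError(Exception):
    """Hypothetical: an import needs a build, which read-only mode forbids."""


def realise_for_import(path: str, local_store: Set[str], binary_cache: Set[str], read_only: bool) -> str:
    if path in local_store:
        return "valid"
    if path in binary_cache:
        # Substitution is only a download, so it is allowed even in read-only mode.
        local_store.add(path)
        return "substituted"
    if read_only:
        raise ReadOnlyImportError(f"{path} is not in the binary cache and cannot be built in read-only mode")
    local_store.add(path)  # stands in for actually building the derivation
    return "built"


def query_available(attrs: Dict[str, str], local_store: Set[str], binary_cache: Set[str]) -> List[str]:
    # A nix-env -qa-like listing that skips unrealisable imports instead of aborting.
    names = []
    for name, path in attrs.items():
        try:
            realise_for_import(path, local_store, binary_cache, read_only=True)
        except ReadOnlyImportError:
            continue
        names.append(name)
    return names
```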

@edolstra, I like this solution:

Regarding import-from-derivation, we could allow substitution in read-only mode. So if nix-env -qa encounters import autoHackageSrc, and the output of autoHackageSrc exists in the binary cache, it would be downloaded.

hydra.nixos.org would build "auto-haskell.nix" just like any other derivation, and once it has built that derivation, the rest of Nixpkgs can import it (on hydra.nixos.org), too.

The only complication is that we don't have one big fat "auto-haskell.nix" file, but instead several hundred small "cabal2nix-foo-x.y.nix" files, where "foo" is some Haskell package and "x.y" is some version number. To get all those files realized, we would have to assign a proper attribute to each of those expressions so that packagePlatforms discovers and builds it. This is not nearly as nice as the case where users can instantiate Haskell build expressions on the fly by writing callHackage "foo" "1.0" {} wherever they please. Unfortunately, Hydra would not find those derivations and consequently would never build them.
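The discovery problem can be shown with a toy model: Hydra's job list comes from walking attributes, so a derivation that exists only inside a `callHackage` call is invisible to it. The Python sketch below is a hypothetical simplification (`Derivation` and `discover_jobs` are illustrative names, and the real release expressions also track per-platform jobs); it is not the code behind `packagePlatforms`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Derivation:
    name: str


def discover_jobs(attrs: dict, prefix: str = "") -> Dict[str, Derivation]:
    """Collect every derivation reachable through nested attribute sets."""
    jobs: Dict[str, Derivation] = {}
    for name, value in attrs.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, Derivation):
            jobs[path] = value
        elif isinstance(value, dict):
            jobs.update(discover_jobs(value, path))
    return jobs
```

Given `{"hello": ..., "haskellPackages": {"foo": ...}}`, only `hello` and `haskellPackages.foo` become jobs; the intermediate "cabal2nix-foo-1.0.nix" generator is never bound to an attribute, so it is never scheduled.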

I'm closing this issue since we _can_ generate Hackage expressions on the fly, which is the subject of this ticket. Further discussion of the "how to build on Hydra" issue should take place at https://github.com/NixOS/nix/issues/954, IMHO.
