Nix: Supporting fixed-hash derivations for local files

Created on 19 Aug 2017 · 8 Comments · Source: NixOS/nix

Nix already has fetchurl, which takes a url and an outputHash. As I understand it, it works roughly as follows:

  1. If the specified outputHash already exists in the store, skip downloading the file.
  2. If the outputHash does not exist in the store yet, download the file; after downloading the file, hash the resulting file and throw an error if the resulting hash does not match the outputHash.

This is useful logic, as it means that you can skip re-downloading files on every rebuild if you already have them, while still ensuring that the downloaded files are the correct ones. The 'signal' for a file at a URL having changed (requiring redownload) is to change the outputHash, which means the next rebuild will redownload the file. All well and good, so far.
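For reference, a minimal sketch of the kind of fetchurl call described above. The URL is a made-up placeholder, and `lib.fakeSha256` stands in for the real hash, so the expression evaluates but the build fails with a hash mismatch until the real value is filled in.

```nix
let
  pkgs = import <nixpkgs> {};
in
pkgs.fetchurl {
  # Hypothetical URL, for illustration only.
  url = "https://example.org/sources/foo.zip";
  # Placeholder hash; replace it with the real sha256 of the file.
  # Changing this value is what signals that a new download is needed.
  sha256 = pkgs.lib.fakeSha256;
}
```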

However, in certain circumstances one might want to use local paths instead of URLs for sources; for example, in the case I'm currently trying to accommodate, I have a number of proprietary sources that exist in the same repository as my system configuration, and that cannot be independently distributed over HTTP. These can be accessed through a relative path, relative to the expression in which they are used.

The problem is that when doing something like the following...

mkDerivation {
  # ...
  src = ../sources/foo.zip;
  # ...
}

... the foo.zip gets re-added to the store (or at least re-hashed) on every rebuild. While understandable (it has no other way to know whether the file has changed since the last rebuild), this is slow and obviously undesirable from a performance point of view, especially with large multi-GB sources (e.g. for proprietary games).

An alternative is requireFile or a similar construct. However, this requires the user to add sources to the store imperatively and by hand, which cannot (easily or reliably) be done as part of a regular declarative rebuild. It is therefore not useful for situations like mine, where a full deployment from a declarative configuration repository is desired without additional manual steps.
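To make the manual step concrete, here is a rough sketch of a requireFile call. The file name is hypothetical and the hash is a placeholder; the message is what the user would see when the file is not yet in the store.

```nix
let
  pkgs = import <nixpkgs> {};
in
pkgs.requireFile {
  name = "foo.zip";  # hypothetical file name
  # Placeholder; the real value is the flat sha256 of the file.
  sha256 = pkgs.lib.fakeSha256;
  # Shown when the file is missing from the store; the user then
  # has to add it by hand before the build can proceed.
  message = ''
    foo.zip cannot be fetched automatically.
    Add it to the store manually, for example with:
      nix-store --add-fixed sha256 foo.zip
  '';
}
```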

A third option is a rather hacky one:

{ path, name ? "", sha256, ... }:
    let
        derivationName = if name != "" then name else baseNameOf path;
        localPath = builtins.toPath path; # Workaround to prevent automatic store addition
    in
        stdenv.mkDerivation {
            name = derivationName;

            builder = pkgs.writeScript "copy-local-file.sh" ''
                source $stdenv/setup

                cp ${localPath} $out
                chmod -x $out
            '';

            outputHashAlgo = "sha256";
            outputHash = sha256;
            outputHashMode = "flat";

            preferLocalBuild = true;
        }

... which casts the path (which Nix makes absolute at parse time, relative to the expression's directory) to a string. That string is then used in an ad-hoc builder that copies the file from outside the store into the store, presumably only running if the hash doesn't already exist in the store. However, this doesn't work in any kind of sandboxed or constrained environment, since the builder runs as a nixbld user and very likely has no access to the out-of-store source path.

In short: none of the available options satisfy the requirements. Every option is either too slow, too imperative, or flat-out impossible in a sandbox.

My suggestion would therefore be to add a fetchfile builtin to Nix that behaves similarly to fetchurl, but for local paths; add a local path to the store, but only if the specified hash does not exist yet, and verify that the local file actually matches the hash.

I'm unclear about the exact implementation, but it would require that the add-to-store operation occur before the derivations are built, similar to the automatic add-to-stores for paths that are used in string interpolations. This ensures that the add-to-store operation will have access to the sources, as the operation occurs outside of a sandbox or nixbld user.
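The difference between the two ways of turning a path into a string can be seen in a small sketch (the path is hypothetical and has to exist for the interpolation to succeed):

```nix
let
  # Hypothetical local file, resolved relative to this expression.
  src = ../sources/foo.zip;
in {
  # toString yields the absolute path as a plain string; nothing is copied.
  asString = toString src;

  # Interpolation copies the file into the store during evaluation,
  # outside of any build sandbox, and yields the resulting store path.
  copied = "${src}";
}
```

Because both attributes are lazy, the copy only happens when `copied` is actually forced, for example with `nix-instantiate --eval --strict`.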

In essence this would be a combination of the syntax and hash checking of fetchurl, and the pre-build add-to-store behaviour of string-cast path literals.
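For comparison, newer Nix versions (2.0 and later) provide `builtins.path`, which comes close to this combination: it adds a local path to the store at evaluation time and accepts an expected `sha256`. The following is only a sketch, assuming such a Nix version; the path is hypothetical and the hash is a placeholder.

```nix
let
  pkgs = import <nixpkgs> {};
in
builtins.path {
  path = ../sources/foo.zip;  # hypothetical local file
  name = "foo.zip";
  recursive = false;          # flat hash of the file contents, as with fetchurl
  # Placeholder; evaluation fails with a mismatch until the real hash is used.
  sha256 = pkgs.lib.fakeSha256;
}
```

Since the expected store path can be derived from the hash, an already-present path should not need to be copied again, but that behaviour is worth verifying against the Nix version in use.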

In the long term, it might be worth considering whether it's desirable to have a more generic API for adding pre-build-time fetch* functions like this, or whether fetchfile is sufficient and users are just expected to use something like a FUSE mount to expose other protocols for source-fetching.

All 8 comments

I think fetchurl knows about file://... urls. Care to test this?

  src = fetchurl {
    url = "file:///path/to/my/file.zip";
    sha256 = "...";
  };

I've already tested that, but it fails for the same reason as approach 3: the fetch occurs at build time, at which point the builder no longer has access to the source file. That's why the pre-build behaviour of path literals is needed here.

After discussing the issue with @catern on IRC, we came up with this slightly hacky workaround:

{ path, storeHash, ... }:
    let
        derivationName = baseNameOf path;
        storePath = "/nix/store/${storeHash}-${derivationName}";
    in
        if builtins.pathExists storePath then storePath else path

On the first evaluation, this will return the original provided path (which is syntactically a path literal, not a string), which when interpolated into a string implicitly and automatically results in the content at the path being copied into the store.

On subsequent evaluations, the store path (as identified by the storeHash) already exists due to the copy operation during the first evaluation, and therefore it returns a string containing the store path, essentially 'faking' the path-to-store-path conversion.

This is a bit hacky since it relies on the user knowing the store path (obtained by manually doing nix-store --add /path/to/file once), assumes that the store path generation algorithm never changes, and relies on path-to-string casting in a way that probably wasn't intended. It's probably not a good long-term solution, but at least works as a short-term workaround.
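A slightly more defensive variant of the workaround, as a sketch only. It uses `builtins.storeDir` instead of a hard-coded /nix/store, and wraps the existing path in `builtins.storePath` so that the returned string carries a dependency on the store path (a plain string does not, which could matter for sandboxed builds). Note that `builtins.storePath` is not available in pure evaluation mode. The file name is hypothetical and the storeHash value is a placeholder.

```nix
let
  fetchLocal = { path, storeHash, ... }:
    let
      derivationName = baseNameOf path;
      storePath = "${builtins.storeDir}/${storeHash}-${derivationName}";
    in
      if builtins.pathExists storePath
      then builtins.storePath storePath  # already in the store: reuse it
      else path;                         # path literal: copied when interpolated
in
fetchLocal {
  path = ../sources/foo.zip;
  # Placeholder: the hash part of the path printed by
  # `nix-store --add ../sources/foo.zip`.
  storeHash = "<32-character store hash>";
}
```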

Maybe you can abuse the outPath attribute of e.g. requireFile? (Completely untested):

{ requireFile, path, sha256, ... }:
    let
        derivationName = baseNameOf path;
        fakeDrv = requireFile { name = derivationName; inherit sha256; };
    in
        if builtins.pathExists fakeDrv.outPath then fakeDrv else path

"fakeDrv" and "path" will not have the same output path in the store in that case, so that won't automatically work.

I believe that the approach in @joepie91's last comment is the best. The only difficulty is in calculating the store path from a hash of the contents instead of hardcoding the store path, but that can be fixed easily with a builtin and is not an obstacle for now.

Ok, a tested version:

let
    pkgs = import <nixpkgs> {};
    fetchlocal = { path, sha256, ... }:
        let
            derivationName = baseNameOf path;
            fakeDrv = pkgs.stdenv.mkDerivation {
                name = derivationName;
                outputHashAlgo = "sha256";
                outputHashMode = "recursive";
                outputHash = sha256;
            };
        in
            #if builtins.pathExists fakeDrv.outPath then fakeDrv.outPath else "${path}"
            { a = fakeDrv.outPath; b = "${path}"; };
in
    fetchlocal {
        path = ./bar;
        sha256 = "1476r2f1nccfi3d6l0yxj5m4xww6irm6x3mhz3ifpzi5nlql1ys2";
    }

Demo:

$ echo foo > bar
$ nix-hash --base32 --type sha256 bar
1476r2f1nccfi3d6l0yxj5m4xww6irm6x3mhz3ifpzi5nlql1ys2
$ nix-instantiate --eval --strict ./test.nix
{ a = "/nix/store/153kl4fflwkzdzi6valb6f43sqfprcif-bar"; b = "/nix/store/153kl4fflwkzdzi6valb6f43sqfprcif-bar"; }
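
Putting the commented-out conditional from the tested version back in gives roughly the following. This is a sketch only, relying on the computed outPath matching the path produced by copying the path literal, as the demo output suggests.

```nix
let
  pkgs = import <nixpkgs> {};

  fetchlocal = { path, sha256, ... }:
    let
      # Fixed-output derivation used only to compute the expected
      # store path from the content hash; it is never meant to be built.
      fakeDrv = pkgs.stdenv.mkDerivation {
        name = baseNameOf path;
        outputHashAlgo = "sha256";
        outputHashMode = "recursive";
        outputHash = sha256;
      };
    in
      if builtins.pathExists fakeDrv.outPath
      then fakeDrv.outPath  # already in the store: skip copying and hashing
      else "${path}";       # first use: copy the file into the store
in
fetchlocal {
  path = ./bar;
  sha256 = "1476r2f1nccfi3d6l0yxj5m4xww6irm6x3mhz3ifpzi5nlql1ys2";
}
```

The hash is the one from the demo and assumes `bar` was created with `echo foo > bar`.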

As mentioned in the original issue,

The problem is that when doing something like the following...

  mkDerivation {
   # ...
   src = ../sources/foo.zip;
   # ...
 }

... the foo.zip gets re-added to the store (or at least re-hashed) on every rebuild. While understandable (it has no other way to know whether the file has changed since the last rebuild), this is slow and obviously undesirable from a performance point of view, especially with large multi-GB sources (e.g. for proprietary games).

The functions on this page have been helpful when dealing with large files that are dependencies for services. Avoiding useless serialization and hashing greatly speeds up builds.
