Nixpkgs: Problem: tarballs.nixos.org is too helpful

Created on 9 Sep 2018 · 9 Comments · Source: NixOS/nixpkgs

Issue description

When a source URL changes, but the fixed-output hash remains constant, Nix would normally download the URL again, and verify it against the hash. But with tarballs.nixos.org in place, it will just download the file from the cache, and never validate that it corresponds to the real-world resource behind the URL.

Steps to reproduce

  1. Get a file into tarballs.nixos.org.
  2. Create a fetchurl derivation with the same hash, but a URL that doesn't serve this content at all.
  3. See how Nix just gets the content from tarballs.nixos.org and never notices that the URL is wrong.
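
The steps above can be sketched as a small Nix expression. This is only an illustration: the file name `repro.nix`, the `knownHash` argument, and the `example.invalid` URL are made up here, and you would have to supply the sha256 of a file that really is mirrored on tarballs.nixos.org.

```nix
# repro.nix (hypothetical file name)
{ pkgs ? import <nixpkgs> { }
  # Assumption: the sha256 of some file already present on tarballs.nixos.org.
, knownHash
}:

pkgs.fetchurl {
  # Placeholder URL that cannot serve the expected content.
  url = "https://example.invalid/not-the-real-file.tar.gz";
  sha256 = knownHash;
}
```

Building it with `nix-build repro.nix --argstr knownHash <hash>` should, per the description above, succeed through the hash-based mirror even though the URL is bogus. If the output is already in the local store, nothing is fetched at all.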

Technical details

In #45952 we saw that the racket version was bumped, but the sha256 of the -minimal override wasn't, which resulted in producing a racket-minimal-7.0 derivation built from 6.12 source.

If it weren't for tarballs.nixos.org providing the file based on hash alone, hydra would have discovered that the resource behind the new URL didn't match the hash and the mistake wouldn't have been merged.

Having a purely content-addressable cache is obviously an advantage, but how can we get that while still detecting when the cache is too helpful?

Should e.g. the Racket package verify which version it built, by querying racket --version as part of the derivation?
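
A version check of that kind could look roughly like the sketch below. It assumes the package is exposed as `pkgs.racket-minimal`, that it carries a `version` attribute, and that `racket --version` prints a string containing `v<version>`; none of these details are confirmed in this thread.

```nix
# Sketch only: attribute names and the version string format are assumptions.
{ pkgs ? import <nixpkgs> { } }:

pkgs.racket-minimal.overrideAttrs (old: {
  doInstallCheck = true;
  installCheckPhase = ''
    runHook preInstallCheck
    # grep exits non-zero when the expected version string is missing,
    # which fails the build.
    $out/bin/racket --version | grep -F "v${old.version}"
    runHook postInstallCheck
  '';
})
```

The check runs inside the package build rather than in the fetcher, so it would catch a stale source even when the hash-based cache happily serves the old tarball.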

Or is the content-addressable cache a mistake, and getting the paths from fixed-output derivations from hydra is good enough?

All 9 comments

You seem to be describing exactly the desired operation of the fixed-output hash and the intended purpose of tarballs.nixos.org.

If the hash were available locally, we would not even ask tarballs.nixos.org but just conclude that we already have that src, so fetching must be a no-op.

I believe the racket(-minimal) expression should be changed to avoid this; I added a comment at least: d0413d1ac95.

IIRC we've had some discussions around suggestions like including the basename in the hash computation, but anything like this is rather hard to make happen, as compatibility is quite a headache and it would have slight down-sides as well.

I don't think a similar change would happen in the near future, and it would belong in https://github.com/nixos/nix (but certainly feel free to continue the discussion in this thread).

That comment will probably improve this specific case -- thanks.

I suspect I am missing some project history here: presumably some specific case or pattern caused tarballs.* to come into existence. What is that?

I imagine that most derivation builds never go there, as the fetchurl derivation would usually be available on hydra. Am I wrong?

tarballs.nixos.org isn't really the root of the problem here anyway. The fixed-output derivations are (i.e. content-addressed ones). The main motivation is that switching a fetcher's URL or even the protocol shouldn't cause a rebuild/refetch (which would cascade to dependants).

Without it #45650 would have had a build failure because https://mirror.racket-lang.org/installers/7.0/racket-minimal-7.0-src.tgz is a different fixed-output derivation from https://mirror.racket-lang.org/installers/6.12/racket-minimal-6.12-src.tgz, and the former does not return content with the hash 0c565jy[...].

What's the compelling reason for t.n.o and why is it better than what hydra already caches?

> The main motivation is that switching a fetcher's URL or even the protocol shouldn't cause a rebuild/refetch (which would cascade to dependants).

If you update the URL of the fixed-output derivation, the hash of the derivation will be changed regardless of t.n.o and all its dependents will be rebuilt, right? But the URL itself will only be queried until the derivation has run on hydra.

It looks like the compelling reason is in bb672805: upstreams sometimes change URL structures, and that introduces fragility in nixpkgs unless the derivation is cached. I guess the issue is that it's currently hard to protect the derivation from getting gc'ed?

I'm just asking the stupid questions here, I know I'm probably missing some important point.

> If you update the URL of the fixed-output derivation, the hash of the derivation will be changed regardless of t.n.o and all its dependents will be rebuilt, right?

No, fixed-output derivations have their hashes based on the name (which is usually simply "source") and the _output_, not the input.

> I guess the issue is that it's currently hard to protect the derivation from getting gc'ed?

In a sense. Generally, we have gcroots for e.g. hello, not hello.src.

> No, fixed-output-derivations have their hashes based on the name (which is usually simply "source") and the output, not the input.

There are certainly source derivations just named source (e.g. fetchFromGitHub derivations without an explicit name), but in the case of e.g. fetchurl, and therefore in the case of e.g. the racket source, the name of the derivation comes from the basename of the URL:

$ nix-instantiate -E 'with import <nixpkgs> {}; fetchurl { url = http://example.com/; sha256="0000000000000000000000000000000000000000000000000000"; }' 2>/dev/null
/nix/store/n9pad0sw3xfycrqzcj1p2kngv3l7kqyw-example.com.drv
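
To see which parts depend on the URL, one could compare the `.drv` path and the output path of a few fetchurl calls that share a hash. The sketch below uses made-up URLs, a hypothetical file name `compare.nix`, and the same dummy hash as the command above; it is only evaluated, so nothing is downloaded.

```nix
# compare.nix (hypothetical file name); evaluate with:
#   nix-instantiate --eval --strict compare.nix
with import <nixpkgs> { };
let
  dummy = "0000000000000000000000000000000000000000000000000000";
  a = fetchurl { url = "http://example.com/a/file.tgz"; sha256 = dummy; };
  b = fetchurl { url = "http://example.org/b/file.tgz"; sha256 = dummy; };
  c = fetchurl { url = "http://example.com/a/other.tgz"; sha256 = dummy; };
in
{
  # Same basename and hash, different URL: compare derivation and output paths.
  sameDrvPath = a.drvPath == b.drvPath;
  sameOutPath = a.outPath == b.outPath;
  # Same hash, different basename (and therefore a different name).
  sameOutPathOtherName = a.outPath == c.outPath;
}
```

If the explanations in this thread hold, the two `.drv` paths differ because the URL is part of the derivation, the output paths of `a` and `b` match because only the name and the hash go into them, and `c` gets a different output path because its name differs.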

> we have gcroots for eg. hello, not hello.src.

Yeah, I guess for a small project it's easy to say "just set keep-outputs to true" (which I think would preserve those source outputs, right?), but for nixpkgs that might mean huge disk space needs?

Anyway, I think my point was that t.n.o was unintuitive to me, because I expected fetchurl caching to work like any other fixed-output derivation. I even prepared a patch to make sure racket-minimal.src would change its name when the racket version was bumped, before I realized that the existing implementation already did that and it was something else that threw it off.
