nix-channel --update redownloads when there is no update

Created on 24 Apr 2016 · 12 comments · Source: NixOS/nix

nix-channel --update downloads are not efficient: repeated calls download the same file over and over, even when it is already available locally.

There exist various mechanisms to determine whether a particular file was already downloaded before.

One such mechanism is the -N flag of wget. I don't care about which mechanism is used, as long as it is fixed.
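For illustration, the timestamp-based approach that wget -N uses (send the cached file's mtime as If-Modified-Since and skip the download on a 304) could be sketched like this in Python; the URL and cache path are placeholders, not part of nix-channel:

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate

def http_date(epoch_seconds):
    """Format a Unix timestamp as an HTTP-date, e.g. 'Mon, 25 Jul 2016 11:23:02 GMT'."""
    return formatdate(epoch_seconds, usegmt=True)

def conditional_fetch(url, cache_path):
    """Re-download url only if the server says it is newer than our cached copy."""
    req = urllib.request.Request(url)
    if os.path.exists(cache_path):
        # Tell the server how old our copy is; it answers 304 if unchanged.
        req.add_header("If-Modified-Since", http_date(os.path.getmtime(cache_path)))
    try:
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return False  # Not Modified: keep the cached copy
        raise
    with open(cache_path, "wb") as f:
        f.write(data)
    return True
```

This only works if the server actually emits a Last-Modified header, which is exactly the gap noted below.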

Priority to fix this is low, as it is an optimization.

UX

All 12 comments

The nixos.org channels aren't showing the Last-Modified header so that will also need to be added.

This should be fairly trivial using the "If-Modified-Since" header in the Perl code at https://github.com/NixOS/nix/blob/75d2492f20dc513337de3ef2d45e1d5c68c7dff8/scripts/nix-channel.in#L102. But I think that whole script is going to be rewritten in C++ if #341 is ever resolved.

The approach used in this script could probably be reused here.

$ curl -D - http://nixos.org/channels/nixos-16.03/nixexprs.tar.xz -o /dev/null
HTTP/1.1 302 Found
Date: Mon, 25 Jul 2016 11:23:02 GMT
Server: Apache/2.4.18 (Unix) OpenSSL/1.0.2h PHP/5.6.23
Location: http://nixos.org/releases/nixos/16.03/nixos-16.03.1143.6d520ce/nixexprs.tar.xz
Content-Length: 262
Content-Type: text/html; charset=iso-8859-1

We'd need to configure Apache to send a Last-Modified header (or similar).

Related issue: If you have two channels pointing to the same URL, nix-channel --update downloads it twice.
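The duplicate-download issue could be avoided independently of the HTTP caching question by keying fetches on the URL. A minimal sketch, where the channel map and the fetch callback are hypothetical stand-ins for nix-channel internals:

```python
def update_channels(channels, fetch):
    """channels: mapping of channel name -> URL.
    fetch: function URL -> store path (e.g. a nix-prefetch-url wrapper).
    Fetches each distinct URL only once, even when several channels share it."""
    fetched = {}   # URL -> fetch result, shared across channels
    result = {}
    for name, url in channels.items():
        if url not in fetched:
            fetched[url] = fetch(url)
        result[name] = fetched[url]
    return result
```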

If we can just shift the channels over to S3, then they'd show an etag, which is (almost always) the MD5 of the content.

We're now using the S3 channel, but this is still not fixed. nix-channel --update basically calls:

$ nix-prefetch-url https://d3g5gsiof5omrk.cloudfront.net/nixos/16.09/nixos-16.09beta480.2d463a3/nixexprs.tar.xz
downloading ‘https://d3g5gsiof5omrk.cloudfront.net/nixos/16.09/nixos-16.09beta480.2d463a3/nixexprs.tar.xz’... [7951/8557 KiB, 646.2 KiB/s]
path is ‘/nix/store/fm9glgmvjs6ga3k2h202mqq0nsrlr0ll-nixexprs.tar.xz’
161lz9xg38g34qxagxjpk5dii3s1d9441svx6csbiklh3xss1p2d

And the URL is fetched each time.

@domenkozar yeah, it won't magically work, but we now have the tools to make it work.

Basically, you'd call a HEAD on the URL, which should return an ETag that corresponds to the MD5 of the object (assuming it wasn't uploaded as a multi-part upload, which we control). We check that MD5, then only download if it changed.

The alternative, which requires one fewer request, is to use one of the conditional request headers specified in http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html. We'd compute the MD5 locally and pass it in If-None-Match; a 304 response means nothing has changed, otherwise we get the new file. I'm fairly sure all of that behavior is forwarded properly through CloudFront, but I've only ever used it myself against S3 directly.
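A sketch of that If-None-Match flow, assuming the object was not uploaded as a multi-part upload so S3's ETag is the plain (quoted) MD5 of the content; the URL and cache path are placeholders:

```python
import hashlib
import urllib.error
import urllib.request

def md5_etag(path):
    """MD5 of the local file, quoted the way S3 reports it in the ETag header."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return '"%s"' % h.hexdigest()

def fetch_if_changed(url, cache_path):
    """Download only if the server-side object differs from our cached copy."""
    req = urllib.request.Request(url, headers={"If-None-Match": md5_etag(cache_path)})
    try:
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return False  # ETag matched: cached copy is current
        raise
    with open(cache_path, "wb") as f:
        f.write(data)
    return True
```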

Well, just matching the final URL would be enough for practical purposes, as it contains the shortened commit hash.
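Since the release URL embeds the shortened commit hash (e.g. nixos-16.03.1143.6d520ce), comparing the resolved redirect target against the one recorded at the last update would be enough to decide whether to redownload. A sketch, with the regex tailored to the URL shapes seen in this thread:

```python
import re

def release_id(url):
    """Extract the release segment (including the shortened commit hash)
    from a resolved channel URL, or None if the URL doesn't match."""
    m = re.search(r"/(nixos-[^/]+)/nixexprs\.tar\.xz$", url)
    return m.group(1) if m else None

def is_unchanged(cached_url, resolved_url):
    """True when both URLs point at the same release, so no download is needed."""
    rid = release_id(cached_url)
    return rid is not None and rid == release_id(resolved_url)
```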

Anyone working on this? Kind of a buzzkill watching it run a no-op 8729 KiB download on every call to update.

I don't think there's anyone... so it's free for taking ;-)
