When I try to install or build anything, it usually downloads a lot of packages, but some others time out on cache.nixos.org (only some of the mirrors seem to be down, so it's always different packages), and I have to retry the build a few times before it succeeds.
Do anything that requires package downloads.
Sorry, but there is not much info here that would allow us to reproduce the problem. The report doesn't even include an error message...
Okay, I gathered some more info:
cache.nixos.org resolves to (among others) server-54-240-184-235.ams50.r.cloudfront.net
Now, try
curl -H 'Host: cache.nixos.org' http://54.240.184.235/nix-cache-info
a whole bunch of times in rapid succession, until an invocation takes multiple minutes to complete. (querying this server this rapidly is what nix does too, and it does eventually have downloads that take multiple minutes to complete).
I suspect cloudfront is doing throttling or firewalling.
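The reproduction described above can be scripted. This is only a minimal sketch, assuming bash: the function name time_requests, the request count, and the 10-second threshold are illustrative choices and not part of the original report. It runs a command repeatedly and prints only the invocations that take at least the threshold number of seconds.

```shell
#!/usr/bin/env bash
# Run a command COUNT times and report every run that takes at least
# THRESHOLD seconds. time_requests is a hypothetical helper name.
time_requests() {
  local count=$1 threshold=$2
  shift 2
  local i start elapsed
  for i in $(seq 1 "$count"); do
    start=$(date +%s)
    "$@" >/dev/null 2>&1          # output and exit status are ignored; only timing matters
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$threshold" ]; then
      echo "request $i took ${elapsed}s"
    fi
  done
}

# Example (uses the address and Host header from the comment above):
# time_requests 200 10 curl -sS -H 'Host: cache.nixos.org' http://54.240.184.235/nix-cache-info
```

If throttling is involved, the slow runs would be expected to cluster after a burst of fast ones, but that is a guess and not something this script can confirm on its own.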
Thought I'd add my 2 cents on cache download issue.
I've seen two types of network errors. One leads to a hang when initiating many downloads with download-from-binary-cache.pl: I will paste the error message next time I get it.
The other leads to this error:
*** Downloading ‘http://cache.nixos.org/nar/0jlld4di2l1yygx6qnchv5kpvgar91lzqaiih57wh6n0vfwx7lc0.nar.xz’ (signed by ‘cache.nixos.org-1’) to ‘/nix/store/sl866aqxc10aa63pb1clvgxsrv8dywnw-hook’...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 250 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 229 100 229 0 0 1523 0 --:--:-- --:--:-- --:--:-- 1523
/nix/store/skdijd3r067cbhh5dn6ya83cz33and2w-xz-5.2.2-bin/bin/xz: (stdin): File format not recognized
error: unexpected end-of-file
download of ‘http://cache.nixos.org/nar/0jlld4di2l1yygx6qnchv5kpvgar91lzqaiih57wh6n0vfwx7lc0.nar.xz’ failed: No such file or directory
Both are fixed by waiting and retrying.
They occur often enough that I get them about 3 times _each_ on a nixos-rebuild that downloads the full OS.
Also see #9081
This is the hang that I often get when running nixos-rebuild build:
$ nixos-rebuild build
building Nix...
building the system configuration...
download-from-binary-cache.pl: still waiting for ‘http://cache.nixos.org/ayihwm5jssbcawzwnll0p60p2xb46bzg.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘http://cache.nixos.org/bkk6372sxn426b8lmjd46c8ji584j80i.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘http://cache.nixos.org/8wc89sw7rb3fx5jvns4fhbpkyg4yq0a3.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘http://cache.nixos.org/0zh45az50cqdlmk5npzrb25gnvq1di10.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘http://cache.nixos.org/algs2c6nvrbsc4zsyd17q28sw0kvq537.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘http://cache.nixos.org/bkb66y2qmvc3f9v03hi643inbh8cx6ga.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘http://cache.nixos.org/adc8zvk7cxw53l2nvp99dv1x35093yw2.narinfo’ after 5 seconds...
[… many more times…]
Sometimes it gets unstuck on its own, sometimes it doesn't.
This happens at the beginning with nixos-install but is ignored. I wonder if I'm missing something though:
download-from-binary-cache.pl: could not download `https://cache.nixos.org/nix-cache-info` (Curl error 77)
Which apparently means:
CURLE_SSL_CACERT_BADFILE (77)
Problem with reading the SSL CA cert (path? access rights?)
EDIT: looks like it's this: https://github.com/NixOS/nix/blob/4f3cf06c97cb1f15c74b51b60673a0ed9af0a603/scripts/download-from-binary-cache.pl.in#L461-L469
EDIT: looks like activeRequests shouldn't go past maxParallelRequests, which is binary-caches-parallel-connections = 3 in my case. BUT the nixPath setting in my configuration.nix IS IGNORED (which is why I have to pass it as the NIX_PATH environment variable), so I can safely deduce that this setting is also ignored and maxParallelRequests falls back to 25... I guess I should add --option instead! So the bigger issue then is: why are some settings in /mnt/etc/nixos/configuration.nix ignored?
Why does it do that?
It seems like it's trying to download EVERYTHING at once, instead of only 3 at a time, serially.
I have set
nix.extraOptions = ''
binary-caches-parallel-connections = 3
connect-timeout = 5
'';
EDIT: sheet:
nix-env-ing my user environment (which contains ~7 GB) often crashes 5-10 times due to network issues.
I now run it in a shell loop:
$ (echo -n "started: "; date -R) >> nix-env.log; for i in `seq 1 20`; do nix-env -ikA nixpkgs.mainEnv && break; (echo -n "crashed $i: "; date -R) >> nix-env.log; echo RESTARTING $i IN 5s..; sleep 5; done; (echo -n "finished: "; date -R) >> nix-env.log
$ cat nix-env.log
started: Sat, 21 May 2016 14:17:46 +0100
crashed 1: Sat, 21 May 2016 14:18:23 +0100
crashed 2: Sat, 21 May 2016 14:21:37 +0100
crashed 3: Sat, 21 May 2016 14:22:56 +0100
crashed 4: Sat, 21 May 2016 14:24:30 +0100
crashed 5: Sat, 21 May 2016 14:32:36 +0100
crashed 6: Sat, 21 May 2016 14:39:48 +0100
finished: Sat, 21 May 2016 14:44:29 +0100
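The one-liner above can also be written as a reusable function. This is a rough sketch, assuming bash; retry is a hypothetical helper name, and the attempt count and delay are simply the values used in the loop above. It stops at the first success and gives up after the last attempt without sleeping again.

```shell
#!/usr/bin/env bash
# Re-run a command until it succeeds, at most MAX times, waiting DELAY
# seconds between attempts. retry is a hypothetical helper name.
retry() {
  local max=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0                      # success: stop immediately
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example, mirroring the loop above:
# retry 20 5 nix-env -ikA nixpkgs.mainEnv
```

This only papers over the network failures; it does nothing about whatever causes them.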
why are some settings in /mnt/etc/nixos/configuration.nix ignored!
@boobiesinc: I believe nixos-install runs with the configuration specified for the system it runs on, not the one you're trying to install.
I'm not getting this problem anymore. @obadz, you? Otherwise, we can close this.
As of last week I was still getting the error.
Thanks for the tip about connect-timeout. I'm encountering this issue when attempting to rebuild many things at once (things that aren't in the caches), and that option helps put an upper bound on how long things wait. Playing with parallel connections (1, 25, 150) didn't seem to help, but it did change how many times I saw the warning messages :).
I was unable to use the command: nixos-rebuild build -I nixpkgs=/home/cvanvranken/nixpkgs
until I added to nix.extraOptions in my /etc/nixos/configuration.nix
nix.extraOptions = ''
binary-caches-parallel-connections = 3
connect-timeout = 5
'';
This also prevented me from doing a fresh installation of nixos 16.09. Had to copy my vm from an existing installation and edit.
I'm having this problem right now: trying to install NixOS on my laptop, but I keep getting cannot resolve cache.nixos.org.
Here's the error message:
*** Downloading ‘https://cache.nixos.org/nar/1y58zw8bn7mvh2k4q43dsxd3rnvy99951ryl5ab2xff881xmmrq0.nar.xz’ to ‘/nix/store/vdzcirdh61lh6nm4q3pdfndvb7jlj1jd-local-cmds’...
curl: (6) Couldn't resolve host 'cache.nixos.org'
/nix/store/5cpnwwnasypdi7p0av6qbaf52y99gmdz-xz-5.2.2-bin/bin/xz: (stdin): File format not recognized
error: unexpected end-of-file
download of ‘https://cache.nixos.org/nar/1y58zw8bn7mvh2k4q43dsxd3rnvy99951ryl5ab2xff881xmmrq0.nar.xz’ failed: No such file or directory
could not download ‘/nix/store/vdzcirdh61lh6nm4q3pdfndvb7jlj1jd-local-cmds’ from any binary cache
fetching path ‘/nix/store/vdzcirdh61lh6nm4q3pdfndvb7jlj1jd-local-cmds’ failed with exit code 1
fetching path ‘/nix/store/gmfb793zhqii5kybag6wq2240x2wd36i-libcap-2.25’...
killing process 24062
cannot build derivation ‘/nix/store/xdwhjhid75dc3wivlq56mvzj84qjjvvf-nixos-system-chatsubo-16.09.1785.05eb31f.drv’: 1 dependencies couldn't be built
error: build of ‘/nix/store/xdwhjhid75dc3wivlq56mvzj84qjjvvf-nixos-system-chatsubo-16.09.1785.05eb31f.drv’ failed
I tried adding the nix.extraOptions to my configuration.nix like @cessationoftime did. No luck.
Is there any other workaround I can try?
BTW, I can ping cache.nixos.org just fine:
[root@nixos:~]# ping -c 3 cache.nixos.org
PING d3m36hgdyp4koz.cloudfront.net (52.84.125.172) 56(84) bytes of data.
64 bytes from server-52-84-125-172.iad16.r.cloudfront.net (52.84.125.172): icmp_seq=1 ttl=245 time=156 ms
64 bytes from server-52-84-125-172.iad16.r.cloudfront.net (52.84.125.172): icmp_seq=2 ttl=245 time=199 ms
64 bytes from server-52-84-125-172.iad16.r.cloudfront.net (52.84.125.172): icmp_seq=3 ttl=245 time=227 ms
--- d3m36hgdyp4koz.cloudfront.net ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 156.840/194.371/227.260/28.940 ms
Same problem here. Using wget to get the files manually works. If CloudFront is being used, investigate whether and when CloudFront ever rejects a request, because that's what appears to happen.
I can imagine the following happens: a "new" user downloads a lot of packages and in response CloudFront thinks it's under attack. It records all the metadata of the request, puts it in a database, and if you do the exact same thing again within some predefined time, it cancels those requests.
I might be completely wrong, but this would most likely not happen if CloudFront wasn't in the middle of all of it.
*** Downloading ‘https://cache.nixos.org/nar/0b29cgs0p3n4n1layrb822z67r5pxi0pbap42f5f157awj6b4bba.nar.xz’ (signed by ‘cache.nixos.org-1’) to ‘/nix/store/9vppxhy96ggiys1v8rrk09czzl143ck7-kde-wallpapers-high-resolution-4.14.3’...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404
/nix/store/5cpnwwnasypdi7p0av6qbaf52y99gmdz-xz-5.2.2-bin/bin/xz: (stdin): File format not recognized
error: unexpected end-of-file
download of ‘https://cache.nixos.org/nar/0b29cgs0p3n4n1layrb822z67r5pxi0pbap42f5f157awj6b4bba.nar.xz’ failed: No such file or directory
could not download ‘/nix/store/9vppxhy96ggiys1v8rrk09czzl143ck7-kde-wallpapers-high-resolution-4.14.3’ from any binary cache
For me this was fixed by https://github.com/NixOS/nix/pull/994#issuecomment-234727352, merged in nixpkgs in https://github.com/NixOS/nixpkgs/issues/17804. I still get the /nix/…/xz: (stdin): File format not recognized error: unexpected end-of-file errors, but with --keep-going (or -k for short) everything else continues to download/build, so I just rerun nix-build once or twice more to clean up the ones that broke and it eventually completes successfully.
@obadz I am running 1.11.6, which is newer than the version in your comment.
Also spotted in https://github.com/NixOS/nixpkgs/pull/22817#issuecomment-282672220
Brand new user here, can't do anything with nix for now...
Stack build for 'rasa' (the haskell editor, which uses nix)
(Sorry for the humongous output pasted here, but github didn't want it as .txt :cry: )
rasa $ stack build
download-from-binary-cache.pl: still waiting for ‘https://cache.nixos.org/g6farv9l1zvvazv2kwj9hxjpk2kyi8rs.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘https://cache.nixos.org/fv4gbvd4alfjahwaiyrsxi3pb74ppx5q.narinfo’ after 5 seconds...
download-from-binary-cache.pl: still waiting for ‘https://cache.nixos.org/afxsz93q2i1wzi66si26z7mm2ff2ykpg.narinfo’ after 5 seconds...
…
I also was getting those today. It was probably less responsive than usual (nothing has changed on my side).
Unfortunately S3 is currently down.
I've cut your log because it does not contain useful information. :smile:
Ah, that's why.
Yes, cutting that log totally makes sense. I probably couldn't add it as file, because of the same AWS outage...
When I try to access a cache URL via Chromium I'm getting the following response (in case it helps):
CloudFront is currently experiencing problems with requesting objects from Amazon S3.
But basically that's what @fpletz already said :smile:.
(Can't upload a screenshot for some reason...)
Update: For some strange reason, e.g. lynx https://cache.nixos.org/fi3nnmwg5cbw4lr0id3yahv812bwivaa.narinfo works perfectly fine while download-from-binary-cache.pl and Chromium still don't work (tried at least 10x with each tool). The only difference I noticed is that lynx makes a new DNS lookup every time (theoretically that shouldn't make such a big difference, but it might be related to getting the right A record).
Update2: For me it's working fine now :smile: - The problem was probably related to DNS (DNS updates can take some time to propagate even though CloudFront is using short TTLs, but this wouldn't explain why it suddenly stopped working...).
Update3: From https://status.aws.amazon.com/:
Increased Error Rates
Update at 12:52 PM PST: We are seeing recovery for S3 object retrievals, listing and deletions. We continue to work on recovery for adding new objects to S3 and expect to start seeing improved error rates within the hour.
Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.
Seems like this is an AWS S3 problem and probably not caused by anything we did.
Things are working again!
Kind regards,
Pieter Vander Vennet
Is this page updated automatically? http://status.nixos.org/
No idea about status.nixos.org, I think it's something @rbvermaa once set up.
Also getting this for the ed package:
*** Downloading ‘https://cache.nixos.org/nar/0i6hi0nny968zg2mi7l6jpz62l8rzfkng1r6742ixllijywjlgvp.nar.xz’ (signed by ‘cache.nixos.org-1’) to ‘/nix/store/8xvkriqhp9f47y6kwciha4wsg1nqfq7p-ed-1.14.1’...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404
/nix/store/9jiq4cgp0c0mkgq4cxnsg7jqw6ca25ma-xz-5.2.2-bin/bin/xz: (stdin): File format not recognized
I'm actually getting a lot of 404s, also for libssh2. Is this related to the S3 outage? Maybe builds during that period couldn't be uploaded to S3, but somehow their metadata (the link to them) was still saved in the cache db? (Not really sure how it works, though.)
@desiderius what channel are you on? What's the version of the channel? How are you installing ed?
@domenkozar I was on the newest nixos-unstable branch from nixpkgs-channels (when I'm home I can check for the specific commit), but I think it's that one, since it was the most current this morning (5 hours ago).
I did nix-env -i all -f ~/git/nixpkgs/ (all is a meta package in my ~/nixpkgs/config.nix). I'll try just installing ed when I'm home.
nix-store --realize /nix/store/8xvkriqhp9f47y6kwciha4wsg1nqfq7p-ed-1.14.1
succeeds for me (fetches from cache.nixos.org).
Hm, I can't really explain the 404. It could be a cached response from Cloudfront. However, BinaryCacheStore uploads the .nar.xz before the .narinfo, so I can't see how somebody could have requested that .nar.xz earlier. I can dig in the Cloudfront logs...
I was on 45344fdf193ad8aebcc7e3d4c1c997c8067b7b16. But now it works :confused:
It also downloaded a different file:
*** Downloading ‘https://cache.nixos.org/nar/19hi73szwgv062pnfg96gdb26sh4062qi9k3akq2adnnraf0bvr9.nar.xz’ (signed by ‘cache.nixos.org-1’) to ‘/nix/store/gvqyqdlcwfqldb99q6kzixm5786al1ka-ed-1.14.1’...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 42952 100 42952 0 0 27736 0 0:00:01 0:00:01 --:--:-- 27728
@edolstra Can't you just do a handover of this service to someone else, since it seems clear that the people currently responsible for running cache.nixos.org are not capable of doing so?
Also, this code can be improved to use the upload method instead of PutObject. (See the documentation for upload for why.)
auto result = checkAws(format("AWS error uploading ‘%s’") % path,
s3Helper.client->PutObject(request));
I couldn't be bothered to look at what checkAws does, but if it merely checks (as opposed to asserts), then this method is not guaranteed to work, obviously. So, I would say that the problem is not that "it is weird that it doesn't work", but that you failed to show in your source code, with any convincing argument, that it should work in the first place. If checkAws does exit the process or abort in some way, then the name should be changed.
@butterflya patches welcome :)
@butterflya You probably should have bothered to look at what checkAws does, because it throws an exception.
And no, I'm not going to drop S3/Cloudfront because of an occasional 404.
@edolstra Occasional 404? You have 15 participants in this issue, of which more than 5 have seen the issue.
If given all this evidence, you have no inclination to make it reliably work, then I (and many others) know enough.
How many people do you think didn't report anything and simply worked around it by trying indefinitely automatically or just abandoned Nix the first time they saw this behaviour?
How many people do you think are going to report issues in the future when clearly your first go-to response is "It doesn't exist"?
I didn't say to drop S3/Cloudfront. I recommended you to let someone else manage this, because you clearly are not qualified (otherwise it would have worked after a few years of failure, right?). Regardless of whether or not it's a "free" service.
The failure likely isn't with Amazon/Cloudfront; it's with you.
So, what are you going to do? Close this issue and ignore it exists?
I also recommend you to read what I say next time, since you clearly missed my last line. Exceptions are one method of aborting control flow, as you probably are aware of.
Until this is fixed, the website is falsely advertising, which is illegal in certain countries:
Built on top of the Nix package manager, it is completely declarative,
makes upgrading systems reliable, and has many other advantages.
It is not reliable. Far from it, in fact.
@domenkozar That's a valid response. I had expected more from @edolstra.
These one-issue-fits-all always end up the same way: no positive outcome.
Please, open a new issue if you think something can be fixed and please remember to describe exactly what's going on in your case. Pasting just that S3 retrieval failed is not enough, it could happen due to a few dozen reasons.
@butterflya pointing fingers and passive-aggressive blaming won't help. It never did. Please calm down, and unless you can
1) suggest exactly how the current situation can be fixed
2) pay someone to implement a fix
join the club and wait until we pin down these issues.
Half of this thread is already resolved:
1) S3 was down
2) @despairblue had DNS issues
3) a Nix commit fixed it for @obadz
So we're working on it.
What an aggressive reaction from @butterflya! The binary cache is provided completely for free, despite costing real money and lots of work from the community, and outages/problems seem to have been relatively rare in the past few years. Besides, the reliability of upgrades is primarily meant in a slightly different way...