I have a nix-channel --update that's been stuck for three days.
$ ps ax | grep nix-channel
11439 ? Sl 0:29 /nix/store/dj2ylmvx2c288h51yx8ljg4h6bh5vyla-nix-2.3/bin/nix-channel --update
$ beenrunning() { units -v1 "($(cut -d" " -f1 /proc/uptime) - $(cut -d" " -f22 /proc/$1/stat)/100)s" days; }
$ beenrunning 11439
(347589.62 - 8627974/100)s = 3.0244199 days
It appears to have printed this before hanging:
waiting for lock on '/root/.cache/nix/tarballs/0s9swzqhys99sk5ndz23z2w13mbw9wd14ff58nfxyhss7a986jn9-file'...
Stack trace of the main thread (full trace of all threads):
#0 in syscall () from ...glibc-2.27/lib/libc.so.6
#1 in std::__atomic_futex_unsigned_base::_M_futex_wait_until(...) () from ...gcc-8.3.0-lib/lib/libstdc++.so.6
#2 in nix::Downloader::download(nix::DownloadRequest const&) () from ...nix-2.3/lib/libnixstore.so
#3 in nix::Downloader::downloadCached(nix::ref<nix::Store>, nix::CachedDownloadRequest const&) () from ...nix-2.3/lib/libnixstore.so
#4 in _main(int, char**) ()
#5 in nix::mainWrapped(int, char**) ()
#6 in nix::handleExceptions(string, function) () from ...nix-2.3/lib/libnixmain.so
#7 in main ()
This is the only process that has that file open, and it _has_ the lock (the "W" for write-lock in "9uW"):
# lsof -n | grep 0s9swzqhys99sk5ndz23z2w13mbw9wd14ff58nfxyhss7a986jn9-file.lock
nix-chann 11439 root 9uW REG 254,2 0 59779377 /root/.cache/nix/tarballs/0s9...-file.lock
nix-chann 11439 11440 nix-chann root 9uW REG 254,2 0 59779377 /root/.cache/nix/tarballs/0s9...-file.lock
nix-chann 11439 11441 nix-chann root 9uW REG 254,2 0 59779377 /root/.cache/nix/tarballs/0s9...-file.lock
nix-chann 11439 11442 nix-chann root 9uW REG 254,2 0 59779377 /root/.cache/nix/tarballs/0s9...-file.lock
nix-chann 11439 11443 nix-chann root 9uW REG 254,2 0 59779377 /root/.cache/nix/tarballs/0s9...-file.lock
nix-chann 11439 11444 nix-chann root 9uW REG 254,2 0 59779377 /root/.cache/nix/tarballs/0s9...-file.lock
This stuck process blocks other invocations of nix-channel --update as well, causing them to emit waiting for lock on '/root/.cache/nix/tarballs/...-file'... and then hang indefinitely, but with a different backtrace:
(gdb) bt
#0 in flock () from ...-glibc-2.27/lib/libc.so.6
#1 in nix::lockFile(int, nix::LockType, bool) () from ...nix-2.3/lib/libnixstore.so
#2 in nix::PathLocks::lockPaths(set<string>, string, bool) () from ...-nix-2.3/lib/libnixstore.so
#3 in nix::PathLocks::PathLocks(set<string>, string) () from ...nix-2.3/lib/libnixstore.so
#4 in nix::Downloader::downloadCached(nix::ref<nix::Store>, nix::CachedDownloadRequest const&) () from ...-nix-2.3/lib/libnixstore.so
#5 in _main(int, char**) ()
#6 in nix::mainWrapped(int, char**) ()
#7 in nix::handleExceptions(string, function) () from ...nix-2.3/lib/libnixmain.so
#8 in main ()
The list of channels it was trying to fetch:
# nix-channel --list
nixos https://nixos.org/channels/nixos-19.09
nixos-hardware https://github.com/NixOS/nixos-hardware/archive/master.tar.gz
nixos-unstable https://nixos.org/channels/nixos-unstable
It looks like the one that got stuck is the 19.09 channel metadata:
# ls -l /root/.cache/nix/tarballs/0s9swzqhys99sk5ndz23z2w13mbw9wd14ff58nfxyhss7a986jn9-file*
lrwxrwxrwx 1 root root 55 Nov 18 16:04 /root/.cache/nix/tarballs/0s9swzqhys99sk5ndz23z2w13mbw9wd14ff58nfxyhss7a986jn9-file -> /nix/store/l67ybb9qn0y2l77v4j8jc4q92mb9mbmh-nixos-19.09
-rw------- 1 root root 0 Nov 4 10:32 /root/.cache/nix/tarballs/0s9swzqhys99sk5ndz23z2w13mbw9wd14ff58nfxyhss7a986jn9-file.lock
The file looks like it was successfully fetched. It's an html file with this inside:
$ lynx -dump -force-html -nolist -width=140 /nix/store/l67ybb9qn0y2l77v4j8jc4q92mb9mbmh-nixos-19.09
nixos-19.09 release nixos-19.09.1282.e6d584f6dd2
Released on 2019-11-18 18:59:01 from Git commit e6d584f6dd22b587d5cdf5019f5e7dd2be370f61 via Hydra evaluation 1555352.
File name Size SHA-256 hash
binary-cache-url 23 eb8d2a51d85521c07092ed8126df19bb6f9d7c16a5275395a0e110868a4ac6c0
git-revision 40 968416ccade5c065ab3f5e29078537b3aa75eb2d1ab999c592d042e38f39bc07
nixexprs.tar.xz 13867856 611606b1c5a7824687b090b42094f6f43386aed4f6016f575146e5a01cf6b338
nixos-19.09.1282.e6d584f6dd2-x86_64-linux.ova 1412923392 f19e98a088a0eaee85de018ab6b6e629b9fdf8d7a64bfa935ed0b8a18764bd18
nixos-graphical-19.09.1282.e6d584f6dd2-x86_64-linux.iso 1343225856 bea34e1c59986967e979ebc981fd4b71263be6ec5de98ad4f89d6e493144b7ad
nixos-minimal-19.09.1282.e6d584f6dd2-i686-linux.iso 558891008 225910221b2506588d6ee7eee118143fd76d6f46b0f72345f27deb4cd1b1d260
nixos-minimal-19.09.1282.e6d584f6dd2-x86_64-linux.iso 564133888 c4358112ba93c4544119770363a955f401a3e6a4194b91920950d9b1c70ce257
src-url 36 ab86d9d20e41a772ee3a08972f0f834fbf7186a10e34cab5417fab973901e5fe
store-paths.xz 1068860 092492469016adff8958e26e6674e2db378eb0d388459ca561fb6791657cfee5
+1, having the same issue right now.
Any other info I should extract from this stuck process before killing it?
$ beenrunning 11439
(1272390.24 - 8627974/100)s = 13.728131 days
While this is a minor problem for interactive use where an impatient user can hit ^C, it's a more serious problem for mechanisms like system.autoUpgrade.enable where this happens in the background unnoticed and updates just stop indefinitely.
@chkno do you know how to workaround the problem and get the upgrade to work again? For me it's a blocker, I cannot do nixos-rebuild --upgrade switch because of that.
After running kill 11439 (the PID of the stuck nix-channel --update) I was able to update channels again.
Upon being killed, the process that had been stuck for two weeks printed this message (the same message it prints if you run it and immediately press control-C):
warning: warning: download of 'https://nixos.org/channels/nixos-19.09' was interrupted; using cached result
error: interrupted by the user
I've removed file it was waiting for:
/root/.cache/nix/tarballs/0s9swzqhys99sk5ndz23z2w13mbw9wd14ff58nfxyhss7a986jn9-file
Restarted and it works. I have no idea if restarting or removing the file has worked.
Same problem here, already happened twice, preventing automatic updates without noticing.
So far we encountered this problem on two machines, which both had in common that they ran on BTRFS, while our other machines run on different file systems.
@ktor, @chkno, may I ask which file system you are using?
@timor ext4
ext4
Ok, thanks.
It helps to kill a nix-channel --update nixos process.
Right now I think the reason for this problem on my machine is network-related. It seems to happen when I lose/don't have network connection.
Possibly related: #3338 . Possible workaround: set the nix option download-attempts=1
We'll see if it'll be fixed with curl 7.67+ as Elco suggested in #3338
Most helpful comment
After running
kill 11439(the PID of the stucknix-channel --update) I was able to update channels again.Upon being killed, the process that had been stuck for two weeks printed this message (the same message it prints if you run it and immediately press control-C):