Nixpkgs: Hydra build failure is not reproducible with sandbox (missing latex package)

Created on 29 Jan 2018 · 66 Comments · Source: NixOS/nixpkgs

Issue description

I thought I fixed the sage build (https://github.com/NixOS/nixpkgs/pull/34291).
It built sandboxed on my and @7c6f434c's machines. But then the Hydra build failed because the latex package xcolor is missing. The package was available on my machine, but should not have been available in the sandbox.

Steps to reproduce

Build sage from https://github.com/timokau/nixpkgs/tree/sage-fix-shebangs-unstable.

Technical details

Please run nix-shell -p nix-info --run "nix-info -m" and paste the
results.

 - system: `"x86_64-linux"`
 - host os: `Linux 4.9.78, NixOS, 17.09.2875.c2b668ee726 (Hummingbird)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 1.12pre5873_b76e282d`
 - channels(root): `"nixos-17.09.2875.c2b668ee726, pkgs-unstable-18.03pre126020.931a0b8be80, unstable-18.03pre126508.8ecadc12502"`
 - channels(timo): `"unstable-18.03pre117327.3fe7cddc30"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs`

All 66 comments

Pinging @vcunat as the maintainer of texlive, do you have any idea what might be causing this?

No ideas, off the top of my head. I have never noticed non-determinism in our texlive packaging.

Unsurprisingly, https://hydra.nixos.org/build/68286564 succeeds for me locally (sandboxed NixOS).

$ nix-hash --type sha256 /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1
1029bbe3b4ddd96f2da00a86c13e9ef7a0bd32b33723e9313e73049401f3fd75

Wow, that build was fast! Takes about 3 times as long for me.

Do you have any idea who might know more?

Nit while looking at this: shouldn't most of buildInputs instead be nativeBuildInputs?

The change in behavior on hydra vs elsewhere is curious!

Do either of you have a log for the successful build? Would be interesting to see where it finds xcolor.

I can reproduce the failure to find xcolor.sty using "nix-shell -A sage --pure" and running latex on a dummy document:

\documentclass{article}
\usepackage{xcolor}
\begin{document}
\title{Hi}
\end{document}

FWIW I think failure is "right", xcolor is part of collection-latexrecommended which isn't included AFAICT. Adding it fixes the dummy document above, haven't confirmed with a full sage build.

> The change in behavior on hydra vs elsewhere is curious!

_Hopefully_ this is just a really unexpected change on master between PR
and merge…

> Do either of you have a log for the successful build? Would be interesting to see where it finds xcolor.

I do have the log. Of course it never mentions xcolor, and doesn't go in
too much details about latex…

> Nit while looking at this: shouldn't most of buildInputs instead be nativeBuildInputs?

Probably, but I don't know enough about sage to be sure and didn't bother to check manually.

> FWIW I think failure is "right"

I agree, xcolor shouldn't be in the sandbox.

> _Hopefully_ this is just a really unexpected change on master between PR
> and merge…

I just successfully repeated the build from the same commit hydra was building from (57b01b) so unfortunately it seems more complicated than that.

... build parallelism bug such that hydra builders end up building documentation but you don't?? Haha :P.

As a sanity check-- always nice to have a bit of sanity-- can you quickly try and confirm that the dummy document I posted above fails to build in nix-shell for sage? Just want to be sure it's not, somehow, a leaky sandbox situation or something. :)

> ... build parallelism bug such that hydra builders end up building documentation but you don't?? Haha :P.

That would be quite the coincidence, but it's definitely possible.
Should I disable parallelism and make a new PR? I don't like using hydra
for testing, but I don't know any other way to test this hypothesis.

> As a sanity check-- always nice to have a bit of sanity-- can you quickly try and confirm that the dummy document I posted above fails to build in nix-shell for sage? Just want to be sure it's not, somehow, a leaky sandbox situation or something. :)

Good idea. It fails for me too:

! LaTeX Error: File `xcolor.sty' not found.

> That would be quite the coincidence, but it's definitely possible.
> Should I disable parallelism and make a new PR? I don't like using hydra
> for testing, but I don't know any other way to test this hypothesis.

Well, you could checkout the exact master revision that Hydra tried to
build and build it locally with --option build-cores 1 just to see if
you can get the same failure locally…

Good idea, I didn't consider that building on one thread might cause the error to show up. Build is running.

The build succeeded, still no mention of xcolor.

Now I start wondering if even the Hydra failure is reproducible at all.

I mean, imagine that it depends on whether tex and pdf files get different seconds in the ctime during unpacking…

That's our best theory for the failure, right? Haha I thought I was joking... :grin:

Since it takes forever to build, can either of you post a working log?
Comparing with failed build might tease out its secrets...

GitHub doesn't know of bz2 and refuses to accept it, but agrees to post bz2.log… apparently no mangling happens.

@dtzWill well, technically you said that this is a parallelism bug — but this hypothesis has been tested and experimentally rejected (Hydra doesn't do massive build parallelism, so non-parallel local build should be the same situation). Timestamp races are a different source of FUN (Urist McPackager confirms).

Apologies if I confuse things further but for some reason I can't build this package locally at all. I'm trying to build from a local checkout @ 57b01b1bcf77fc86b82f84e3c0d4904e2464a1b1 as per https://hydra.nixos.org/build/68286564#tabs-buildinputs

It's not deterministic but so far I triggered:

Sadly I have no idea what it is about my system that triggers those or what the steps to reproduce are. But perhaps someone can discern a common pattern that would relate to the error on Hydra?
Also please let me know if I can provide any further information to help debug this (or if I'm doing something dumb and/or it might be a completely separate issue).

@pbogdan so, what filesystem do you use and what is its granularity for atime and ctime?

Everything resides on ZFS. I don't know its granularity with respect to atime and ctime. Would you happen to know how I can find out?
I don't suppose any Hydra builder would be running ZFS so perhaps I'm hitting an unrelated bug?

You could do stat for a few files in /tmp and in /nix/store and look at granularity of the values.
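
For reference, a minimal sketch of that check, assuming GNU coreutils stat (the -c format option). The scratch file "probe" is hypothetical and only stands in for real files under /tmp or /nix/store. GNU stat always prints nine fractional digits, so a coarse-grained filesystem would show up as trailing zeros rather than as fewer digits.

```shell
# Sketch only: print the full-resolution mtime and ctime of a file.
# Assumes GNU stat; "probe" is a hypothetical scratch file.
show_times() {
    # %y = last modification, %z = last status change, %n = file name
    stat -c 'mtime=%y ctime=%z %n' "$1"
}

scratch=$(mktemp -d)
: > "$scratch/probe"
times=$(show_times "$scratch/probe")
echo "$times"
```

Running show_times on a handful of files in each location and comparing the fractional parts should be enough to see the granularity.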

My system has quite fine-grained timestamps, it seems.

My build had succeeded on tmpfs for /tmp and btrfs for /nix/store.

> Apologies if I confuse things further but for some reason I can't build this package locally at all.

That's very helpful, thanks for testing! At least it shows that it's not hydra-specific, and maybe you can run some tests instead of having to push it to hydra.

> My build had succeeded on tmpfs for /tmp and btrfs for /nix/store.

Same for me.

--

I've compared build logs of my successful build against the hydra one. The failure occurs while building the file cascmd_en.tex. The successful build never touches that file -- instead it builds cascmd_el.tex (without issues).

So it seems to me like sage / giac somehow detects / guesses the system language (even though my system is set to English and I don't even know what language el stands for) and builds different files based on that. Any ideas how the language might leak into the sandbox? @pbogdan, what is your system language? Can you upload your logs from the build where you hit the same error as hydra? Did the error occur while building cascmd_en.tex?

Edit: Also, I patched the sage build to try to build @dtzWill's test document right before building giac. That failed, so the issue doesn't seem to be xcolor leaking into the sandbox.

Apparently the makefile for the english docs is executed in both cases. In the successful case:

[giac-1.2.3.47.p0] Making install in en
[giac-1.2.3.47.p0] make[5]: Entering directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] make[6]: Entering directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] make[6]: Nothing to be done for 'install-exec-am'.
[giac-1.2.3.47.p0] /nix/store/9a7psw26gvqk67xiw54y4jlfqyakf241-texlive-combined-2017/bin/dvips -o cascmd_en.ps cascmd_en.dvi
[giac-1.2.3.47.p0] This is dvips(k) 5.997 Copyright 2017 Radical Eye Software (www.radicaleye.com)

...

[giac-1.2.3.47.p0] /nix/store/9a7psw26gvqk67xiw54y4jlfqyakf241-texlive-combined-2017/bin/dvips -o cas.ps cas.dvi
[giac-1.2.3.47.p0] This is dvips(k) 5.997 Copyright 2017 Radical Eye Software (www.radicaleye.com)

...

[giac-1.2.3.47.p0] /nix/store/i0ay05pqkbnvpfijm52mmlrp6kmkl80c-bash-4.4-p12/bin/bash /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/config/install-sh -d /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/giac/doc/en
[giac-1.2.3.47.p0] /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/install -c -m 644 troussesurvie_en.pdf /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/giac/doc/en
[giac-1.2.3.47.p0] /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/install -c -m 644 cascmd_en.ps casinter.ps cas.ps /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/giac/doc/en
[giac-1.2.3.47.p0] for dd in casinter cascmd_en; do \
[giac-1.2.3.47.p0]  /nix/store/i0ay05pqkbnvpfijm52mmlrp6kmkl80c-bash-4.4-p12/bin/bash /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/config/install-sh -d /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/giac/doc/en/$dd ; \
[giac-1.2.3.47.p0] done
[giac-1.2.3.47.p0] for dd in casinter cascmd_en; do \
[giac-1.2.3.47.p0]  /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/install -c -m 644 ./$dd/* /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/giac/doc/en/$dd ; \
[giac-1.2.3.47.p0] done
[giac-1.2.3.47.p0] /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/install -c -m 644 html_mall html_mtt html_vall xcasmenu xcasex keywords  /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/giac/doc/en
[giac-1.2.3.47.p0]  /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/mkdir -p '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/doc/giac/en'
[giac-1.2.3.47.p0]  /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/install -c -m 644 html_mall html_mtt html_vall xcasmenu xcasex keywords '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/doc/giac/en'
[giac-1.2.3.47.p0]  /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/mkdir -p '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/info'
[giac-1.2.3.47.p0]  /nix/store/qd55j183ym04y43aam889xkimq704rdx-coreutils-8.29/bin/install -c -m 644 ./giac_us.info '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/info'
[giac-1.2.3.47.p0]  install-info --info-dir='/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/info' '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/share/info/giac_us.info'
[giac-1.2.3.47.p0] make[6]: Leaving directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] make[5]: Leaving directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] Making install in es

It then goes on and installs the freshly produced .ps files.

In the unsuccessful case, it apparently tries to build the .dvi files first:

[giac-1.2.3.47.p0] Making install in en
[giac-1.2.3.47.p0] make[5]: Entering directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] make[6]: Entering directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] make[6]: Nothing to be done for 'install-exec-am'.
[giac-1.2.3.47.p0] TEXINPUTS=.:$TEXINPUTS hevea -fix casinter.tex 
[giac-1.2.3.47.p0] TEXINPUTS=.:$TEXINPUTS hevea -fix cascmd_en.tex 
[giac-1.2.3.47.p0] TEXINPUTS=.:$TEXINPUTS /nix/store/9a7psw26gvqk67xiw54y4jlfqyakf241-texlive-combined-2017/bin/latex cascmd_en.tex
[giac-1.2.3.47.p0] This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/NixOS.org) (preloaded format=latex)

...

[giac-1.2.3.47.p0] ! LaTeX Error: File `xcolor.sty' not found.

...

[giac-1.2.3.47.p0] Fixpoint reached in 2 step(s)
[giac-1.2.3.47.p0] tdir=`echo cascmd_en.tex | sed -e 's/\.tex//'`; \
[giac-1.2.3.47.p0] /nix/store/i0ay05pqkbnvpfijm52mmlrp6kmkl80c-bash-4.4-p12/bin/bash /nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/config/install-sh -d $tdir ; \
[giac-1.2.3.47.p0] hacha $tdir.html -o $tdir/index.html ; \
[giac-1.2.3.47.p0] touch $tdir.png ; \
[giac-1.2.3.47.p0] cp -f $tdir*.png $tdir
[giac-1.2.3.47.p0] cascmd_en.html:2106: Warning, cannot find anchor: sec%3AGcd

...

[giac-1.2.3.47.p0] touch cascmd_en.stamp
[giac-1.2.3.47.p0] make[6]: Leaving directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] make[5]: *** [Makefile:590: install-am] Error 2
[giac-1.2.3.47.p0] make[5]: Leaving directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc/en'
[giac-1.2.3.47.p0] make[4]: *** [Makefile:428: install-recursive] Error 1
[giac-1.2.3.47.p0] make[4]: Leaving directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src/doc'
[giac-1.2.3.47.p0] make[3]: *** [Makefile:402: install-recursive] Error 1
[giac-1.2.3.47.p0] make[3]: Leaving directory '/nix/store/315gj57l4vi93jm77yx4xgc0bp4qv509-sage-8.1/var/tmp/sage/build/giac-1.2.3.47.p0/src'
[giac-1.2.3.47.p0] Error installing Giac.

Which fails. So for some reason make thinks it's necessary to rebuild the .dvi files. That might be some timestamp issue, but I'm not sure how to proceed.

@7c6f434c poking a few files with stat in /tmp and /nix/store shows timestamps with nanosecond precision.

@timokau unfortunately I didn't keep that log but yes - it failed on processing cascmd_en.tex complaining about xcolor. Ever since I haven't been able to get to that point in the build again but will post a full log if that occurs again. And the system language is set to en_GB.

And I'm also unsure what else to try / test at this point :-(.

nanosecond resolution is what I also see with my BtrFS /nix/store and Ext4 /tmp…

And this being Sage, strace of the entire build is not exactly a feasible solution.

Since we don't have any better ideas, I added a bit of debug output to the build:

  • print all files in the giac tree with their modified timestamps before installing
  • strace giac's make install call (better than tracing the whole build, but will probably still generate a lot of output)

I'm currently running that to get a successful sample. @pbogdan, do you think you can reproduce the giac failure with that branch[1]?

I'm focussing on giac since I suspect finding the cause of that one will also solve the others. Or is there some error you can more reliably reproduce so that we can focus on that one instead?

[1] https://github.com/timokau/nixpkgs/tree/sage-debug

And now it failed in almost, but not quite the same, way as it did on Hydra:

9pnv0yd9wdblk0lzjk0w49rik53jax-sage-8.1.drv.bz2.log

(failed to find xcolor twice) 😕 AFAIK I made no changes in my environment that could influence it. I did run a store GC that, as a side effect, cleared out Sage sources and many of its dependencies before triggering this build but that shouldn't matter.

Yeah, that looks like the same error. I don't know why it's shown twice now, but latex builds are a weird mess anyway.

It'd be great if you could reproduce that with the patched sage to get some more info.

I will give it a try, thank you. So far the most common failure for me is building the singular package, although it seems to fail at different points. Second most common is giac failing to find flex (I assume it's trying to regenerate some sort of a lexer).
Let me try with your branch as is and if needed we can perhaps add more tracing.

It seems like unpacking is setting the modified timestamp, and as far as I know make uses that to determine whether to rebuild something or not:

[giac-1.2.3.47.p0] ./doc/en/cascmd_en.tex 2018-02-03 21:51:27.6741589580 +0000
[giac-1.2.3.47.p0] ./doc/en/cascmd_en.dvi 2018-02-03 21:51:28.6191604780 +0000

So for me the dvi file is newer than the tex file, therefore make doesn't rebuild it. For you it's probably the opposite.
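
To make that concrete, here is a minimal sketch of the freshness comparison, assuming GNU touch (-d) and a shell whose test builtin supports -nt. The is_stale helper is hypothetical and only mimics the basic rule (rebuild when the target is missing or older than its prerequisite); it is not make's actual implementation. The timestamps are the ones from the log above, rounded to whole seconds.

```shell
# Sketch only: mimic make's basic freshness test for one target/prerequisite pair.
# Assumes GNU touch (-d) and a test builtin with -nt; is_stale is a made-up helper.
is_stale() {
    # $1 = target, $2 = prerequisite
    # Stale when the target is missing or the prerequisite is newer.
    [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

demo=$(mktemp -d)

# Case 1: the .dvi is one second newer than the .tex (the successful build).
touch -d '2018-02-03 21:51:27' "$demo/cascmd_en.tex"
touch -d '2018-02-03 21:51:28' "$demo/cascmd_en.dvi"
if is_stale "$demo/cascmd_en.dvi" "$demo/cascmd_en.tex"; then
    case1=rebuild
else
    case1=up-to-date
fi

# Case 2: the order is reversed, so latex would be invoked.
touch -d '2018-02-03 21:51:29' "$demo/cascmd_en.tex"
if is_stale "$demo/cascmd_en.dvi" "$demo/cascmd_en.tex"; then
    case2=rebuild
else
    case2=up-to-date
fi

echo "case 1: $case1, case 2: $case2"
```

Only case 2 needs latex (and therefore xcolor) at install time, which would match a failure that shows up on some machines and not on others.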

But that seems like it should be random -- depending on the order in which the files are unpacked. But it consistently works for me and some others here. Why could that be?

One could expect that unpacking a tar file — a very linear format — should always go in the same order… but real life is, apparently, more interesting.

I guess it makes sense to parallelize that. But I always thought tar archives keep metadata like the modified timestamp intact, isn't that normally the case?

Most recent failure using the sage-debug branch (different .tex file this time but at the same stage as the Hydra failure):

hp3yjphpfdh8p432rb7wmv9szrb9d0-sage-8.1.drv.bz2.log

And indeed seems like something strange is happening with mtimes:

[giac-1.2.3.47.p0] ./doc/en/casinter.dvi 2018-02-04 15:09:06.0048995480 +0000
[giac-1.2.3.47.p0] ./doc/en/casinter.tex 2018-02-04 15:09:06.2788983550 +0000

Thanks!

Apparently tar does indeed keep timestamps -- immediately after unpacking everything is still correct. I'm currently trying to figure out when the timestamps are updated.

There is a fix for this: just tell make to rebuild everything. I think that's cleaner anyway, even though it'll take a bit longer. Alternatively, avoid changing the timestamps.

I'm still interested in what's causing the timestamps to be so consistently the same for me (succeeds every build) and some others here.

Not sure but since you're looking for reasons mtimes might be touched in giac, this looks like it touches every file in giac which seems.. possibly related? :)

Ah, I totally forgot about that (and didn't consider that sed -i probably updates the modified timestamp whether it changes something or not). Yeah, that's probably what updates the timestamps. singular is patched with a similar find and sed -i combination, so that explains why @pbogdan gets errors from that.
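
A minimal sketch of that effect, assuming GNU sed, GNU touch and GNU stat. GNU sed -i writes a temporary file and renames it over the original, so the file gets a fresh mtime even when the pattern never matches. The file content and the substitution are made up for illustration.

```shell
# Sketch only: an in-place sed rewrites the file even when nothing matches.
# Assumes GNU sed, GNU touch and GNU stat; content and pattern are hypothetical.
work=$(mktemp -d)
printf 'no shebang here\n' > "$work/cascmd_en.dvi"
touch -d '2018-02-03 21:51:28' "$work/cascmd_en.dvi"

before=$(stat -c %Y "$work/cascmd_en.dvi")
# The pattern does not occur in the file, so the content stays identical.
sed -i 's|/usr/bin/env|/hypothetical/env|' "$work/cascmd_en.dvi"
after=$(stat -c %Y "$work/cascmd_en.dvi")

echo "mtime before: $before, after: $after"
```

The two numbers differ although the content is byte-for-byte the same, which is enough to reshuffle the relative order of .tex and .dvi files.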

I'm not sure, but I would guess that find lists the files in a deterministic (recursive alphabetic I think) order and then launches an asynchronous sed for each one. Then I'd guess hard drive speed makes the difference. I have an SSD in my laptop, which is probably why the sed for the tex file finishes before the sed of the dvi. I guess the hydra drive is probably pretty overloaded and the cpu resources constrained. How about you @pbogdan ?

For a fix I could either put a grep && before the sed -i to make sure to only touch relevant files, or go through with just telling make to recompile everything with export MAKEFLAGS="$MAKEFLAGS B". I'm not quite sure which is better. Recompiling everything is probably more stable and cleaner. But it introduces new build dependencies (which should probably be there anyway) and increases build time.
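
The first option could look roughly like the sketch below, assuming GNU grep (-r -l -Z), GNU xargs (-0 -r) and GNU sed (-i). The patch_matching helper, the search string and the replacement are hypothetical placeholders, not the actual nixpkgs expression.

```shell
# Sketch only: rewrite just the files that contain the pattern.
# Assumes GNU grep, xargs and sed; helper name and strings are made up.
patch_matching() {
    # $1 = directory, $2 = fixed string to look for, $3 = replacement
    # $2 and $3 must not contain '|' or regex metacharacters in this sketch.
    grep -rlZ -F -- "$2" "$1" | xargs -0 -r sed -i "s|$2|$3|g"
}

tree=$(mktemp -d)
printf '#!/usr/bin/env sh\n' > "$tree/script.sh"
printf 'binary-ish data\n' > "$tree/cascmd_en.dvi"
touch -d '2018-02-03 21:51:28' "$tree/script.sh" "$tree/cascmd_en.dvi"
dvi_before=$(stat -c %Y "$tree/cascmd_en.dvi")

patch_matching "$tree" '/usr/bin/env' '/hypothetical/bin/env'

dvi_after=$(stat -c %Y "$tree/cascmd_en.dvi")
echo "dvi mtime: $dvi_before -> $dvi_after"
```

Files without a match are never handed to sed, so their mtimes (and make's view of what is up to date) stay as unpacked.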

> I would guess that find lists the files in a deterministic

No. Directory entry order, i.e. pretty random, FS-dependent, and all the
other bad things.
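
If a stable processing order is wanted, the listing can be sorted explicitly. A minimal sketch, assuming file names without newlines; list_sorted is a made-up helper and LC_ALL=C only pins the collation order.

```shell
# Sketch only: list files in a reproducible order regardless of filesystem.
# Assumes file names without newlines; list_sorted is a hypothetical helper.
list_sorted() {
    find "$1" -type f | LC_ALL=C sort
}

dir=$(mktemp -d)
: > "$dir/c.tex"
: > "$dir/a.tex"
: > "$dir/b.dvi"
listing=$(cd "$dir" && list_sorted .)
printf '%s\n' "$listing"
```

Sorting makes the order predictable, but on its own it does not stop an in-place edit from bumping mtimes; it only makes the resulting order the same everywhere.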

> Directory entry order, i.e. pretty random, FS-dependent, and all the
> other bad things.

Well, mystery solved :smile:

For completeness' sake: I'm also on an SSD, although I use ZFS' lz4 compression which introduces _some_ additional latency for FS operations. Possibly worse under a CPU heavy build..?

find's non-determinism, which depends on the file system etc., is probably a better explanation than different disk access speeds. It might just be the difference between ZFS and btrfs.

I attempted to fix it by telling make to rebuild everything. make took that as an invitation to run into an endless recursion:

make[1]: Entering directory '/nix/store/8ix45r4d97b4d3j0y2ssprnqcywn90fl-sage-8.1/sage-root'
make build/make/Makefile
make[2]: Entering directory '/nix/store/8ix45r4d97b4d3j0y2ssprnqcywn90fl-sage-8.1/sage-root'
make build/make/Makefile

...

make[4906]: Entering directory '/nix/store/8ix45r4d97b4d3j0y2ssprnqcywn90fl-sage-8.1/sage-root'
make build/make/Makefile
make[4906]: fork: Resource temporarily unavailable
make[4906]: *** Deleting file 'Makefile'
make[4906]: Failed to remake makefile 'Makefile'.
make[4906]: fork: Resource temporarily unavailable
make[4906]: Leaving directory '/nix/store/8ix45r4d97b4d3j0y2ssprnqcywn90fl-sage-8.1/sage-root'
make[4905]: *** [Makefile:17: Makefile] Error 2

..

make[1]: Leaving directory '/nix/store/8ix45r4d97b4d3j0y2ssprnqcywn90fl-sage-8.1/sage-root'
make: *** [Makefile:17: Makefile] Error 2
note: keeping build directory '/tmp/nix-build-sage-8.1.drv-16'
builder for '/nix/store/k270pmb5v6jjycrp4hsi4cryxacczqnd-sage-8.1.drv' failed with exit code 2

An alternative would be to manually adjust the timestamps -- probably just setting every timestamp to 0 would work. But instead of introducing yet another hack, it's probably better to go with the other option.
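
For the record, the timestamp reset could be sketched like this, assuming GNU touch (-d @SECONDS) and GNU stat. The reset_mtimes helper is hypothetical, and whether make behaves as intended with every mtime at the epoch has not been verified here.

```shell
# Sketch only: give every regular file under a directory the same mtime.
# Assumes GNU touch and GNU stat; reset_mtimes is a made-up helper.
reset_mtimes() {
    # $1 = directory, $2 = seconds since the epoch (defaults to 0)
    find "$1" -type f -exec touch -d "@${2:-0}" {} +
}

src=$(mktemp -d)
: > "$src/casinter.tex"
: > "$src/casinter.dvi"
reset_mtimes "$src"
tex_mtime=$(stat -c %Y "$src/casinter.tex")
dvi_mtime=$(stat -c %Y "$src/casinter.dvi")
echo "tex=$tex_mtime dvi=$dvi_mtime"
```

With identical mtimes neither file is newer than the other, so a plain newer-than comparison would leave the shipped .dvi files alone.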

I replaced the find calls with grep in https://github.com/timokau/nixpkgs/tree/sage-deterministic. @pbogdan, can you test one more time?

Sure, running a build now.

Works! Succeeded on first try on a machine that couldn't build the package up until this point.

Great! I'll make a new PR and hopefully this time Hydra will agree. Third time's the charm...

That was interesting :smile:

Celebrated too early: https://hydra.nixos.org/build/68773141

This time git fails:

[git-2.11.0] gcc -o mailmap.o -c -MF ./.depend/mailmap.o.d -MQ mailmap.o -MMD -MP  -g -O2 -I. -DHAVE_ALLOCA_H -DUSE_CURL_FOR_IMAP_SEND -DNO_OPENSSL -pthread -DHAVE_PATHS_H -DHAVE_STRINGS_H -DHAVE_DEV_TTY -DXDL_FAST_HASH -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM -DSHA1_HEADER='"block-sha1/sha1.h"'  -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' -DPAGER_ENV='"LESS=FRX LV=-c"'  mailmap.c
[git-2.11.0] make[3]: *** [Makefile:1787: common-cmds.h] Error 2

No idea what might be causing that, I'll look into it later.

Okay "later" was now: The actual error is

[git-2.11.0] ./generate-cmdlist.sh: line 32: syntax error: you disabled math support for $((arith)) syntax

Which seems reasonable, since generate-cmdlist.sh has the shebang /bin/sh (before patching) and as far as I know $(( )) syntax is not supported by sh. But why is that error only appearing now, and why only for hydra's build?

Does anybody know more about how sh is handled in nix? Probably bash in compatibility mode?

This is fixed by #34628, which adds arith support ("math") among other things. Presumably hydra builders will be using that soon-ish, not sure what plan is there.
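
As a side note, POSIX does specify $((...)) arithmetic expansion, so the error message points at how that particular shell was built rather than at the script. A minimal probe, assuming only that the shell under test accepts -c; supports_arith is a made-up helper, and "sh" is looked up in PATH here, whereas inside a sandbox one would probe /bin/sh.

```shell
# Sketch only: check whether a shell binary supports $((...)) arithmetic.
# supports_arith is a hypothetical helper; "sh" is resolved via PATH.
supports_arith() {
    # $1 = shell to test; succeeds when 1 + 2 expands to 3
    [ "$("$1" -c 'echo $((1 + 2))' 2>/dev/null)" = 3 ]
}

if supports_arith sh; then
    arith=yes
else
    arith=no
fi
echo "arithmetic expansion: $arith"
```

A shell built without arithmetic support should report "no" here instead of failing halfway through a build.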

But why is the build working locally then? Shouldn't it be the same sh hydra currently uses?

The sandbox sh isn't relied upon unless you're using nixUnstable (2.0), so I'm hoping/guessing it's because locally you're running stable Nix (1.11.x)?

Yes, I'm running stable nix. What does that mean, the sandboxed sh isn't relied upon?
Are there different sh binaries installed?

With 1.11 we bind-mount bash + closure, on 2.0 we use a slimmer busybox-based shell (single binary) instead.

So what gets linked as /bin/sh doesn't depend on the buildInputs at all?

Yes, this is a global Nix setting.

Why? Doesn't that go against the idea of functional package declarations?
Is there somewhere I can read about this?

There are a few issues to read about /bin/sh and system() and their annoying consequences… I don't think there is anything interesting in documentation, though.

I searched through the issues a bit. I think https://lists.gnu.org/archive/html/bug-guix/2013-01/msg00041.html is a good summary.

However I still don't get why nix doesn't simply link stdenv.shell to /bin/sh -- it probably has something to do with bootstrapping?

Back to the issue: before the error, the shebang of generate-cmdlist.sh is patched to /nix/store/yq03c2ny43mc24j7dq5riznzb09ddhpq-bash-4.4-p12/bin/sh. Doesn't that indicate that normal bash is used here, not busybox?

> However I still don't get why nix doesn't simply link stdenv.shell to /bin/sh -- it probably has something to do with bootstrapping?

On the Nix level, there is no guarantee that a notion of stdenv exists

On the expression level, there is no way to change /bin

Ah that makes sense, thanks.

My other question is resolved by the way: generate-cmdlist.sh is explicitly called by /bin/sh, making the shebang irrelevant.
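
A minimal sketch of why the patched shebang does not matter in that case. The script name is made up and the interpreter path in the shebang is deliberately nonexistent.

```shell
# Sketch only: a shebang is ignored when an interpreter is named explicitly.
# generate.sh is a hypothetical script; its shebang points at a missing binary.
demo_dir=$(mktemp -d)
printf '#!/nonexistent/interpreter\necho ran\n' > "$demo_dir/generate.sh"
chmod +x "$demo_dir/generate.sh"

# Explicit interpreter: the first line is just a comment to sh.
explicit=$(sh "$demo_dir/generate.sh")

# Direct execution: the kernel tries the shebang interpreter and fails.
if "$demo_dir/generate.sh" 2>/dev/null; then
    direct=ran
else
    direct=failed
fi
echo "explicit: $explicit, direct: $direct"
```

So a Makefile rule that invokes a script through /bin/sh gets whatever /bin/sh the sandbox provides, no matter what the shebang was patched to.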

Apparently #34628 is in master now. Good news: The build succeeded. Bad news: the output is apparently too big for hydra.

Do I understand this[1] correctly: does the combined size of all outputs (docs and binary together) have to be <= 2GiB?

[1] https://github.com/NixOS/hydra/blob/4151be7e69957d22af712dd5410b5ad8aa3a2289/src/hydra-queue-runner/build-remote.cc
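
A rough way to check a build result against such a limit locally, assuming GNU du (-s -b). The 2 GiB figure is taken from the question above and is not verified against Hydra's configuration. du reports apparent file sizes, which only approximates whatever Hydra actually measures, and tree_bytes is a made-up helper.

```shell
# Sketch only: compare the apparent size of an output tree with a size limit.
# Assumes GNU du; the limit is the unverified 2 GiB from the question above.
LIMIT_BYTES=$((2 * 1024 * 1024 * 1024))

tree_bytes() {
    # $1 = directory; prints its total apparent size in bytes
    du -sb "$1" | cut -f1
}

out=$(mktemp -d)
printf 'hello\n' > "$out/file"
size=$(tree_bytes "$out")
if [ "$size" -le "$LIMIT_BYTES" ]; then
    verdict=within-limit
else
    verdict=too-big
fi
echo "$size bytes: $verdict"
```

Pointing tree_bytes at a real store path instead of the scratch directory gives a quick estimate before pushing anything to Hydra.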

See #34940

In general it seems best to separate the big parts into separately built derivations. They are often some kind of data that can be "built" cheaply but take lots of space, i.e. suitable not to have binaries and let them build locally. I don't know if this is the case, and sometimes it's complicated to plug those parts together afterwards (via a wrapper or something).

@vcunat It is Sage… It rebuilds a lot of stuff we already package, but simple ways of forcing it to use the system versions (even when the versions seem to match) lead to cryptic test failures.

I know that, but I still don't expect 2GB of compiled binaries.

I expect that running their documentation build separately will be just as interesting.

I just installed sage-8.1 from nixpkgs-unstable :tada:

The download was only 309M, so I guess it compresses pretty well. 1 minute download vs. 3hr build is definitely a big win.

Unpacked, it's ~1.7G. Most of that really is binaries and other build results. Some of it is still documentation (of individual packages), I guess the build doesn't perfectly adhere to the settings.

Building the docs individually is possible. I thought about doing that and having sage only create a wrapper that sets the appropriate environment variables to connect docs and binaries. But that requires more time, so for now setting buildDocs = true requires a full rebuild. Maybe I'll change that in the future or maybe somebody else will.

I think @teto is looking into using at least the native singular package (#34724).
