Nixpkgs: 18.03 Zero Hydra Failures (generic + x86_64-linux)

Created on 7 Mar 2018 · 53 Comments · Source: NixOS/nixpkgs

Let's make Impala the best release so far! :car: :deer:

We have: the main jobset starting around 1.5k failures, x86_64-darwin around 0.9k, and the newbie aarch64-linux lagging behind near 4k. The numbers seem large, but one package may kill very many jobs...

Steps to repeat

  • Choose some package from those that fail on Hydra. Hurry before the good ones have been taken!
  • Find a fix. That may mean simply restricting meta.platforms, in case the package inherently doesn't support what it has in there ATM.
  • Typically the package was broken on master already. You can verify that on Hydra - example URL: https://hydra.nixos.org/job/nixpkgs/trunk/bash.x86_64-linux In that case base the fix on master and request backporting in the description of the pull request.
  • Ping this issue from the PR, e.g. /cc ZHF #36453. Do this also if you have some WIP. Alternatively you may just post a note in this issue. If the breakage is specific to darwin #36454 or aarch64 #32326, mention the respective issue instead.
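
As a rough illustration of the "restrict meta.platforms" step above, a package expression might declare its supported platforms like this. The package name, description and `src = ./.;` line are placeholders, and `stdenv.lib.platforms` is assumed to be the platform list in use on this branch; treat it as a sketch, not a fix for any particular package.

```nix
# Hypothetical package used only to illustrate restricting meta.platforms.
{ stdenv }:

stdenv.mkDerivation {
  name = "example-1.0";   # placeholder name, not a real nixpkgs package
  src = ./.;              # placeholder source

  meta = with stdenv.lib; {
    description = "Example package that only supports Linux";
    # Listing only the platforms the package really supports keeps Hydra
    # from creating jobs for the others (e.g. Darwin) that can only fail.
    platforms = platforms.linux;
  };
}
```

Whether a restriction like this is appropriate depends on whether the package inherently cannot support the platform; if it merely regressed, a real fix is preferable.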

The remaining packages will be marked as broken before the release (on the failing platforms), i.e. at the end of March. /cc @NixOS/nixpkgs-committers, but everyone can fix the stuff.

blocker work-in-progress good-first-bug clean-up community feedback

All 53 comments

qttools build failure blocks lots of things--looks like a /bin/sh problem or otherwise missing some easy native deps. Seems okay on master but don't see reason why, maybe builder-specific? Anyway worth fixing that as it blocks many other packages.

Other qt-related packages similarly fail, here's qdirstat: https://hydra.nixos.org/build/70737278/nixlog/1 . Builds locally, suspect that machine needs an update to use a fixed/functioning /bin/sh.

@dtzWill I can't take a look right now, but I'd rather use patchShebangs in the patch phase to fix the /bin/sh dependencies.

Yes. I meant as way of explaining the failure, esp since it works on master/locally. But we shouldn't rely on /bin/sh at all, that would be best!

The two tests succeeded after some restarts.

libosinfo: cannot reproduce the test failure in https://hydra.nixos.org/build/70734135 . same derivation hash builds ok in sandbox on release-18.03 here.

@xeji restarting build

@shlevy thanks, was OK now

libav_12: error not reproducible, please restart https://hydra.nixos.org/build/70737417

@xeji This is an impurity in the build, it's depending on /bin/sh. Probably the fix is patchShebangs during postPatch
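
A minimal sketch of what "patchShebangs during postPatch" could look like, assuming a hypothetical package; the name and source are placeholders, and only the `postPatch` hook is the point here.

```nix
# Fragment of a hypothetical derivation; only the postPatch hook matters.
{ stdenv }:

stdenv.mkDerivation {
  name = "example-1.0";   # placeholder
  src = ./.;              # placeholder

  # Rewrite "#!/bin/sh"-style interpreter lines in the unpacked source so
  # the scripts use interpreters found on the build's PATH (store paths)
  # instead of whatever /bin/sh the builder happens to provide.
  postPatch = ''
    patchShebangs .
  '';
}
```

Note that this only covers script interpreter lines. Commands that a program launches through `sh -c` at build time would not be affected by it.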

@shlevy thanks, will give it a try

Just in case it helps: the qt failures are reproducible on my system (17.09.3083.87c057a9c16 (Hummingbird) using nixStable2 daemon and client) when building from the git revision Hydra attempted. They look to be related to https://github.com/NixOS/nixpkgs/blob/6fcf691545896b278cc8e6961af5db9331656f8c/pkgs/development/libraries/qt-5/5.10/qtbase.patch#L733 which expects a command builtin in the shell that executes that statement. From what I gather, that statement eventually gets executed via qmake in https://github.com/qt/qtbase/blob/4ba535616b8d3dfda7fbe162c6513f3008c1077a/qmake/library/qmakebuiltins.cpp#L502 and, due to the use of popen, is run via sh -c. On my system the sandbox sh lacks the command builtin. Patching nix to provide a more fully featured sandbox sh appears to work, as does patching the qtbase patch to use bash explicitly.

This, however, raises the question of how those builds can succeed on other builders. I assumed they would all be running 17.09 with nixUnstable, and 17.09 doesn't seem to have @dtzWill's patches that provide additional features for the sandbox sh?
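
One way to check a given builder for the behaviour described above might be a throwaway derivation along these lines. This is only a sketch: the derivation name is made up, and it assumes sandboxing is enabled, since otherwise the host's /bin/sh is what gets tested.

```nix
# Sketch: a throwaway derivation that fails if the sandbox's /bin/sh
# does not implement the POSIX "command" builtin.
with import <nixpkgs> { };

runCommand "sandbox-sh-check" { } ''
  # The build script itself runs under stdenv's bash; the inner call
  # deliberately goes through /bin/sh, as popen() does.
  /bin/sh -c 'command -v cat' > $out
''
```

Building it with `nix-build` on different machines would show which of them provide a sandbox sh with the builtin.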

> This is an impurity in the build, it's depending on /bin/sh. Probably the fix is patchShebangs during postPatch

No. If the same build works in Nix 1.11 but not in Nix 2.0, we should fix Nix 2.0 instead. Otherwise we have a massive reproducibility problem where nearly anything from, say, nixpkgs 17.09 or 18.03 can't be built anymore on Nix 2.0!

@dezgeg The build works on Nix 1.11 coincidentally, if the host system has an appropriate /bin/sh. Again, depending on /bin/sh is impure. If you want to make sure old things depending on that impurity work, you have to reproduce the impurity yourself, which can be done with e.g. sandbox-paths.

Note, also, that even if you consider this a bug it has nothing to do with nix 2.0 itself, rather with the sh it is configured to use.

> The build works on Nix 1.11 coincidentally

There was nothing coincidental. With Nix 1.11, NixOS always configured it to use bash even inside the sandbox, and had done so for _several years_.

> Note, also, that even if you consider this a bug it has nothing to do with nix 2.0 itself, rather with the sh it is configured to use.

Just because you can theoretically configure it away when building Nix, who does that? The _default_ here is what's wrong.

@dezgeg There is no default in Nix. It's in nixpkgs. That's what you want to "fix" if it's a bug. And the NixOS sandbox configuration did nothing for people outside of NixOS, and it caused real problems: https://github.com/NixOS/nixpkgs/issues/1424.

The answer is not to depend on /bin/sh in builds. I wish we didn't even include it in the sandbox, but I lost that fight already.

Actually there was a default in Nix, and it was bash: https://github.com/NixOS/nix/blob/1.11-maintenance/src/libstore/build.cc#L1925

(an alternative answer would be to let stdenv determine what counts as /bin/sh in the sandbox).

@dezgeg That default basically only made sense if building from nixpkgs (otherwise it's unlikely you'd have BASH_PATH pointing to the nix store), so again this is a question of the nixpkgs config.

@shlevy @dezgeg I think this discussion should be taken to a nix issue ticket instead.

More discussion for /bin/sh here: https://github.com/NixOS/nixpkgs/pull/36669

haskellPackages.changelogged is fixed on master but still needs to be backported to 18.03

FYI I started fixing aspino, however this seems to be a bit more difficult as they patch another lib on upstream and their patches don't apply properly against the library versions we use at nixpkgs. Currently evaluating possible solutions (just spamming here to avoid duplicated work on this^^)

All Haskell-related evaluation and/or build errors are fixed. The only failing build that remains is ihaskell, which depends on some qt packages that don't compile: https://hydra.nixos.org/build/70876188. I don't know what to do about those, unfortunately.

Great progress in the first week :tada: especially on x86_64-linux where we're under 500 failures now. At this rate this platform might get close to zero soon. Darwin failures were reduced by more than a hundred, which isn't bad either.

BTW, we now have the three usual 18.03 channels.

@vcunat @fpletz thanks a lot for your work! Really looking forward to the new release!

For fails -= 2 please pick #36676 and #36935.

@xeji done

s3ql: version update in e3db2501f9ec fixes build, please pick.

Hydra evals of the release-18.03 job have failed today:
Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
Edit: nevermind, now it's down altogether.

Lately Hydra evals of large jobsets often run into various limits. Fortunately the failure rate isn't too high ATM.

To fix nixos.tests.vault, please pick 3aa3738bb2582f9142675c952f7e6e3621081c1e from master.

37717 fixes the build of qpid-cpp and is now ready to merge.

A few of the fixes mentioned above (plus some others) were merged into master but have not made it into 18.03 yet according to latest hydra build. As the release date is near, here's a little checklist to help ensure no existing fix is accidentally forgotten. Everyone please feel free to edit/check off.

  • [x] godot #34971 -> #37937
  • [x] gosmore #37285 -> #37938
  • [x] kmsxx #37653 -> #37939
  • [x] mythtv #37348 -> #37940
  • [x] notary #36804 -> #37941
  • [x] opal #37776 -> #37942
  • [x] s3ql https://github.com/NixOS/nixpkgs/commit/e3db2501f9ec3271d8dd0c76ea7938650cee5589 -> #37944
  • [x] telepathy-gabble #37702 done but hydra appears to cache old failure of source tarball

Edit: Thanks @srhb for the super quick response - everything done in record time!

Looks like 0d20e7db5ba8f563db35752da51976c8181be1df fixed telepathy-gabble. Marking that done.

... yes, looks like hydra is somehow still caching the old failure after 3 days. hope it will sort itself out.

I just opened a PR (#37948) for an openssl version bump that should probably go into 18.03 some time soon. Since that triggers a mass-rebuild and would probably cause some delays with other stabilization efforts I'd like to hear some feedback from our release team (@vcunat @fpletz) about when and where to merge that into (staging-18.03?).

@andir These kinds of minor bumps should be pretty harmless even though they are mass rebuilds. People should use channels or the nixpkgs-channels repo anyways instead of using the release-18.03 branch. So pushing to the 18.03 branch is fine.

@fpletz we have staging-18.03 (and a jobset) for mass rebuilds... I definitely build from git quite a bit.

currently working on fixing frescobaldi (and switching to poppler_qt5 as the qt4 variant was abandoned) on the following branch: https://github.com/NixOS/nixpkgs/compare/master...Ma27:cleanup-frescobaldi

It requires some improvements (I'd like to have working test suites) before filing a PR

Many regressions in https://hydra.nixos.org/eval/1444338 . Looking for the cause.

Edit: Weird. Tried to bisect this for mailutils as an example. Could reproduce the failure initially but it disappeared during the bisect process and is no longer reproducible. Guess this is caused by some underlying implicit dependencies. Looking at what changed between 54c76d597f6e7eac9129b5b6d129e3759f0440aa and 4b148bce243d66b4b0fd572ff4ad40fb743d5260 that might affect mailutils, my gut feeling is it could be the openssl update in b6474a3a3b9288bf1bb59142ddfdcce48ae68779 .

Does anyone have an idea on how to analyze this further? In the meantime, restarting the failed hydra builds may help.

> ... yes, looks like hydra is somehow still caching the old failure after 3 days. hope it will sort itself out.

There's this "problem" for fixed-output derivations. The hash didn't change, so it isn't re-attempted.
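
For readers unfamiliar with the term, a fixed-output derivation is one that declares the hash of its result up front, as a source fetch does. A minimal sketch, with a placeholder URL and an all-zero placeholder hash:

```nix
# Sketch of a fixed-output derivation: the expected content hash is part
# of the derivation, so its identity is fixed before anything is fetched.
with import <nixpkgs> { };

fetchurl {
  # Placeholder URL and placeholder (all-zero) hash, for illustration only.
  url = "https://example.org/foo-1.0.tar.gz";
  sha256 = "0000000000000000000000000000000000000000000000000000";
}
```

If such a fetch fails once, for instance because the server was temporarily unreachable, nothing in the expression changes when the server comes back, so the derivation looks identical and the recorded failure is reused until the build is restarted or the URL or hash changes.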

> There's this "problem" for fixed-output derivations. The hash didn't change, so it isn't re-attempted.

Luckily, that particular case was solved by an auto-bump from ryantm's script in the meantime...

Could someone please backport #38016, #38015 and #38002?

Pushed by Dezgeg.

monero-gui broke when monero was updated in 28c00f8f3b4b1f619a7c534767a5ef515ac42ee5.
They haven't created an official release yet so I'm trying to build master in the meantime but I run into some issues.

> The remaining packages will be marked as broken before the release (on the failing platforms), i.e. at the end of March.

@vcunat @fpletz what's the current plan/timeline for this?

I thought I could get around to going through them this week, probably on the weekend, including a check that they fail on master as well, but anyone should certainly feel free to do it and send a PR (master + pick to 18.03 later). In my case it can also easily happen that something more urgent comes up.

Thanks. I'll start a PR tonight and incrementally add to it. Feel free to take over any time.

While preparing #38702, I found some packages that don't currently build on hydra but seem to require further investigation. Here's a (dynamic) list for reference:

Done
  • [x] commandergenius
  • [x] coreclr: ok after version bump
  • [x] darwin-opencflite: fixed in #38329 , not yet merged
  • [x] extundelete: fixed in #38954
  • [x] haskellPackages.fsnotify-conduit: master has hydraPlatforms=platforms.none
  • [x] haskellPackages.powerqueue-distributed: master has hydraPlatforms=platforms.none
  • [x] haskellPackages.rocksdb-haskell: master has hydraPlatforms=platforms.none
  • [x] hpx
  • [x] kdeApplications.mbox-importer: builds locally but fails on hydra (log limit exceeded). (39018)
  • [x] mailutils: 1 test case fails sporadically on hydra, not reproducible. Perhaps fixed in #38708 .
  • [x] opendylan
  • [x] openshift
  • [x] python36Packages.bayespy: ok after numpy bump
  • [x] python27Packages.graph-tool
  • [x] pythonPackages.pygame_sdl2
  • [x] python36Packages.pyechonest
  • [x] salt: unmerged fix+bump in #35891
Failure not reproducible
  • [x] buildbot ok now, looks like the failing hydra job was restarted
  • [x] consul-alerts ok now, looks like the failing hydra job was restarted
  • [x] gitAndTools.git-annex-remote-b2 ok now, looks like the failing hydra job was restarted
  • [x] python27Packages.pyparted ok now
  • [x] ispc
Open
  • [ ] haskellPackages.hw-prim-bits: seems to compile but tests fail
  • [ ] libminc: compiles but one test case fails
  • [ ] libretro.mame: fails on 18.03 but same version succeeds on master
  • [ ] python27Packages.google_api_core: builds on master after a version bump as part of a large batch of updates, not sure what needs to be picked for 18.03
  • [ ] python27Packages.moinmoin: tests fail, same version builds on master
  • [ ] pythonPackages.pytorch: looks like it might build with gcc6
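
For reference, the two mechanisms mentioned in this thread for packages that stay unfixed, `hydraPlatforms = platforms.none` and marking as broken, might look roughly like this in a package's meta. The package is hypothetical, and which of the two fits a given case is a judgment call.

```nix
# Hypothetical package; only the meta attributes are of interest.
{ stdenv }:

stdenv.mkDerivation {
  name = "example-1.0";   # placeholder
  src = ./.;              # placeholder

  meta = with stdenv.lib; {
    platforms = platforms.unix;
    # Keep the package available to users but ask Hydra not to build it.
    hydraPlatforms = platforms.none;
    # Alternatively, mark it broken only where it is known to fail:
    # broken = stdenv.isDarwin;
  };
}
```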

The next ZHF iteration: https://github.com/NixOS/nixpkgs/issues/45960

Belatedly, thanks for all the fixes!
