Nixpkgs: Darwin bug on Hydra

Created on 18 Aug 2019 · 12 Comments · Source: NixOS/nixpkgs

Describe the bug
The following build has been failing for 2 weeks on Hydra. https://hydra.nixos.org/build/98372621

@LnL7 and @grahamc were not able to reproduce it.
https://github.com/NixOS/nixpkgs/pull/66381#issuecomment-521007623
https://github.com/NixOS/nixpkgs/pull/66381#issuecomment-521015246

I've merged staging-next into master because the longer we wait the worse the merge conflicts get. But, this is going to block the channels from advancing.

bug channel blocker darwin

FRidh

👍2

All 12 comments

We were able to reproduce it when running it in a loop... but only sometimes. Ugh.

grahamc on 18 Aug 2019

On the Hydra machines it seemed to fail pretty reliably – I tried a few restarts to work around it for now, but without any success, I believe... but such things do happen with impure problems.

vcunat on 18 Aug 2019

After some debugging I was able to reproduce it, but this issue seems to have been present ever since the stdenv started using llvm 7. The occurrence rate just seems much higher on Hydra: 9/10 failures on the hydra hosts compared to 1/30 on my machines. The following can be used to trigger this locally.

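with import ./. {};

# Recompile the same object file in a loop until the intermittent failure shows up.
darwin.CF.overrideAttrs (drv: {
  buildPhase = ''
    for i in {1..512}; do
        # Remove the object so ninja has to recompile CFRuntime.c on every pass.
        rm Build/CoreFoundation/Base.subproj/CFRuntime.c.o || true
        # Stop at the first failed compile, with a distinctive exit code.
        ninja -j$NIX_BUILD_CORES || exit 100
    done
  '';
})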

One of the flags in here might be the culprit, but I don't have time to investigate further. CFRuntime-28db57.txt

What should happen in cases like this, and what "supported platforms" means for the status of staging merges etc., is probably something that should be formalized. Since this has an impact on unstable as well as on merge requests (ofborg), merging staging with large failures shouldn't be done lightly IMHO.

LnL7 on 18 Aug 2019

I'm not sure how, but Hydra eventually did succeed with that particular build, and it's already caught up rebuilds of everything but haskellPackages (ghc always takes veeery long to build).

vcunat on 20 Aug 2019

The problem apparently won't just go away by itself – on today's master (after a hash change) I had to restart the job several times to make it succeed.

vcunat on 24 Aug 2019

😕1

Could this be an OOM error? We can try setting enableParallelBuilding = false for it to see if it goes away. For reference, this looks very similar to this bug report:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200342
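
To make that experiment concrete, here is a minimal sketch of a serialized build. It assumes the CF build phase invokes ninja directly, as the reproduction snippet above does, so it overrides that invocation as well as setting the flag. Whether enableParallelBuilding alone has any effect on this derivation has not been checked.

```nix
with import ./. {};

# Sketch only: force a single-job build of CF to test the OOM theory.
darwin.CF.overrideAttrs (drv: {
  # May be ignored if the build phase calls ninja with its own -j value.
  enableParallelBuilding = false;
  # Assumption: replacing the whole build phase with one serial ninja run is acceptable here.
  buildPhase = ''
    ninja -j1
  '';
})
```

If the failure rate stays the same with a single job, memory pressure from parallel compiles is an unlikely cause.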

matthewbauer on 26 Aug 2019

I doubt it. The snippet to reproduce only rebuilds one object repeatedly, so the number of cores shouldn't matter.

LnL7 on 6 Sep 2019

What's the status of this? It won't block 19.09, since there's a separate darwin channel for that, right? I'm removing it from the milestone based on that assumption, but feel free to correct this.

lheckemann on 4 Oct 2019

Correct. Occasionally it appears when a stdenv rebuild happens – I triggered some Hydra restarts because of that a week or two ago.

vcunat on 5 Oct 2019

The issue occurred again just now. A third of darwin got built, so I am merging it (for the same reason I gave before).

FRidh on 24 Oct 2019

👍1

From the other ticket where I wanted to disable building of swift-corefoundation.

BUT, I don't think this PR helps as-is. The problem of the breakage is that so many packages depend on this package ATM.

CoreFoundation is part of the stdenv so this is kind of pointless.

Right. If it means that the stdenv, and thus all of the Darwin packages, no longer build on Hydra, then unless we're going to fix it, removing it from the Hydra job is exactly what should be done IMO. What's the point of attempting to build it if we know it is going to fail 9 times out of 10 and in effect block channels from advancing?

The only workaround I can think of at the moment is retrying the build phase similar to my reproduction snippet, but that's pretty horrible.

Sounds indeed horrible, but unless another solution can be thought of, it seems to me to be the only way forward to support Darwin. We cannot continuously monitor and press restart while it is blocking staging-next and channels.
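
For illustration, a retry workaround along those lines might look like the sketch below. It reuses the ninja invocation from the reproduction snippet. The limit of five attempts and the variable name ok are assumptions made for this example, not something proposed in the thread.

```nix
with import ./. {};

# Sketch only: retry the intermittently failing ninja build a few times.
darwin.CF.overrideAttrs (drv: {
  buildPhase = ''
    ok=0
    # Hypothetical limit of five attempts.
    for i in 1 2 3 4 5; do
        if ninja -j$NIX_BUILD_CORES; then
            ok=1
            break
        fi
        echo "ninja failed on attempt $i, retrying" >&2
    done
    # Fail the build only if every attempt failed.
    [ "$ok" -eq 1 ] || exit 1
  '';
})
```

Because ninja keeps the objects that already compiled, each retry only redoes the steps that failed, which keeps the cost of this approach low even though it hides the underlying bug.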

FRidh on 17 Nov 2019

I was hoping somebody else would have come up with a better solution. I'll poke around a little more and make a PR with a workaround.

LnL7 on 17 Nov 2019

🎉1
