Nixpkgs: Channels not advancing

Created on 18 Aug 2019 · 12 Comments · Source: NixOS/nixpkgs

Issue description

The nixpkgs-unstable channel does not seem to be advancing despite being in a good state.

https://howoldis.herokuapp.com/

blocker

All 12 comments

I see a couple of stuck (queued) jobs from evals up to 5 days old, so I'm cancelling them. I assume that was the blocker.

@vcunat is "cancelling queued jobs" happening often enough to introduce some automation for that? Is there an issue open for Hydra?

Sometimes I see a job "executing" for several days, which is clearly some kind of stuck state, and not even cancelling clears it (though it does remove the job from "queued" status). Normally Hydra does apply a timeout and cancels stuff – actually the limit is now often too short for us on some aarch64 builds (ghc, llvm, fortran).

@vcunat I see there that machine d3bcab1f has been executing lots of jobs for 1d 6h already. Is that stuck as well?

Also, in case of a timeout, is the build retried on another, more powerful machine? Is the build timeout the same for all machines? (yeah, those aarch GHC builds)

IIRC we can increase timeouts per-job from the nix expressions, so I suppose we'll do that for some of the expensive builds, regardless of platform. _We might also mark them as big-parallel, but that won't make much difference on our current Hydra setup._

I found several jobs hanging on the hydra server, and I've since terminated them.

I just verified that the channel advancement tooling is not having problems, so presumably it is just the hydra jobs at the moment.

Yes, the channel did update quite fast after I cancelled the laggards, so that part is probably OK. Some automatic avoidance of these occasional problems (stuck builds) would sure be nice, but it doesn't seem to cause too much trouble so far.
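For illustration only, here is a minimal sketch of what detecting such laggards could look like. Everything in it is an assumption: the `Job` record, the `find_stuck` helper, the job names, and the thresholds are hypothetical and do not reflect Hydra's actual data model or API. It only reports candidates and does not cancel anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical job record; not Hydra's real data model."""
    name: str
    status: str      # e.g. "queued" or "executing"
    since: datetime  # when the job entered its current status


def find_stuck(jobs, now,
               max_queued=timedelta(days=2),
               max_executing=timedelta(days=1)):
    """Return jobs that have stayed in one status longer than its limit.

    The default limits are arbitrary placeholders, not Hydra settings.
    """
    limits = {"queued": max_queued, "executing": max_executing}
    return [
        job for job in jobs
        if job.status in limits and now - job.since > limits[job.status]
    ]


# Made-up sample data, loosely modelled on the situation in this thread.
now = datetime(2019, 8, 18, 12, 0)
jobs = [
    Job("ghc.aarch64-linux", "queued", now - timedelta(days=5)),
    Job("llvm.aarch64-linux", "executing", now - timedelta(hours=30)),
    Job("hello.x86_64-linux", "queued", now - timedelta(hours=1)),
]

for job in find_stuck(jobs, now):
    print(f"{job.name}: {job.status} for {now - job.since}")
```

Reporting rather than cancelling keeps a human in the loop, which leaves room to inspect a stuck job before it is removed.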

One reason we don't do that automatically is that it usually represents bugs in Nix, so I try to capture some stack traces to investigate.

Yay, with nix you can dare to debug in production... I guess? (sometimes?)

It has advanced; nixos-unstable hasn't yet, but it looks to be running fine.

It's published.
