Nixpkgs: Hydra can't evaluate nixos:trunk-combined

Created on 16 Jul 2019 · 16 comments · Source: NixOS/nixpkgs

hydra-eval-jobs returned signal 134:

which most likely means the out-of-memory killer (I think). The success rate was already rather poor several days ago, but now we seem to be completely stuck (many attempts without any success).

Among other issues, it blocks security updates from getting to the nixos-unstable channel.

blocker security

vcunat

👍1

All 16 comments

In metrics I can't see any recent anomaly around memory consumption, so I expect it's either something around NixOS stuff (like OS, tests, etc.)... or we just very slowly got over the limit.

vcunat on 16 Jul 2019

I have pinged @rbvermaa and @edolstra.

grahamc on 16 Jul 2019

For the record, it's been weeks since it started happening with regularity.

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS

I've been haphazardly requeuing the eval a bunch lately because of that. It eventually finishes evaluating, but it seems (anecdotally) to be getting harder for it to pass.

samueldr on 16 Jul 2019

And for the record, OOM will not be 134, but signal 9, AFAICT. 134 is the Boehm GC failing.

samueldr on 16 Jul 2019

👍1

And... well, sorry for adding again, this, or a linked issue, has been stopping us from adding aarch64-linux to supported systems. Figuring a solution to this is likely going to unblock us from doing it.

samueldr on 16 Jul 2019

Oh, right, I miscalculated... 134 is probably abort() (e.g. via assert).

vcunat on 16 Jul 2019

👍1
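The arithmetic behind the last few comments can be sketched as follows. This is a minimal illustration, assuming the reported 134 follows the common shell convention of 128 plus the signal number; the thread does not confirm exactly how hydra-eval-jobs reports it, and `describe_exit_status` is a hypothetical helper, not part of Hydra or Nix.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: statuses above 128 encode "128 + signal number",
    which is the convention used by common POSIX shells.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    # 134 - 128 = 6, which is SIGABRT on Linux (raised by abort()).
    print(describe_exit_status(134))
    # 137 - 128 = 9, which is SIGKILL (what the OOM killer sends).
    print(describe_exit_status(137))
```

Under that convention, 134 points at abort() rather than the kernel's OOM killer, which would show up as 137. That fits the "Too many heap sections" message quoted earlier, if the garbage collector aborts the process after printing it.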

We're going to get a new machine for Hydra.

The new one will run all of the things the current one does, minus postgres
The old one will continue running postgres, so it'll be able to use up all the system's RAM for cache
The old one has 32G RAM
The new one has 64G of RAM

We should be able to better evaluate then.

grahamc on 16 Jul 2019

🎉7

I hope that will really last us for years.

I'm not sure what the timeline is, but in the meantime... remove aarch64 from this jobset? (The trunk jobset will still provide us with binaries and regression visibility for the larger parts.)

vcunat on 17 Jul 2019

@grahamc any idea when that is approximately going to happen?

FRidh on 17 Jul 2019

We've ordered the server. Hopefully we'll get it today.

edolstra on 17 Jul 2019

🎉2

> I hope that will really last us for years.

Years, or the addition of armv6l, riscv, and power9 :wink:

grahamc on 17 Jul 2019

The server took a while to get provisioned, but we now have access to it.

We asked for it to be provisioned very close to the current hydra server, to:

1) make transferring the state faster
2) reduce latency from hydra to the postgres database

We suspect this special request made it take longer than usual.

grahamc on 18 Jul 2019

👍1

... but hydra.nixos.org hasn't been migrated so far, right? (To be clear, I don't mean that as critique or anything.)

vcunat on 18 Jul 2019

Correct. We only just took control of the chassis, and we'll begin provisioning today. Probably after CEST work hours :)

grahamc on 18 Jul 2019

🎉3

We started transferring GC roots and derivations over to the new server several hours ago.

grahamc on 19 Jul 2019

👀2

The new server is up and running.

edolstra on 19 Jul 2019

❤4
