Nixpkgs: Hydra evals of nixpkgs:trunk fail (heap size error)

Created on 2 Aug 2018 · 13 Comments · Source: NixOS/nixpkgs

For over 24 hours now, Hydra evals of nixpkgs:trunk have failed due to what again looks like memory-related issues; see https://hydra.nixos.org/jobset/nixpkgs/trunk#tabs-errors

    hydra-eval-jobs returned signal -1:
    restarting hydra-eval-jobs after job 'AMB-plugins.aarch64-linux' because heap size is at 10737418240 bytes
    restarting hydra-eval-jobs after job 'AMB-plugins.x86_64-linux' because heap size is at 10737418240 bytes
    restarting hydra-eval-jobs after job 'AgdaStdlib.x86_64-linux' because heap size is at 10737418240 bytes
    restarting hydra-eval-jobs after job 'CoinMP.aarch64-linux' because heap size is at 10737418240 bytes
    ...
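Every restart in the log above is reported at the same heap size. As a quick sanity check (plain arithmetic, not Hydra code), 10737418240 bytes is exactly 10 GiB, which suggests a fixed threshold rather than a machine simply running out of RAM. The helper `restart_heap_sizes` and its regex are hypothetical and written only for the log lines quoted here.

```python
import re

# Restart lines copied from the eval error above.
LOG = """\
restarting hydra-eval-jobs after job 'AMB-plugins.aarch64-linux' because heap size is at 10737418240 bytes
restarting hydra-eval-jobs after job 'AMB-plugins.x86_64-linux' because heap size is at 10737418240 bytes
restarting hydra-eval-jobs after job 'AgdaStdlib.x86_64-linux' because heap size is at 10737418240 bytes
restarting hydra-eval-jobs after job 'CoinMP.aarch64-linux' because heap size is at 10737418240 bytes
"""

# Captures the job name and the heap size reported on each restart line.
RESTART_RE = re.compile(r"after job '([^']+)' because heap size is at (\d+) bytes")

GIB = 1024 ** 3  # one gibibyte in bytes


def restart_heap_sizes(log: str) -> dict[str, int]:
    """Map each job name to the heap size (bytes) at which the evaluator restarted."""
    return {job: int(size) for job, size in RESTART_RE.findall(log)}


if __name__ == "__main__":
    for job, size in restart_heap_sizes(LOG).items():
        print(f"{job}: {size} bytes = {size / GIB:g} GiB")
```

Each line prints as `... 10737418240 bytes = 10 GiB`, i.e. 10 × 2^30 bytes.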
Label: blocker

That's really high! Maybe we just have too many packages? We may need to remove a few `recurseIntoAttrs` uses.

Update: evals seem to succeed occasionally (2 in the last 3 days, even though they run every 4 hours); whether they succeed likely depends on the machine running them. Unfortunately, Hydra only shows the last evaluation's errors, so I can't find statistics on how many evals failed and why.

It does look like our repo has reached a critical size.

Unless Hydra's evaluator is lacking physical RAM, I expect this would be fixed by using a version that includes https://github.com/NixOS/nixpkgs/pull/43021

Overall, evaluation on Hydra seems to have been too slow for the past few days; it easily takes over half an hour. I hope there isn't some hidden case of a rebuild during evaluation.

Actually, the last two successful evals of nixpkgs:trunk took 19280 s and 20757 s, each over 5 hours. That doesn't sound right.

This issue also affects nixpkgs:staging.

IIRC, that error message isn't a failure; it trades memory for time. See https://github.com/NixOS/hydra/commit/0882519b108e8549ae19cac558888d81ff062893
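In other words, a restart is a deliberate step rather than an error: the evaluator is restarted once its heap reaches a limit, and evaluation then continues with the next job. The Python below is a hypothetical toy model of that general pattern only, not Hydra's implementation; `evaluate_all`, `HEAP_LIMIT`, and the per-job memory costs are made up for illustration, with the limit chosen to mirror the 10 GiB figure in the log.

```python
GIB = 1024 ** 3

# Hypothetical threshold, chosen to mirror the 10 GiB figure in the log above.
HEAP_LIMIT = 10 * GIB


def evaluate_all(job_costs: dict[str, int], limit: int = HEAP_LIMIT) -> list[str]:
    """Toy model: evaluate jobs in order, restarting the evaluator whenever
    its heap reaches `limit`.

    `job_costs` maps a job name to the bytes it adds to the heap.
    Returns the jobs after which a restart happened.
    """
    heap = 0  # bytes held by the current (simulated) evaluator process
    restarts_after = []
    for job, cost in job_costs.items():
        heap += cost  # in this toy model the heap never shrinks on its own
        if heap >= limit:
            # A restart costs time (a fresh process starts from scratch)
            # but caps memory use; evaluation resumes with the next job.
            restarts_after.append(job)
            heap = 0
    return restarts_after


if __name__ == "__main__":
    jobs = {name: 4 * GIB for name in ("a", "b", "c", "d", "e")}
    print(evaluate_all(jobs))
```

With five 4 GiB jobs, the heap crosses 10 GiB after the third job, so the only restart happens after `"c"`. The model deliberately ignores the case of a single evaluation exhausting memory on its own.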

Right, but every eval still ends with the usual, genuine heap failure and fails.

It looks like nixpkgs:trunk-combined hasn't had a successful eval in 4 days.

Yes, unfortunately, even though I forced many attempts there.

The latest failure looks different, though:

    hydra-eval-jobs returned signal -1:
    trace: warning: You don't have `system.stateVersion` explicitly set. Expect things to break.
    trace: warning: You don't have `system.stateVersion` explicitly set. Expect things to break.
    timeout

That timeout looks like it might be coming from hydra-eval-jobset, if I'm not reading the Perl completely wrong:

    (my $res, my $jobsJSON, my $stderr) = captureStdoutStderr(21600, @cmd);
    die "$evaluator returned " . ($res & 127 ? "signal $res" : "exit code " . ($res >> 8))
        . ":\n" . ($stderr ? decode("utf-8", $stderr) : "(no output)\n")
        if $res;
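The `die` message folds two cases into one expression: if any of the low 7 bits of `$res` are set it reports a signal (printing the whole of `$res`), otherwise it reports `$res >> 8` as the exit code. The Python below is a transliteration for illustration, not Hydra code, and `describe_result` is a hypothetical name. It shows why a result of -1 is printed as "signal -1": in two's complement, -1 has all bits set, so `-1 & 127` is non-zero. That a timeout makes `captureStdoutStderr` return -1 is an assumption here, though it is consistent with the `timeout` line in the failure log.

```python
from typing import Optional


def describe_result(evaluator: str, res: int) -> Optional[str]:
    """Mirror the Perl `die` message; None means `if $res` is false (no error)."""
    if not res:
        return None
    if res & 127:
        # Like the Perl, print the whole value, not just res & 127.
        return f"{evaluator} returned signal {res}"
    # Otherwise the high byte holds the exit code.
    return f"{evaluator} returned exit code {res >> 8}"


if __name__ == "__main__":
    for res in (-1, 9, 256, 0):
        print(res, describe_result("hydra-eval-jobs", res))
```

A genuine SIGKILL would show up as "signal 9", and a plain `exit 1` (status 256) as "exit code 1"; only a negative sentinel value produces "signal -1".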

Note that 21600 s is 6 hours. Does that match anything we're seeing?
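Using only numbers from this thread, a quick calculation suggests it might: the two successful trunk evals took 19280 s and 20757 s, roughly 89% and 96% of the 21600 s limit passed to `captureStdoutStderr`, so a slightly slower eval would hit that limit. The names `TIMEOUT_S`, `SUCCESSFUL_EVALS_S`, and `summarize` are illustrative only.

```python
TIMEOUT_S = 21600  # first argument to captureStdoutStderr in the Perl above

# Durations of the last two successful nixpkgs:trunk evals, from this thread.
SUCCESSFUL_EVALS_S = [19280, 20757]


def summarize(durations: list[int], timeout: int = TIMEOUT_S) -> list[tuple[float, float]]:
    """Return (hours, fraction of the timeout) for each duration in seconds."""
    return [(d / 3600, d / timeout) for d in durations]


if __name__ == "__main__":
    print(f"timeout = {TIMEOUT_S / 3600:g} h")
    for hours, fraction in summarize(SUCCESSFUL_EVALS_S):
        print(f"{hours:.2f} h, {fraction:.0%} of the timeout")
```

This prints `timeout = 6 h`, then `5.36 h, 89% of the timeout` and `5.77 h, 96% of the timeout`.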

https://github.com/NixOS/hydra/commit/4dca8fe14d3f782bdf927f37efce722acefffff3 introduced a gradual heap size increase 3 days ago. That seems to have somehow improved the situation for trunk, but not for trunk-combined.