Let's make Jellyfish the best release so far!
We have: the main jobset starting at 425 failures, x86_64-darwin at 829, and aarch64-linux at ~1120. The numbers may seem large, but one ~~weird trick~~ appropriate fix may fix many at once.
`meta.platforms`, in case the package inherently doesn't support what it has in there ATM.
The remaining packages will be marked as broken before the release (on the failing platforms), i.e. at the end of September. /cc @NixOS/nixpkgs-committers, but everyone can help out!
Reverting https://github.com/NixOS/nixpkgs/commit/ad47c381bda2d38cddb96e15efd4ea5b4836f542 will fix nixpkgs.perl*Packages.MouseXGetOpt
reverted in 9889c0f2417fe38016ccf8cf126e5b2a9f561f91 and 4c00a04f472d94cb4e5fa366500676ab5e17acca
Failures report as of right now.
Let's see how useful this is as a format. This was queried from the last finished eval; there were evals running while this was made.
> Failures report as of right now.

I assumed that `perl52[68]Packages.TestMagpie` should be ignored as broken because it depends on the broken `perl52[68]Packages.UNIVERSALref` (https://github.com/NixOS/nixpkgs/pull/45983).
Or should each dependent have its own `meta.broken = versionAtLeast perl.version "5.26"`?
> This was queried from the last finished eval; there were evals running while this was made.

@volth the table shows an earlier eval where UNIVERSALref wasn't marked as broken yet (see the build logs). It is fine in the latest eval: https://hydra.nixos.org/eval/1477017#tabs-removed
> Or should each dependent have its own `meta.broken = versionAtLeast perl.version "5.26"`?

No, only the package that is broken itself.
I'd add that `broken` and other checks are transitive during evaluation (implemented as exceptions).
The sage failures are due to a mistake I made when adding pkg-config aliases to openblas and the recent numpy update. I fixed openblas in #46016 in staging. I don't know if that means it will also be merged into 18.09. I haven't gotten to backporting the numpy upgrade from sage upstream yet.
I really think it is a shame that hydra doesn't ping maintainers on failures anymore. Seems like an essential feature to miss.
@vcunat @samueldr there are a number of changes currently in staging/staging-next that should go to 18.09 once they reach master - openblas, texlive 2018, a systemd bugfix, etc.
What's the workflow for these? Guess we'll need a `staging-18.09` branch + Hydra job.
Since the fork point the `staging` branch won't get to 18.09 anymore. Cherry-picking should be done if desired. For this one I did it in 6f8e07ac0.
@xeji: we have `staging-18.09` already, I think we'll add a jobset soon as well to make ZHF easier. (People don't want to rebuild everything when fixing other packages.) This openblas commit wasn't a mass rebuild according to my measurements, so I pushed it directly.
Reminder: we should choose the changes a bit more carefully than for usual staging, so that real stabilization happens :-)
8fb90de88cd (in `staging-18.09`) fixes a regression introduced with systemd 239, and makes the systemd test pass.
Update: Fix is in `release-18.09` now, done.
I've released stack2nix 0.2.1 to address the version bump; once the haskell packages are synced it should compile again.
`boost162` is fixed by c70ff28968a and 11c2595e40a (in `staging-18.09`, mass rebuild)
Update: fixes are in `release-18.09` now, done.
I removed the Copperhead kernel yesterday, so those kernel modules should no longer fail :) The whole kernel is unsupported and unmaintained.
New night, new report!
This time, I have:
👍 for the work in deflating those numbers! Can't wait to see the staging work hit too!
Just seen that the `statsd` test I'm responsible for broke. Will have a look at this during the day...
@ma27 the test isn't broken; statsd has had a runtime error since the last nodePackages update, see #45946
Ah, good point. I guess that the discussion at https://github.com/etsy/statsd/issues/646 confirms this.
Let's discuss further steps regarding the package in your ticket :)
Is there a convenient way to backport a PR using github, or do I have to push it directly through the cli / create another PR? Specifically I want to backport #46099.
> Is there a convenient way to backport a PR using github

Not that I know of. I `git cherry-pick -x` the commits on top of `release-18.09` on my local clone and `git push` the updated branch.
No need to create a new PR just for cherry-picking.
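For anyone unfamiliar with the backport step described above, here is a minimal sketch of what `git cherry-pick -x` does. It runs entirely in a throwaway repository, so the file names, commit messages, and the demo identity are made up for illustration; only the branch name `release-18.09` is taken from the thread.

```shell
#!/bin/sh
# Sketch: backport one commit onto a release branch with `git cherry-pick -x`.
# Everything happens in a temporary repository; nothing is pushed anywhere.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"   # hypothetical identity for the demo
git config user.name "Demo"

# Shared history, then a release branch forked from it.
echo base > file.txt
git add file.txt
git commit -q -m "base"
git branch release-18.09

# A fix lands on the development branch first.
echo fix > fix.txt
git add fix.txt
git commit -q -m "example: fix a build"
fix_commit=$(git rev-parse HEAD)

# Backport: -x appends "(cherry picked from commit <hash>)" to the message,
# which records where the change came from.
git checkout -q release-18.09
git cherry-pick -x "$fix_commit" >/dev/null

backport_msg=$(git log -1 --format=%B)
printf '%s\n' "$backport_msg"
```

In a real clone, the last step would be a `git push` of the updated `release-18.09` branch, as described in the comment above.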
Things got better (mostly)!
Still a couple of jobs queued, latest evaluation as of right now.
Platform | PREV | NEW | PREV - NEW |
---|---|---|---|
i686-linux | 45 | 52 | -7 |
x86_64-linux | 326 | 296 | 30 |
x86_64-darwin | 806 | 797 | 9 |
aarch64-linux | 1098 | 707 | 391 |
(queued) | -- | 13 | -- |
(Hand-crafted table, but thinking about integrating this in the reports.)
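As a rough illustration of how the last column could be produced automatically, the sketch below recomputes previous minus new failures per platform and adds an overall total. The function name `report_deltas` and the whitespace-separated input format are assumptions made for this example and are not part of the actual report tooling; the queued jobs are left out.

```shell
#!/bin/sh
# Sketch: recompute "previous minus new" failure counts per platform.
# Input lines: "<platform> <previous> <new>". A positive delta means fewer failures.
report_deltas() {
  awk '{ prev += $2; cur += $3; print $1, $2 - $3 }
       END { print "total", prev - cur }'
}

deltas=$(report_deltas <<'EOF'
i686-linux 45 52
x86_64-linux 326 296
x86_64-darwin 806 797
aarch64-linux 1098 707
EOF
)
printf '%s\n' "$deltas"
```

Only i686-linux regressed here; the other three platforms account for the net improvement.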
i686: the state of that platform hangs primarily on https://github.com/NixOS/nixpkgs/issues/36947#issuecomment-416372860
I added a staging-18.09 jobset (low priority).
Tonight's report.
Numbers are going down! Great!
New report!
With more data!! Builds with dependency failures now present the dependencies that failed in their cell. You can easily gauge how much trouble a single package can fix. E.g. you can see the multiple perl `Module-Build-XSUtil` failures (per perl version) cause 103 failures on Darwin! I'm thinking as the next improvement that I should index those somehow so we can have a direct list of big-hitting fixes.
The Haskell `safe-money` package was split into 4. If we bump the version of `safe-money`, all the build failures of `safe-money-*` should be fixed! I don't have a computer at the moment but I thought I'd describe the issue so someone can get another 5 packages building in 18.09 :)
@samueldr nice report! Can you also add a comment there saying where the report generator script is located?
@danbst `~/Projects/nixos/review-tool/eval-report`; nowhere public yet, it grew from hacks into something neat quicker than I anticipated. I'll be looking into properly making it a project so it can be usable. Ring me again if by this week-end I haven't released something yet.
Anyone seeing issues with `uv-errno.h: No such file or directory` or similar during this ZHF egg^W bug hunt: libuv moved their headers around in the 1.20 → 1.21 transition. It probably broke more things than what I found.
Right now I see that `nixpkgs.lispPackages.cl-libuv.x86_64-linux` and `nixpkgs.lispPackages.wookie.x86_64-linux` are having issues, pinging @7c6f434c for a lisp ecosystem update?
EDIT: hijacking comment. The last evals all still have over 100 builds running, so here's a late run of yesterday's report. I will generate it in the morning if it's done or close to done.
`wookie` and some other packages are broken by their dependence on `cl-libuv`. `cl-libuv` was fixed in https://github.com/orthecreedence/cl-libuv/pull/15 which is a part of the 2018-08-31 quicklisp release (and probably the previous one too). I did not manage to regenerate Lisp packages in Nixpkgs with it after making some progress: `quicklisp-to-nix-system-info` seems to be unable to process `css-selectors-stp` (infinite loop?).
Yeah, that's why I was pinging @7c6f434c, the name pops up frequently near `cl-*` stuff. I was also sharing the findings about the missing header file, in case other failures have the same error. (And ah, must have misread something, I thought wookie handled libuv by itself, from a quick reading.) I have no experience with anything lispy :).
Thanks for the reminder. I am not sure there was a Quicklisp release between 2018-07-11 and 2018-08-31 (also fun: the Quicklisp web page misses the August release).
(I hope I won't forget to cherry-pick the update to 18.09 when it succeeds)
(It's a pity GitHub doesn't indicate when a previously-mass-mentioned issue has a new targeted mention, not just an arbitrary update)
> (It's a pity GitHub doesn't indicate when a previously-mass-mentioned issue has a new targeted mention, not just an arbitrary update)

I'm doing this in my e-mail box by a filter for my explicit mentions :-) GitHub's notification system doesn't really fit me fully.
If the header move affects many packages, we might consider compatibility symlinks in our libuv package.
> I'm doing this in my e-mail box by a filter for my explicit mentions :-) GitHub's notification system doesn't really fit me fully.

In email I try to skim all the headers/first lines of the messages, and then I go to GitHub to check if I missed some mentions (because I try to keep track of the general direction of things while not spending too much effort). This almost works. Oh well, I don't expect GitHub to work completely…
Is there some way to dismiss the headers you've skimmed already? I like that discourse allows you a workflow where you look at the list of new topics, and you skim that and decide which of them you want to follow (and then none of them will show up in "new topics" for you anymore).
> If the header move affects many packages, we might consider compatibility symlinks in our libuv package.

Or maybe a wrapper package: fewer rebuilds, more explicit record-keeping.
That would require what? Looking at the failing packages, selecting the ones with libuv dependencies, grepping the logs for uv-errno, then letting a human do the final triage?
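The grepping step of that triage could be sketched roughly as follows. This assumes the build logs of the failing jobs have already been saved locally, one file per job, in some directory; that layout and the function name `find_uv_errno_failures` are hypothetical, and fetching the logs from Hydra is deliberately left out.

```shell
#!/bin/sh
# Sketch: list saved build logs that mention the moved libuv header.
# $1 is a directory of previously downloaded logs (hypothetical layout).
find_uv_errno_failures() {
  log_dir=$1
  # -r: recurse, -l: print only names of matching files, -F: match a fixed string.
  grep -rlF 'uv-errno.h' "$log_dir" | sort
}
```

Calling `find_uv_errno_failures ./logs` would print the matching log files, which a human can then go through for the final triage.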
> Is there some way to dismiss the headers you've skimmed already?

Wouldn't expect GitHub to have that; then again, if some discussion in Nixpkgs keeps going on, I start wondering if there are some useful tangents in that discussion.
I didn't see such a possibility with GitHub. Discourse still has "latest topics" tab, so I do glimpse at that as well :-) (it highlights the ones I've read in full already, etc.)
Hi! Some good improvements compared to the previous report!
NOTE: my misunderstanding of a hydra feature may skew or break dependency failures. (Restarted jobs, such as `happy`, could still show up as a cached failure.) Next reports will either omit the section or work from another data source.
`pythonPackages.rlp`, one of the "problematic dependencies" in the report. Needs backport.
It seems that `nixpkgs.julia_10.x86_64-linux` and `nixpkgs.julia_07.x86_64-linux` should now build properly. The last hydra failures were due to a timeout. I can build both versions locally on the 18.09 release branch.
Hi! Reporting time. It has been pointed out to me that dependency failure resolutions will show false positives (things that ended up not failing). I'm working on fixing it, but there is still value in the data, even though some of the listed dependencies are not actually broken. If you're using the "Problematic dependencies" section in this report, do verify first whether a dependency is actually problematic :).
Here's the script: eval-report, cc @danbst @Mic92 who asked previously.
I've cherry-picked haskell updates from master; that should fix stack2nix and others, but might break new things.
https://github.com/NixOS/nixpkgs/pull/46005 is to be backported to release-18.09
Please backport #46367.
@volth @yesbox Done. Thanks!
Eval of the `hydra` package currently fails on 18.09 @ f829dbfaaa5 (https://hydra.nixos.org/eval/1478932):
$ nix-build . -A hydra
error: attribute 'stm_2_4_5_0' missing, at ...
@domenkozar could that be related to the haskell updates you mentioned?
@xeji Could you link to the failing build? As far as I know Hydra has yet to pick up any Haskell dependencies, and it looks like it builds on all but i686 to me.
Oh, darcs... On it.
`staging-18.09` is progressing very slowly on hydra due to the jobset's low priority. To get these fixes merged before the release date, the priority should be raised. @vcunat @samueldr
Yes - currently staging-18.09 has only 500 shares while staging has 1000 shares. Probably should bump staging-18.09 to like 1500
I bumped it to 5000 (still a fraction of "trunk" or releases), but I also canceled the pointless eval running – it was between a commit and its revert. This means we actually should _already have_ all rebuilds ready for the tip of staging-18.09.
EDIT: well, not exactly, as release-18.09 hasn't been merged to staging-18.09 for a long time, so there are lots of rebuilds created just by merging them (I estimated ~8k per platform).
The fix for zathura is already in staging and staging-18.09: #46376.
(edit: this was in reply to a comment that was apparently deleted by its author while I wrote it)
@xeji: yes, thank you. I noticed some 30s later that it was already fixed in staging.
Generated a fresh report:
It's not based on the currently running latest evals for darwin or aarch64, but the completed penultimate ones.
Hmm, I can't reproduce the `ponyc` failure locally; also, the hydra build log looks incomplete, with no errors there.
@kamilchm: apparently Hydra already has retried them, possibly in some other jobset or due to some dependency, which is why the log showed only a successful build. I "restarted" it to make it green. _Unfortunately we don't support multiple logs per *.drv file._
Uh, on this day, aarch64 has fewer failures than darwin. Quite the reversal!
Though, it looks like a good deal of the new failures come from texlive.
Oh no, bus errors on my `t2m` machine again :-/ That's just transient. I suppose @copumpkin still doesn't know why it happens...
@globin has pushed another `texlive` fix to `staging-18.09`: 69da311f79f, see discussion in #46376.
Since this triggers another mass rebuild via `gtk-doc` anyway, I have also picked #46761 to `staging-18.09` (3d949911c05b9ae317dc0d345e97e30c4c5e80d6) so future `texlive` changes don't cause mass rebuilds anymore.
Unfortunately, this doesn't fix `zathura` yet.
Looks like the mac build problems are fixed. Still a surprising 595 failures, against 583 on aarch64! I wouldn't be surprised if there were other transient failures, or something huge and easy to fix on Darwin.
I don't expect large transient failures on mac anymore – those would probably be visible as jumps on large rebuilds. Looking at Hydra numbers, during this ZHF, aarch64 has seen lots of improvements, whereas darwin didn't improve much. (can't say _why_)
A lot of what's broken on darwin, 340 jobs + new ones, has been broken for a long time or forever. I've not had the time/energy to go through those and mark them linux only.
@LnL7 maybe you can define clear criteria like "if it's been broken on darwin for >x months it should be marked linux only?" and open a WIP PR so all committers could add to it whenever they have time?
I linked to an overview of the jobs that have been broken for over a year on the darwin ZHF issue, the rest is kind of a grey zone. I started to go over those marking them broken or linux only where appropriate, like https://github.com/NixOS/nixpkgs/pull/46584 and https://github.com/NixOS/nixpkgs/pull/46628, but I'm kind of busy with other things now.
staging-18.09 looks pretty good on Hydra, but there are some timed out `gnome3` jobs that need restarting.
@vcunat @samueldr
OK, merged. The cheese error looked like a parallel-make problem, but I can't see it happen often, so I simply restarted it along with the rest of new failures.
Could someone please re-trigger `julia_10.x86_64-linux` and `julia_07.x86_64-linux`? These two should build successfully.
The leancheck issue has been filed upstream
`sage` is still broken on 18.09 (but not on master). ping @timokau
@xeji thanks for the heads up. I cannot reproduce a failure with the most recent release-18.09, so that failure is probably outdated.
@timokau: the newest evaluation failed three times (on different Hydra machines): https://hydra.nixos.org/build/81780926 There's always some "N doctests failed" at the end.
You lack an implementation (library, egg) where fib is defined, see the error:
NameError: name 'fib' is not defined
**********************************************************************
File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/repl/ipython_extension.py", line 403, in sage.repl.ipython_extension.SageMagics.fortran
Failed example:
fib
Exception raised:
Traceback (most recent call last):
File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 573, in _run
self.compile_and_execute(example, compiler, test.globs)
File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 983, in compile_and_execute
exec(compiled, globs)
File "<doctest sage.repl.ipython_extension.SageMagics.fortran[3]>", line 1, in <module>
fib
NameError: name 'fib' is not defined
**********************************************************************
File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/repl/ipython_extension.py", line 407, in sage.repl.ipython_extension.SageMagics.fortran
Failed example:
fib(a, 10)
Exception raised:
Traceback (most recent call last):
File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 573, in _run
self.compile_and_execute(example, compiler, test.globs)
File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 983, in compile_and_execute
exec(compiled, globs)
File "<doctest sage.repl.ipython_extension.SageMagics.fortran[6]>", line 1, in <module>
fib(a, Integer(10))
NameError: name 'fib' is not defined
**********************************************************************
File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/repl/ipython_extension.py", line 408, in sage.repl.ipython_extension.SageMagics.fortran
Failed example:
a
Expected:
array([ 0., 1., 1., 2., 3., 5., 8., 13., 21., 34.])
Got:
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
**********************************************************************
@vcunat I still cannot reproduce that locally. The main error seems to be
File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/misc/inline_fortran.py", line 65, in sage.misc.inline_fortran.InlineFortran.eval
Failed example:
fortran(code, globals())
Exception raised:
Traceback (most recent call last):
...
File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/misc/inline_fortran.py", line 125, in eval
raise RuntimeError("failed to compile Fortran code:\n" + log_string)
RuntimeError: failed to compile Fortran code:
...
File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/threading.py", line 736, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
With the rest being avalanche errors. Could that be some sort of hydra threading issue? Can anybody reproduce the error?
Hmm, the last few times I wanted to generate a report there weren't good evals to check. Now there is one:
It looks like a chunk of the new failures on aarch64 would be fixed with #47564.
@vcunat any blockers for a release you think? I don't have anything in mind, since staging was merged and the last things I knew of were all on staging.
While not strictly ZHF related, the blockers don't seem to be severe enough to warrant blocking the release? Most are either older than this release cycle, or things that don't really block, but still would be great to finalize.
It sounds alright to me. Someone had mentioned merging #42846 before the release but since it's a nonessential module I would consider it nonblocking (although might be worthwhile backporting).
Make sure you look at the 18.09 milestone PRs though too: https://github.com/NixOS/nixpkgs/pulls?q=is%3Aopen+is%3Apr+milestone%3A18.09
I consider #47577 a blocker - apologies for not raising it earlier.
It looks like the sage build succeeded, curious.
As for the release I suggest removing all the blocker labels and bumping the milestones, then give the people involved at least one day to complain before going ahead with 18.09.
Hey, what do you know: I found a blocker, and I should have known better and known it is one beforehand:
See #47602, while this doesn't block on a technical side, this is a definite blocker on the human side, with a side dish of bad user experience as their first bite into NixOS.
In case anybody was on the edge of their seat because of the transient sage failure: Turns out it was caused by some numpy issue with high cpu count. Since I'm still seeing the issue in master, it's probably still there in release-18.09 too. I've opened a PR (https://github.com/NixOS/nixpkgs/pull/49888) to cherry-pick the fix to numpy in release-18.09. For master it should be good enough to wait for the next numpy upgrade.
Looks like we had a huge bump at https://hydra.nixos.org/eval/1495549
Many are propagated from https://hydra.nixos.org/build/85961701 which failed with "Log limit exceeded" and "building of '/nix/store/5awxqywjwjldazlzls4jslgm1l828hb3-nbd-3.18' killed after writing more than 67108864 bytes of log output" (that limit is 64 MiB), even though the log shown in the web UI is not even 6MB. Possible hydra issue? @grahamc
Apparently someone has restarted these builds and they succeeded. (Yes, I'm a bit late.)