Nixpkgs: Run check phase by default

Created on 8 Jan 2018  Â·  17Comments  Â·  Source: NixOS/nixpkgs

Not sure what other distributions do, but skipping check phases by default seems to me to be slightly unseemly to me. I rather be assuming upstream tests are valid/useful than not.

Also, this makes a better solution for #30437. We can make the default doCheck ? hostPlatform == buildPlatform so that one someone rights doCheck = true; they indeed test in all cases as requested, but when they leave out doCheck altogether, only native builds get their tests run.

CC @grahamc @vcunat @globin @fpletz @dezgeg

Most helpful comment

My implementation breaks nothing, it just makes broken things explicit. If you look at #39463, then, firstly, it's enormous, and, secondly, there are at least a couple cases where there's clearly a bug in nixpkgs expression that the testing found (but I wasn't able to fix it, things that I did fix will go into a separate PR).

I wanted to implement this for a while, then I saw this issue and @edolstra's comment several months ago and though to myself "well, maybe @edolstra's right, but what if its not 'massive', surely, nobody actually tried" and decided to bite the bullet and actually run the experiment. And running the experiment turned out to be useful. Firstly, I fixed a bunch of bugs, and secondly the blowup turned out not to be really "massive", it's less that 2x for my system, which hurts, I agree, but not "massively" :)

My implementation just adds an option config.doCheckByDefault which I would now keep set when I'm not doing experiments that require long sequences of mass rebuilds.

All 17 comments

👎 on this. Checks are typically very unreliable and non-deterministic. So expect massive breakage and perpetually broken channels if we enable checks by default. Also, of course, build times will increase massively.

@Ericson2314 check out #39463.

In theory, this sounds good but in practice I agree that this will break a lot of things. Especially because most devs intend "make check" to only be run by them in their own environments. Maybe it would be nice to experiment with as an overlay/jobset or something.

If you think about it, each dependency listed in Nixpkgs is weird kind of integration test in its own right (if its actually used of course). I would be interested in seeing whether big distros like Debian or Arch do testing in an automatic way at all. I would bet our build farm runs more tests right now than most (of course a large userbase is the ultimate integration test).

My implementation breaks nothing, it just makes broken things explicit. If you look at #39463, then, firstly, it's enormous, and, secondly, there are at least a couple cases where there's clearly a bug in nixpkgs expression that the testing found (but I wasn't able to fix it, things that I did fix will go into a separate PR).

I wanted to implement this for a while, then I saw this issue and @edolstra's comment several months ago and though to myself "well, maybe @edolstra's right, but what if its not 'massive', surely, nobody actually tried" and decided to bite the bullet and actually run the experiment. And running the experiment turned out to be useful. Firstly, I fixed a bunch of bugs, and secondly the blowup turned out not to be really "massive", it's less that 2x for my system, which hurts, I agree, but not "massively" :)

My implementation just adds an option config.doCheckByDefault which I would now keep set when I'm not doing experiments that require long sequences of mass rebuilds.

I like this reasoning… I'm +1 on this…

On Wed, Apr 25, 2018, 6:47 AM Jan Malakhovski notifications@github.com
wrote:

My implementation breaks nothing, it just makes broken things explicit. If
you look at #39463 https://github.com/NixOS/nixpkgs/pull/39463, then,
firstly, it's enormous, and, secondly, there are at least a couple cases
where there's clearly a bug in nixpkgs expression that the testing found
(but I wasn't able to fix it, things that I did fix will go into a separate
PR).

I wanted to implement this for a while, then I saw this issue and
@edolstra https://github.com/edolstra's comment several months ago and
though to myself "well, maybe @edolstra https://github.com/edolstra's
right, but what if its not 'massive', surely, nobody actually tried" and
decided to bite the bullet and actually run the experiment. And running the
experiment turned out to be useful. Firstly, I fixed a bunch of bugs, and
secondly the blowup turned out not to be really "massive", it's less that
2x for my system, which hurts, I agree, but not "massively" :)

My implementation just adds an option config.doCheckByDefault which I
would now keep set when I'm not doing experiments that require long
sequences of mass rebuilds.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/NixOS/nixpkgs/issues/33599#issuecomment-384159541,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AADWlsRxAdpu02YRr2jKBOMSO--QNpOoks5tr__fgaJpZM4RV8JQ
.

This is great! Do we have info on how many packages have failing tests? Could we discuss enabling this by default?

Unfortunately something we'll run into that I haven't seen addressed in this issue is flakey tests. Sometimes they work, sometimes they don't. Certainly mitigated by the lack of networking but will probably happen anyway.

Well, I have several more hundred patches that disable and fix those..., and I have not successfully built all of release.nix with this enabled yet, but you can grep doCheck = false to get an approximation on the current master (I build most of the basic stuff and most of non-qt desktop packages regularly with doCheckByDefault enabled).

My motivation for doing this was not research into statistics (which could be an interesting paper, I agree) but the fact that SLNOS overrides quite a lot of things (e.g. libcardiacarrest instead of libpulse) and debugging test failures helped to debug those (e.g. libcardiacarrest became absolutely perfect, IMO).

Actually, in my experience, the fact that tests "blink" between package versions is more of a problem. I added some more infra to my nixpkgs to check that derivations with doCheck = false actually fail tests and got somewhat depressed with the result. What failed before stops failing (which is good), what didn't fail before starts failing (which is depressing), and these changes are not monotonic between version updates.

With infra currently in master the result of enabling doCheckByDefault would be that the number of doCheck = false lines will grow without bounds until most packages will get them. That would be unfortunate.

In general, IMO, the main problem is that doCheck is just one bit of information. In my experience, that just doesn't cut it.

  • I now think that doCheckCross should be a separate attribute (false by default) as I have yet to see a package where cross check succeed, and comparing build and host systems everywhere (e.g. when you need to make doCheck conditional on something, e.g. for bootstrap) is really ugly.

  • Also, IMO doCheck = false should be split into doCheck = false (meaning "you can try, but running them will fail") and doCheck = "never" (or whatever, meaning "don't ever try to run those", e.g. see the expression for numactl, or packages that can't properly install after a failed make check (yes, the reality is just absolutely horrible.)).

@oxij You're saying the tests for a package will fail for a single version, then pass again on the next version? Do you happen to have an example?

Yes, and then fail again for the next version, and so on. An example from the top of my head is GHC. The problem with GHC is that different versions seem to be packed by different people or something, and while some packagers include tests in the tarball, others don't, but don't stub out them in the Makefile either.

I might be missing something, but I added config.doCheckByDefault = true; to my /etc/nixos/configuration.nix and nix-build -A haskell.compiler.ghc863 doesn't rebuild and test but pulls a binary from the cache. If I add doCheck = true; to the derivation then it does a build and test. Is this expected?

For nix-build you need to add doCheckByDefault into ~/.config/nixpkgs/config.nix (or similar, see pkgs/top-level/impure.nix).

Based on https://github.com/NixOS/nixpkgs/issues/55425 and https://github.com/NixOS/nix/issues/2691, can we also get a way to selectively disable the checkPhase without changing the derivation/output hash? I have use cases where I sometimes want to run tests, and someones I don't want to run tests via nix-build (and this is sometimes due to whether I'm running on a machine with GPU or not).

This would only be possible with something like @Profpatsch's withTests (not yet merged), and effectively only for install-time tests.
Regular (between build and install) tests can change the build result, and I imagine it would be hard to sandbox them without breaking a lot of tests.

Are there any examples of make check actually changing the build output? It's definitely possible but I would suspect very rare. The check phase happens before the install so I find it very unlikely that it would mess with the install. The installCheckPhase is a different story though. We could try a few tricks like making $out read only for the duration of checkPhase and installCheckPhase.

We could try a few tricks like making $out read only for the duration of checkPhase and installCheckPhase.

That will work only for installCheckPhase, checkPhase runs before installPhase so it could influence it anyway.

We can mount unionfs or whatever on top of /build for the duration of checkPhase, though. But, ideally, this should be a Nix feature, i.e. checkpoint and rollback while doing the build.

like @Profpatsch's withTests (not yet merged),

I hear you

(triage) Closing this issue, as the amount of controversy around it make it sound like it would need an RFC to get people to agree on what should be done anyway.

Was this page helpful?
0 / 5 - 0 ratings