Nixpkgs: Travis build fails more often than it doesn't

Created on 3 Mar 2015 · 34 comments · Source: NixOS/nixpkgs

I'm not sure who's responsible for it or whether it's progressing, but if it's not being actively developed/improved, can we just disable it?

Most helpful comment

Now that we have PR bots, can we disable Travis? It confuses new people.

All 34 comments

Maybe Travis should be given a more lightweight task, like running nixpkgs lint and checking for evaluation errors.

I agree, Travis CI is really not the right tool; I think this should be implemented in Hydra.

Except Hydra isn't running any tests anymore as far as I can see. I might be wrong, but I think this coincides with Travis not checking successfully anymore.

In fact, looking at most of the latest travis checks this error seems to pop up rather often: GC Warning: Repeated allocation of very large block

The number of false positives generated by those Travis CI jobs is so high that a "failure" outcome is next to meaningless, really. I would like to see those builds disabled, because their presence confuses contributors, who spend their time worrying about bugs in their submission because those builds fail, when in fact it's just the build that's broken, not their PR.

I benefit from Travis; I merged many PRs just by looking at the Travis logs. I just tell contributors to ignore the build, and that's it. It's not as big a problem for me as you describe.

Ping @madjar.

As I suggested in my original question, I'm not necessarily opposed to leaving it in place if it's actively developed and there's a plan to improve it soon. But currently it generates so many false failures that I pretty much just ignore it, which means all it's really doing is adding noise. Sure, a pass might be a good sign, but if passes are rare and the system otherwise confuses new contributors, that doesn't seem good.

On Mar 4, 2015, at 08:22, lethalman wrote:

I benefit from travis, I merged many PR just by looking at the outcome of travis.


Hello there

The Travis job checks that the tarball still builds (nix-build pkgs/top-level/release.nix -A tarball), but this has been broken for a month. Also, on my local machine, this requires 2 GB of RAM, so it fails on Travis, which is much more limited.

As a quick fix, I'll disable that check, but I think those two problems should also be fixed.

Ah, if it's a legitimate failure then whoever broke it should probably fix
it! My main issue is that I don't know when it's legitimate so I mostly
ignore it, and I expect that whoever broke it in the first place probably
did the same.

On Wednesday, March 4, 2015, Georges Dubus wrote:

Hello there

The Travis job checks that the tarball still builds (nix-build pkgs/top-level/release.nix -A tarball), but this has been broken for a month. Also, on my local machine, this requires 2 GB of RAM, so it fails on Travis, which is much more limited.

As a quick fix, I'll disable that check, but I think those two problems should also be fixed.

https://github.com/NixOS/nixpkgs/issues/6652#issuecomment-77164743.

Well, here are the two results I get from Travis, both equally unhelpful false positives from the same commit: https://github.com/NixOS/nixpkgs/pull/10755

One is failing if I base my branch on nixos-unstable and PR against master: https://travis-ci.org/NixOS/nixpkgs/builds/88525959
The other is based on upstream/master and simply generates a log that's over 4 MB, so it also fails (mostly because Qt build logs are huge): https://travis-ci.org/NixOS/nixpkgs/builds/88522075

I really didn't expect contributing to have become more difficult since the last time I worked with nixpkgs, maybe a year ago, when tests were still based on Hydra.

travis-ci is just a helper, not something we are strict about.

Lately it even seems to cause more harm than good. New people don't (immediately) realize what's going on, and they spend time analyzing errors that have nothing to do with their PR. I've learned not to look at the errors at all, as lately I've seen no added value in reading them.

EDIT: Does anyone (still) find it useful? (Perhaps I'm just mistaken.)

IMHO, a test hook that generates about 10% false positives is useless. I'd be perfectly happy if we just disabled the Travis CI builds for Nixpkgs.

Every CI tool we use will have false positives until we provide a heuristic for which changes we should build and which we should skip because they're too "big".

There are still improvements that can be made to travis.

  • use nixos-channels as a base instead of nixpkgs repository
  • estimate build times using hydra information
  • make curl quiet
  • upload logs to S3 instead of using stdout (see http://docs.travis-ci.com/user/uploading-artifacts/)
  • stop all services we don't need to get more memory
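
The last item is easy to sketch. A minimal, hypothetical version (the service names are illustrative; which daemons actually run depends on the Travis image):

```shell
# Hypothetical sketch of "stop all services we don't need": shut down
# memory-hungry daemons on the Travis worker before the build starts.
# Service names are illustrative and may not all exist on the image.
for svc in mysql postgresql memcached; do
  sudo service "$svc" stop 2>/dev/null || true   # ignore missing services
done
free -m   # report how much memory is now available
```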

Regarding log size, we can hardly do anything; otherwise we'd lose all the debugging output.

use nixos-channels as a base instead of nixpkgs repository

Maybe I don't understand you – do you mean it should (somehow) automatically rebase the PRs? (Currently it depends on the submitter what (s)he bases the PR on, I believe.)

Yeah, it already rebases on top of nixpkgs master.

I think we should collect statistics on when travis-ci fails and under what conditions, and then decide what can be solved.

Personally, I'd prefer to address this particular problem by holding to the agreement that master should not get any mass-rebuild changes (before binaries are available). Note: currently, that seems unfeasible for darwin due to Hydra being severely underpowered on that platform, but those mass rebuilds are often independent.

Agreed, and that is something travis could tell us.

By looking at the first two pages of PRs (50 PRs), I've noticed the following issues with travis-ci:

1) 9 builds ran out of memory (example https://travis-ci.org/NixOS/nixpkgs/builds/88628057)
2) 2 builds stopped because of no output (example https://travis-ci.org/NixOS/nixpkgs/builds/88626974)
3) 4 builds failed on merge conflict (example https://travis-ci.org/NixOS/nixpkgs/builds/88626701)
4) 3 builds failed due to output longer than 4 MB (example https://travis-ci.org/NixOS/nixpkgs/builds/88621524)
5) 6 builds successfully caught an error
6) 1 build failed because PR had no title (example https://travis-ci.org/NixOS/nixpkgs/builds/87937406)

The rest of the builds passed. Now, I'd say we can address all of these:

1) Fixed in 11b123662717ad20c3bb3100503786b35adfb158
2) This one is the hardest to solve reliably. More memory will help, but we should also limit Travis when changes are too big. I'd consider this one a positive, meaning the PR is a mass rebuild. See https://github.com/madjar/nox/issues/28
3) I haven't dug into the merge conflicts, but the merge buttons seem to be green. Needs investigation.
4) Possible solutions:

5) YAY
6) Should be fixable in nox https://github.com/madjar/nox/issues/29

1) nice :-)
4) that's not the cause of failure, at least not in the example that you link – it failed due to OOM, and Travis doesn't display the full log because it's too long (you can view it raw: https://s3.amazonaws.com/archive.travis-ci.org/jobs/88621525/log.txt).

I wonder if they provide an option to keep only the tail of the log. 4 MB is way too much to keep for these purposes anyway.

Actually, why not simply pipe the output through tail -c 100k or similar? Would anyone read a longer part of the log?
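
A minimal sketch of that suggestion, with a stand-in command in place of the real build (assumes GNU tail, where the `k` suffix means 1024 bytes):

```shell
# Pipe the build output through tail so only the last 100 KB reach the
# Travis log, staying well under the 4 MB hard limit. "noisy_build" is
# a stand-in for whatever nox-review/nix-build invocation actually runs.
noisy_build() {
  seq 1 200000   # stand-in: roughly 1.2 MB of output
}
noisy_build 2>&1 | tail -c 100k
```

One caveat: tail prints nothing until its input ends, so a long quiet build could still hit Travis's no-output timeout; a real version would need some periodic keep-alive output.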

7) And for completeness there is also https://github.com/madjar/nox/issues/22, but we just need a better way to detect changes even if recurseIntoAttrs is not used.

I did another round of testing. Here are the results for latest 50 PRs:

There are 14 (28%) false positives we can fix:

  • 2 fail due to merge conflicts (still needs investigation why they happen)
  • 2 fail due to EOF using curl to download binaries (do we need a feature in Nix to retry?)
  • 1 fail due to hydra saying 410 Gone (do we need to fix Nix to support that?)
  • 9 fail due to max log size reached (see #13006)
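
For the curl EOF failures above, a generic retry wrapper is one possible stopgap (a proper fix would live in Nix itself, or use curl's own `--retry` flag); `retry` here is a hypothetical helper, not something in the repo:

```shell
# Hypothetical retry wrapper for flaky binary-cache downloads: re-run a
# command up to $1 times, sleeping briefly between attempts, and fail
# only if every attempt fails.
retry() {
  max=$1; shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1          # give up after $max attempts
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}

# e.g. retry a binary-cache fetch (URL illustrative):
# retry 5 curl --fail -O https://cache.nixos.org/nar/example.nar.xz
```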

There are 2 (4%) false positives that will be hard to fix:

  • 2 fail due to job timeout (not much we can do here)

There are 5 (10%) true positives:

  • 5 actual errors in code

This is getting worse and worse. I wonder if we should really just disable Travis and work on having Hydra test PRs.

For the PRs I've been looking at, Travis appears to be working quite well now. The problem is that currently on master, and I think also on 16.09, the tarball cannot be built (https://github.com/NixOS/nixpkgs/issues/18209#issuecomment-244576612). A problem I did find is that Travis often reports a failure of the first test job for no apparent reason.

On https://github.com/NixOS/nixpkgs/pull/18376, we resolved not to disable Travis yet. Seems like it's not pressing/can wait until we can use Hydra. Can we close this for now?

I'm not convinced someone really is working actively on that Hydra support, but I'm hopeful.

Okay, some notes on Travis issues from (https://travis-ci.org/NixOS/nixpkgs/pull_requests):

  • Nox-review shouldn't bite off more than it can chew (https://travis-ci.org/NixOS/nixpkgs/jobs/183271042)

    • perhaps just don't build more than ~20 dependencies? (could cause false positives)

  • Changes to the stdenv are probably always going to cause headaches
  • When the Hydra binary cache gets behind, Travis tries to build the missing packages itself

    • maybe rebase on last fully built evaluation? (could break some PRs that depend on new stuff)

  • Travis sets a hard limit of 4 MB on log size that we frequently go over.

    • see NixOS/nix#443 for ideas

    • nox-review could try to build with "-Q"

Travis sets a hard limit of 4 MB on log size that we frequently go over.

Filtering the output through tail might be a nice way.

Update on this: Travis has been failing a lot lately on macOS because of travis-ci/travis-ci#7628. I've opened travis-ci/travis-build#1104 to hopefully fix it.

Now that we have PR bots, can we disable Travis? It confuses new people.

