Spack: Can we reduce Travis build times

Created on 12 Jan 2018  路  31Comments  路  Source: spack/spack

The tests *.8
TEST_SUITE=unit PYTHON_VERSION=2.7 COVERAGE=true
are not starting up.

travis

All 31 comments

Can you point me to a specific PR?

We recently added "Build Stages" that prevent the unit tests from running until after the flake8 tests pass. Also, macOS testing usually takes a while to run because there are a limited number of workers available. Linux tests may take a little while, but they're generally available immediately unless things are really busy.

You can look at any PR at least after #6910 (sorry I鈥檓 only on my phone)

They all look fine to me. #6910 and #6911 are both waiting for a macOS worker to be available. Travis has a ton of Linux machines but very few macOS machines, so there's usually a long line for those. It might take a few hours for one to become available. We decided to only run a single test on macOS to limit how much we use, but we still want to make sure things don't break on macOS because it's such a different system.

Oh okay thanks for the information

This is the longest backlog for macOS workers I've ever seen. Perhaps they are offline for security patches? https://blog.travis-ci.com/2018-01-08-travis-response-meltdown-spectre

@healther Just for info, you might want to check this. They had problems for a couple of days with MacOS, and recovered just yesterday. I guess they got quite behind the queue in the meanwhile.

Thanks @alalazo I started looking after @adamjstewart's post and found this: https://www.traviscistatus.com

Mh I poked a bit through the Travis logs. For the MacOSX the slowest test is:
213.29s call lib/spack/spack/test/cmd/url.py::test_url_list (total time 864s)
On Linux with py27 it takes only
118.99s call lib/spack/spack/test/cmd/url.py::test_url_list (total test time 594s)
But on Linux with py36 it is even less
30.87s call lib/spack/spack/test/cmd/url.py::test_url_list (total test time 433s)

Two questions:
1) why is it so much slower on the OSX instances? I don't have comparable hardware on Linux & OSX to test it myself
2) Would it be possible to switch the OSX test to py36? This should speed up the OSX testphase

The numbers are taken from https://travis-ci.org/spack/spack/builds/328144082?utm_source=github_status&utm_medium=notification (steps .3, .8 and .7 respectively) but they seem to hold for the other builds as well.

Should I reopen this issue or create a new one?

In my opinion there's no need for a new issue. What I have seen in the past is that tests time have an high variability on Travis, they may take e.g. 4 min. on one job and 13 min. on another. @adamjstewart What do you think?

@healther @adamjstewart I'll try to use this week #5842 on our internal release branch. In principle it should save a lot of time on package only PRs.

If I see no issues I'll merge the PR in spack/develop. Pinging @scheibelp @tgamblin and @becker33 to give them a chance to use their their veto power if they don't agree :smile:

I'm not sure that is the way to go. Yes we would reduce the test time by some 30ish minutes, but the main cost only comes from the MacOSX stuff. The Linux tests start more or less instantly and Travis is there to use the time to thoroughly test patches.

Unless I misunderstand test_url_list (which is the worst offender :tm:, with about 25% of the test time on OSX) it tests whether the url-generation of all packages works correctly. I.e. it should catch misformed url and url_for_version entries in package.py's. I profiled this function and it seems that is limited by the loop iterator, which is faster on python3.
So if we want to keep the coverage high (which I would definitely vote for) we should not deactivate these tests. The amount of issues that are opened because some package patch broke something is non-negligible right now. Although to be fair, this has more to do with incompatibilities to newer versions and/or compilers.

Irrespective of my particular contrived example: There needs to be at least a check that a checked in package.py has no syntax errors.

So if we want to keep the coverage high (which I would definitely vote for) we should not deactivate these tests.

Uh, I don't get what you mean. PRs with a branch name that starts with packages/ will skip unit tests and build tests on the PR. Commits to develop are always tested.

Maybe I'm fundamentally missing something... Please correct me if I'm wrong:

Currently any PR gets the whole test-suite (flake8, unit, build) for all python versions. Your change would reduce that to any PR that does not stem from a branch starting with packages/. The problem is, that things that are in package.py can break tests!

E.g. take

diff --git a/var/spack/repos/builtin/packages/bash-completion/package.py b/var/spack/repos/builtin/packages/bash-completion/package.py                                                                                            index 350cd4d..9d4006a 100644
--- a/var/spack/repos/builtin/packages/bash-completion/package.py
+++ b/var/spack/repos/builtin/packages/bash-completion/package.py
@@ -28,11 +28,11 @@ from spack import * 
 class BashCompletion(AutotoolsPackage): 
     """Programmable completion functions for bash."""
     homepage = "https://github.com/scop/bash-completion"
-    url = "https://github.com/scop/bash-completion/archive/2.3.tar.gz"
+    url = None

     version('2.7', 'f72c9e2e877d188c3159956a3496a450e7279b76')
     version('2.3', '67e50f5f3c804350b43f2b664c33dde811d24292') 
-    version('develop',  git='https://github.com/scop/bash-completion.git')
+    version('develop',  git='https://github.com/scop/bash-completion.git)

     # Build dependencies
     depends_on('automake', type='build')

this breaks spack test url, even if it only touches a package.py (and by that would not trigger spack test url). Yes this is a contrived example and yes we could work around this by introducing new tests that ensure that a package.py is "well formed". But I'm not sure this is worth it.

A syntax error won't pass flake8 tests, which are still run (as docs are). A thing like:

-    url = "https://github.com/scop/bash-completion/archive/2.3.tar.gz"
+    url = None

won't hopefully pass review. Besides, being practical, how many PRs like that did we have recently? I think none, but I may be wrong (in case you know of one, please point me to it).

Let's say that in the worst case we break some invariant and e.g. a package without url gets committed to develop. What will happen then? In my opinion just:

  1. Tests fail on develop
  2. Commit reverted or fix commited
  3. Problem solved

On the bright side we won't have a long queue of package only PRs that are waiting for a MacOS builder just for the sake of it.

Forgot to mention:

Yes we would reduce the test time by some 30ish minutes, but the main cost only comes from the MacOSX stuff.

The main cost comes from waiting, lately up to 24h or so, a MacOS builder. Even if you save the entire ~215sec. of the url tests according to your measurements above, what is the gain?

Finally, just for reference, here's an example PR.

won't hopefully pass review.

I hope so too, but the whole point of unit tests is to catch these kind of mistakes isn't it? I'm not sure whether flake8 will catch everything, though I'd take your word for it.

I'd like to prevent the case that

git clone https://github.com/spack/spack.git

will result in an unusable spack, which at least in the past it did and it took some time to convince these people to give spack another chance. Yes you are right in most cases it will be caught by other means but that isn't the idea behind unit test. It is explicitly not "we are smart enough to not let this happen" but explicitly "in case something stupid happens, this will at least give a basic check".

My idea was that reducing our load on the MacOS builder, the wait times wouldn't be as high. If this turns out not to be the case we could still only remove the MacOS tests. Which while not being perfect (I'm not sure where the subtle differences to Linux could end up being a problem) this would at least ensure a working state on Linux.

If this turns out not to be the case we could still only remove the MacOS tests.

This can be done, but tests will be fragmented into one more stage (the MacOS one) and stages build serially, not concurrently - we already have a ~20min. best case CI run, like that we are turning it into a ~35 min. best case. A little bit too much in my opinion. What we'll have as a counterpart is a false sense of security. Still waiting for cases where a package only PR broke the framework.

Finally, I know that I might be alone on that, but you must expect occasional breakage on develop (see #6929). It's by definition the bleeding-edge version of our repository. To have stability in production what I currently do is to branch off at some point, and carefully test that branch for our needs. New features won't be likely to get in our stable branch as they may introduce stability risks.

And just in case it was not clear: when a package only PR gets merged, everything is tested.

I was trying to reduce build time in #6370 as well, however it will only work the second time around.

~I think #6370 could be generally useful, besides QA.~ Sorry @junghans, I confused the PR with: #3761 :smile:

Okay maybe I don't get how the merging process goes. My thought was that once the tests on the PR are green, the merge itself is only a git merge-equivalent.

Are you saying that triggering a merge triggers a new set of tests? In that case :+1:

Finally, I know that I might be alone on that, but you must expect occasional breakage on develop

Sure there will always be things that there are no tests for (yet). However it is (imo understandably) hard to get people back that once had a bad experience. I generally prefer "just take the current HEAD" projects and especially for spack with its need to track all the different releases of packages you are practically forced to keep up to date. This makes it much more likely to hit such a bad HEAD...

I'm actually starting to agree with @healther on this one, simple things like newline characters in a docstring can break the doc tests, and a lot of things can go wrong in the package itself.

Is there a way we can skip macOS tests on branches but run them on develop? I don't think I've ever seen a PR that works on Linux but crashes on macOS. I'm sure one could occur, but it would be rare enough that a revert/patch would be sufficient.

Alternatively, is there a way we can make the macOS tests optional? Like, if they haven't finished yet, Travis still reports all green and we can merge, but they still have a chance to run eventually.

Alternatively alternatively, we could start collecting unit tests that test packages and only run those if it is a package-only PR.

@adamjstewart docs are always tested. So, to sum up:

  1. Package only PR
  2. Same commit merged to develop

@alalazo I'm on the fence. I think things will break more often than we expect. If Spack was more stable and we had frequent releases (allowing people to stick to a release version instead of tracking develop) this would be fine. I guess I would be fine with trying it out for a while and seeing how it goes.

Reporting a few numbers from travis status. Yesterday (15-01-2018) at 8h05 CET there were 568 jobs on the MacOS backlog, today (16-01-2018) at 7h50 they are 1661. Waiting time is currently ~24h for both our PRs and develop builds.

I have been monitoring closely the queues on Travis in the last couple of days. The delays due to waiting for MacOS builders are still more than 24h. I implemented a possible solution in our fork, and will report here in the few next days how it behaves. The modifications concerning qa are the following:

  • #5842 merged (and the effect is that a package only PR takes ~5 min. to complete the tests on Travis)
  • disabled MacOS tests on develop build, but run them once per day in a cron job (this has the effect of requiring only a single MacOS builder per day, while other commits get tested in ~35 min.)

Finally, I am trying to get commit status for PRs based only on linux and see if I can find other sensible ways of notifying a decrease in OSX coverage from the cron job. The idea is that I don't want a red cross on a PR because OSX coverage file was not uploaded. I still need to stress this last part to see if it works, but in any case it will affect only PRs that are modifying the framework.

There are still outages of Travis with more than 10m of no reply. I'm not sure whether this is the right place to continue the discussion @alalazo ?

@healther What I noticed yesterday is that, exclusively for python 2.7 unit tests, we still have intermittent failures when trying to concretize mpileaks - the build process doesn't emit output for more than 10 min. and then errors out.

@scheibelp noticed a maybe related issue some time ago, where a compiler unit test was failing only with python 2.7. The problem was there for a week or so, and now it's a while we don't see it anymore.

p.s. It's since early January that I am trying to maintain CI in a decent state, but keeping up with python ecosystem + Travis problems + our own bugs it's really difficult :smile:

I think @alalazo has made several improvements to the build process since this was opened, for example running the MacOS tests on a daily basis rather than for each PR (https://github.com/spack/spack/pull/6988, referenced in https://github.com/spack/spack/issues/6928#issuecomment-358619846). So I think this can be marked as resolved. Feel free to reopen if this comes up again.

@scheibelp please merge #6370

@scheibelp @tgamblin There's also #5842 that needs to be evaluated at some point.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

amklinv picture amklinv  路  39Comments

adamjstewart picture adamjstewart  路  48Comments

citibeth picture citibeth  路  33Comments

DavidPoliakoff picture DavidPoliakoff  路  63Comments

adamjstewart picture adamjstewart  路  37Comments