See e.g.
These are commits from PRs that passed all tests. The failure is always only on the Python 3.4 build, in different eval-test-* subtests. The "Actual" test output is always empty, which makes me wonder if the processes just die for some Travis-CI-specific reason. I see nothing at https://www.traviscistatus.com/.
I also saw a 3.6 failure: https://travis-ci.org/python/mypy/builds/242766320?utm_source=github_status&utm_medium=notification
Previously I had similar issues apparently caused by Travis CI killing processes when we had too many of them running in parallel. Restricting the maximum level of parallelism in Travis CI could help.
I also noticed that the parallelization is usually at 32 workers, but occasionally switches to 2 workers without any obvious reason (for just one or two builds). Perhaps the Travis VM reports a different number of cores to our test runner?
I believe we don't have access to sudo; otherwise we could run:
`sudo free -m -t`
`sudo dmesg`
We could try restricting the maximum level of parallelism in Travis CI to, say, 16.
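For concreteness, the kind of clamp I have in mind looks roughly like this (just a sketch; the function and constant names are made up, not our actual runner code):

```python
import multiprocessing

MAX_TRAVIS_WORKERS = 16  # hypothetical cap; Travis containers only have 2 real cores

def pick_worker_count(requested=None):
    # Default to whatever the VM reports, but never go above the cap,
    # so we stop spawning 32 processes on a 2-core container.
    detected = multiprocessing.cpu_count()
    return min(requested if requested is not None else detected, MAX_TRAVIS_WORKERS)
```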
According to the Travis docs, we get 2 CPU cores per container (see here). I'm not sure we should go over that, at least not by too much.
EDIT: I just tested, and using only 2 workers leads to about a 40% increase in time spent per run. I'm pretty sure we don't want that.
@ethanhs Yeah I noticed the same. Even a reduction from 32 to 16 resulted in a slight increase in runtime. Why would that happen given that we only have 2 cores?
I guess our tests have a decent amount of I/O wait (presumably disk?), and we unintentionally use our (very expensive, process-based) workers to deal with blocking I/O.
Obviously, the ideal solution would be to just use 2 processes instead of 32, but within them create either threads or (better) an asyncio loop to deal with I/O wait. But that would require:
So that ideal solution is no good.
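For what it's worth, the shape I mean is roughly the following (purely a sketch with made-up names, not something I'm proposing we actually implement):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_test_case(case_id):
    # Hypothetical stand-in for one data-driven test case; in practice
    # much of its wall-clock time is spent blocked on disk I/O.
    return "%s: ok" % case_id

def run_batch(case_ids):
    # Inside each worker process, overlap the I/O waits with cheap threads
    # instead of paying for dozens of extra processes.
    with ThreadPoolExecutor(max_workers=8) as threads:
        return list(threads.map(run_test_case, case_ids))

def run_all(case_ids, num_procs=2):
    # Only as many processes as we have cores; the threads absorb the I/O wait.
    chunks = [case_ids[i::num_procs] for i in range(num_procs)]
    with ProcessPoolExecutor(max_workers=num_procs) as procs:
        results = []
        for batch in procs.map(run_batch, chunks):
            results.extend(batch)
        return results

if __name__ == "__main__":
    print(run_all(["eval-test-%d" % i for i in range(16)]))
```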
Practically, I think we can just keep the number of workers low enough that the memory problems don't happen, and high enough that blocking on I/O is not a big performance hit.
Also, I just ran nproc, the coreutils tool that reports the number of processing units available (e.g. for my 4-core/8-thread i7 it says 8). Travis reports 32.
I think investigating whether your suspicion is correct would be very useful. One interesting thing I've noticed is that all of the failures are on the longest-running container. If you think it is switching to 2 workers randomly, perhaps we can use nproc to help debug and see whether that changes on failing builds?
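For example, a throwaway script along these lines (hypothetical, not in the repo) could be run at the start of every build so we can compare the numbers between passing and failing builds:

```python
# Hypothetical debug helper: log what the Travis container reports, so we can
# see whether the test runner's worker count tracks the reported core count.
import multiprocessing
import os
import subprocess

print("os.cpu_count():", os.cpu_count())
print("multiprocessing.cpu_count():", multiprocessing.cpu_count())
# nproc honours CPU affinity masks, so it can differ from the raw counts above.
print("nproc:", subprocess.check_output(["nproc"]).decode().strip())
```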
Let's just decrease the maximum parallelism from 32 to 16 and see if that fixes the problem instead of doing anything more involved. It isn't very valuable to understand the root cause if it's specific to Travis CI and we can find a simple workaround. Slower tests are generally preferable to unreliable tests, in my opinion.
It doesn't seem like the decrease helped. Tests are still failing in typeshed CI, pretty consistently now.
I believe the issue there is that here mypy is run with the default number of concurrent processes, which on a Travis worker is 32. If we lower that to 12, it should improve things (but probably won't solve them).
Thanks for finding that! Would you mind submitting a PR to typeshed to fix it?
Are we still seeing flakes on Travis-CI? Jukka and I discussed this offline, and the best approach we can think of is to increase the test timeout from 30s to 5min. If it then goes away, we can assume that was the issue. The timeouts rarely if ever caught anything real (most tests don't even run with a timeout).
> Are we still seeing flakes on Travis-CI?

Yes, this was hit in #4041 I believe. I'll make a PR to up the timeout to 5 minutes.
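Roughly, the change would amount to bumping a constant along these lines (illustrative only; the constant and wrapper in the actual test harness may look different):

```python
import subprocess

# Raised from 30 to 300 seconds, since the 30s timeout mostly produced flakes.
TEST_TIMEOUT_SECONDS = 300

def run_test_command(cmd):
    # Hypothetical wrapper around a single test invocation.
    return subprocess.run(cmd, timeout=TEST_TIMEOUT_SECONDS,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```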