See e.g.
These are commits from PRs that passed all tests. The failure is always only on the Python 3.4 build, in different eval-test-* subtests. The "Actual" test output is always empty, which makes me wonder if the processes just die for some Travis-CI-specific reason. I see nothing at https://www.traviscistatus.com/.
I also saw a 3.6 failure: https://travis-ci.org/python/mypy/builds/242766320?utm_source=github_status&utm_medium=notification
Previously I had similar issues apparently caused by Travis CI killing processes when we had too many of them running in parallel. Restricting the maximum level of parallelism in Travis CI could help.
I also noticed that the parallelization is usually at 32 workers, but occasionally switches to 2 workers without any obvious reason (for just one or two builds). Perhaps the Travis VM reports a different number of cores to our test runner?
I believe we don't have access to sudo; otherwise we could run:
`sudo free -m -t`
`sudo dmesg`
We could try restricting the maximum level of parallelism in Travis CI to, say, 16.
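For concreteness, the kind of clamp I have in mind looks roughly like this (just a sketch; the function and constant names are made up, not our actual runner code):

```python
import multiprocessing

MAX_TRAVIS_WORKERS = 16  # hypothetical cap; Travis containers only have 2 real cores

def pick_worker_count(requested=None):
    # Default to whatever the VM reports, but never go above the cap,
    # so we stop spawning 32 processes on a 2-core container.
    detected = multiprocessing.cpu_count()
    return min(requested if requested is not None else detected, MAX_TRAVIS_WORKERS)
```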
According to the Travis docs, we get 2 CPU cores per container (see here). I'm not sure we should go over that, at least not by too much.
EDIT: I just tested, and using only 2 workers leads to about a 40% increase in time spent per run. I'm pretty sure we don't want that.
@ethanhs Yeah I noticed the same. Even a reduction from 32 to 16 resulted in a slight increase in runtime. Why would that happen given that we only have 2 cores?
I guess our tests have a decent amount of I/O wait (presumably disk?), and we unintentionally use our (very expensive, process-based) workers to deal with blocking I/O.
Obviously, the ideal solution would be to just use 2 processes instead of 32, but within them create either threads or (better) an asyncio loop to deal with I/O wait. But that would require:
So that ideal solution is no good.
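For what it's worth, the shape I mean is roughly the following (purely a sketch with made-up names, not something I'm proposing we actually implement):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_test_case(case_id):
    # Hypothetical stand-in for one data-driven test case; in practice
    # much of its wall-clock time is spent blocked on disk I/O.
    return "%s: ok" % case_id

def run_batch(case_ids):
    # Inside each worker process, overlap the I/O waits with cheap threads
    # instead of paying for dozens of extra processes.
    with ThreadPoolExecutor(max_workers=8) as threads:
        return list(threads.map(run_test_case, case_ids))

def run_all(case_ids, num_procs=2):
    # Only as many processes as we have cores; the threads absorb the I/O wait.
    chunks = [case_ids[i::num_procs] for i in range(num_procs)]
    with ProcessPoolExecutor(max_workers=num_procs) as procs:
        results = []
        for batch in procs.map(run_batch, chunks):
            results.extend(batch)
        return results

if __name__ == "__main__":
    print(run_all(["eval-test-%d" % i for i in range(16)]))
```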
Practically, I think we can just keep the number of workers low enough that the memory problems don't happen, and high enough that blocking on I/O is not a big performance hit.
Also, I just ran nproc, the coreutils tool that reports the number of processing units available (e.g. for my 4-core/8-thread i7 it says 8). Travis reports 32.
I think investigating whether your suspicion is correct would be very useful. One interesting thing I've noticed is that all of the failures are on the longest-running container. If you think it is switching to 2 workers randomly, perhaps we can use nproc to help debug and see whether that changes on failing builds?
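For example, a throwaway script along these lines (hypothetical, not in the repo) could be run at the start of every build so we can compare the numbers between passing and failing builds:

```python
# Hypothetical debug helper: log what the Travis container reports, so we can
# see whether the test runner's worker count tracks the reported core count.
import multiprocessing
import os
import subprocess

print("os.cpu_count():", os.cpu_count())
print("multiprocessing.cpu_count():", multiprocessing.cpu_count())
# nproc honours CPU affinity masks, so it can differ from the raw counts above.
print("nproc:", subprocess.check_output(["nproc"]).decode().strip())
```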
Let's just decrease the maximum parallelism from 32 to 16 and see if that fixes the problem instead of doing anything more involved. It isn't very valuable to understand the root cause if it's specific to Travis CI and we can find a simple workaround. Slower tests are generally preferable to unreliable tests, in my opinion.
It doesn't seem like the decrease helped. Tests are still failing in typeshed CI, pretty consistently now.
I believe the issue there is that here mypy is run with the default number of concurrent processes, which on a Travis worker is 32. If we lower that to 12, it should improve things (but probably won't solve them).
Thanks for finding that! Would you mind submitting a PR to typeshed to fix it?
Are we still seeing flakes on Travis-CI? Jukka and I discussed this offline, and the best approach we can think of is to increase the test timeout from 30s to 5min. If it then goes away, we can assume that was the issue. The timeouts rarely if ever caught anything real (most tests don't even run with a timeout).
> Are we still seeing flakes on Travis-CI?

Yes, this was hit in #4041 I believe. I'll make a PR to up the timeout to 5 minutes.
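Roughly, the change would amount to bumping a constant along these lines (illustrative only; the constant and wrapper in the actual test harness may look different):

```python
import subprocess

# Raised from 30 to 300 seconds, since the 30s timeout mostly produced flakes.
TEST_TIMEOUT_SECONDS = 300

def run_test_command(cmd):
    # Hypothetical wrapper around a single test invocation.
    return subprocess.run(cmd, timeout=TEST_TIMEOUT_SECONDS,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```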