Drone and Docker expose a number of resource settings.
We can limit memory, CPU quota, number of CPUs and a bunch of other settings, see here: https://docs.docker.com/compose/compose-file/#resources
Today, while under heavy load, the CI started timing out (it's never done that before). So we should investigate limiting each job's resources.
Currently the jobs have unlimited CPUs, memory, etc. But we limit the number of jobs to 20, where each commit uses 6 jobs.
We could do something like:
Once we get rid of partest, we will be able to run jobs for 8 commits at once. Currently it is far fewer, since each commit has 6 individual jobs vs 2 after killing partest.
From inside EPFL, you can access this page: http://tresormon.epfl.ch/munin/epfl.ch/lampsrv9.epfl.ch/cpu.html
We could also make the dotty-bot kill jobs that are no longer necessary, such as jobs that are not part of a PR anymore, or jobs on earlier commits if the contributor pushes prolifically.
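A rough sketch of how the bot could pick the jobs to kill. The `Job` shape and the `open_pr_heads` mapping are made up for illustration; they are not Drone's or dotty-bot's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A queued or running CI job (hypothetical shape, not Drone's API)."""
    job_id: int
    pr_number: int
    commit_sha: str


def stale_jobs(jobs, open_pr_heads):
    """Return the jobs that could be killed.

    `open_pr_heads` maps each open PR number to its current head commit.
    A job is stale if its PR is no longer open, or if its commit is no
    longer the head of that PR (the contributor pushed again).
    """
    return [
        job for job in jobs
        if open_pr_heads.get(job.pr_number) != job.commit_sha
    ]


jobs = [
    Job(1, pr_number=100, commit_sha="aaa"),  # superseded by "bbb"
    Job(2, pr_number=100, commit_sha="bbb"),  # current head, keep
    Job(3, pr_number=101, commit_sha="ccc"),  # PR 101 is no longer open
]
open_pr_heads = {100: "bbb"}

print([job.job_id for job in stale_jobs(jobs, open_pr_heads)])  # [1, 3]
```

The bot would still need a way to list running jobs and to cancel them; that part depends on what Drone offers and is left out here.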
ping @lampepfl/dotty-team
Instead of limiting the amount of resources given to each job, would it be better to have only 1 job running at a time, but give it all the resources to make sure it finishes fast?
This will only work if we additionally limit how long a single test can run, as otherwise a single dead-locked test will make the CI wait for it to complete.
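For illustration, a minimal sketch of such a per-test time limit, assuming each test can be started as its own process. The commands below are stand-ins, not our actual test invocations.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd`; return its exit code, or None if it exceeded the limit.

    subprocess.run kills the child process when the timeout expires,
    so a dead-locked test cannot hold up the CI indefinitely.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return None


# Stand-ins for real tests: one finishes quickly, one hangs.
quick = [sys.executable, "-c", "pass"]
hung = [sys.executable, "-c", "import time; time.sleep(60)"]

print(run_with_timeout(quick, timeout_seconds=30))  # 0
print(run_with_timeout(hung, timeout_seconds=1))    # None
```

Note that this only kills the direct child process. A test that forks further processes (a separate JVM, say) would need extra handling.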
This might also be a good idea, especially when our tests only consist of:
which will be the reality after partest is removed. Then the entire test suite is able to take advantage of the full capacity of the server, if these are run sequentially.
Otherwise I believe that the two tests will compete for resources.
Another consideration is that this will probably only be superior once we have the tests that run in a separate VM.
Drone also dies out if there is no output within 5 minutes -- for that case, we can either add some more noise to the output or adjust the duration (if possible).
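One way to "add noise" would be a heartbeat that prints a line at a fixed interval while a long step runs. This is only a sketch: the `heartbeat` helper is hypothetical, and the interval here is tiny so the example finishes quickly. In practice it would be set well below the 5 minute limit.

```python
import threading
import time
from contextlib import contextmanager


@contextmanager
def heartbeat(interval_seconds, emit=print):
    """Call `emit` every `interval_seconds` until the block exits."""
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout, True once `stop` is set.
        while not stop.wait(interval_seconds):
            emit("still running...")

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()


lines = []
with heartbeat(0.05, emit=lines.append):
    time.sleep(0.3)  # stand-in for a long, silent compilation

print(f"emitted {len(lines)} heartbeat lines")
```

The downside is that a heartbeat also hides a genuinely hung step from Drone, so it would make sense only together with some time limit on the step itself.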
@liufengyun - I've upped this to 10 minutes and changed the new parallel test-suite to output for each compilation. So, that should never happen™
@felixmulder We've been way underusing the CI in the past two weeks: http://tresormon.epfl.ch/munin/epfl.ch/lampsrv9.epfl.ch/cpu.html. We should be able to almost double the number of jobs running in parallel.
@smarter - I'm not sure this week has been representative of our usual activity. I'd up from 4 to X, and then see if we're hitting ~80%. The sliding window for X could be made to run over 3 jobs instead of 2 like now.
WDYT?
Sure.
This issue looks outdated. Should we close it? @allanrenucci WDYT
We now limit the number of jobs to 3 (each build uses 1 job). Building a single PR takes about 15min. With 3 jobs running at the same time, it takes about 30min. It might still be worth experimenting with resource sharing to see if we can run more jobs in parallel and keep the runtime low.
However, I am not sure we can configure job resources: https://discourse.drone.io/t/configure-cpu-and-memory-for-agents/1222
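To put the 15min and 30min figures above in perspective, here is a back-of-the-envelope throughput comparison. It assumes there is always a build waiting, which is probably not true most of the time.

```python
def builds_per_hour(parallel_jobs, minutes_per_build):
    """Builds completed per hour when `parallel_jobs` builds run side by side."""
    return parallel_jobs * 60 / minutes_per_build


one_at_a_time = builds_per_hour(1, 15)   # a single build takes about 15min
three_at_once = builds_per_hour(3, 30)   # 3 concurrent builds take about 30min each

print(one_at_a_time, three_at_once)  # 4.0 6.0
```

So running 3 jobs gives roughly 1.5x the throughput under sustained load, at the cost of doubling the time a contributor waits for a single build.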
Do we really need this as an issue? Personally I would close it. It looks like a never-ending task that we will always need to perform with new changes in the CI tests.
Sure, we can close