In the last 100 CI runs we only had 4 successful ones (https://nodejs-ci-health.mmarchini.me/#/job-summary). This is the worst ratio I can remember.
This is probably due to a combination of build issues and flaky tests. I would like to encourage everyone to look into some of these so we can get the CI results back to a level that we can rely on.
Most issues are on Windows (node-test-commit-windows-fanned), followed by ARM (node-test-commit-arm-fanned) and containerized builds (node-test-commit-linux-containered).
// cc @nodejs/collaborators
There seem to be a number of jobs where the subtasks are all green but the main task is red. Hopefully that was just some network problem with the Jenkins server that has since resolved, or something like that, but if it recurs, we'll definitely want to enlist @nodejs/build folks to take a look.
https://ci.nodejs.org/job/node-daily-master/ is also useful as evidence of build stability: it's had one green in the last 14 days. If master isn't passing tests in CI, that'll show up in PRs.
And fwiw, @AshCripps has been hacking a bit on @joyeecheung's ncu-ci utility, to add support for reporting on master. There are always going to be some failures on PRs, so sorting out related-to-PR failures and unrelated might require human judgement, but master build failures are instability in the tests or CI infrastructure pretty much by definition.
https://github.com/openjs-foundation/summit/issues/213#issuecomment-562795510: perhaps this is worth a community corner discussion? Though I see there have been a bunch of test fixes in the last few days.
Just opened https://github.com/nodejs/node/pull/30848 to mark a few tests as flaky, so we should see less red and more yellow. Getting to see more green is the next step, but it's not as simple, so it will take longer.
Just a heads up: a few flakes were looked at recently and we are back to 12 passing out of 100. That is significantly better. I'll close this as soon as we reach 20 out of 100. We should probably try to reach at least 25 passing though.
Thanks a lot to everyone who looked into fixing some of these flaky tests by the way!
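For anyone who wants to track the "N passing out of the last 100" number mentioned above, here is a minimal sketch. It assumes you already have the most recent build results as a list of Jenkins-style result strings (`SUCCESS`, `UNSTABLE`, `FAILURE`, `ABORTED`), newest first. The function names `summarize_runs` and `meets_target` are made up for illustration and are not part of ncu-ci or the CI health dashboard.

```python
from collections import Counter


def summarize_runs(results, window=100):
    """Summarize the most recent `window` results (list is newest first)."""
    recent = results[:window]
    counts = Counter(recent)
    passing = counts.get("SUCCESS", 0)  # only fully green runs count as passing
    total = len(recent)
    return {
        "total": total,
        "passing": passing,
        "pass_rate": passing / total if total else 0.0,
        "by_result": dict(counts),
    }


def meets_target(results, target=20, window=100):
    """True once at least `target` of the last `window` runs are green."""
    return summarize_runs(results, window)["passing"] >= target


if __name__ == "__main__":
    # Hypothetical data: 12 green runs out of 100, as in the comment above.
    sample = ["SUCCESS"] * 12 + ["UNSTABLE"] * 30 + ["FAILURE"] * 58
    summary = summarize_runs(sample)
    print(summary["passing"], "of", summary["total"], "runs passed")
    print("reached 20/100:", meets_target(sample, target=20))
```

Yellow (`UNSTABLE`) runs are deliberately not counted as passing here, since the goal in this thread is more green, not just less red.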
The CI is back to a "normal" level of flakes.