We should keep an eye on coverage fluctuations in order to see if we want to add custom coverage targets, as well as if we want to not fail PRs or add notifications when code coverage drops below a certain amount. This would help us track coverage and flag less stable tests. It seems that with Coveralls, we posted coverage changes to Slack.
Currently, code coverage is only enabled for core GL, but this is something to keep in mind as enable code coverage for other platforms (tracked in #7609 for iOS). We can configure different coverage targets for different flags.
We have noticed that code coverage is occasionally dropping by 0.01% because a line in line_bucket.cpp is not being hit. This has popped up in several PRs, including https://github.com/mapbox/mapbox-gl-native/pull/13357, https://github.com/mapbox/mapbox-gl-native/pull/13352, abd https://github.com/mapbox/mapbox-gl-native/pull/13337.
@springmeyer is going to adjust the threshold for CodeCov failure until we can pinpoint what is causing this line to be skipped. Investigating on the ds-jk-line_bucket branch.
cc @kkaefer @friedbunny @zugaldia
We have noticed that code coverage is occasionally dropping by 0.01% because a line in line_bucket.cpp is not being hit. This has popped up in several PRs, including #13357, #13352, abd #13337.
Looks like there are many different places in the codebase that are flapping. So, fundamentally the unit tests are non-determinative, which is very concerning. It may be easily explained by how threading works or some other feature of GL. But worst case this means there is a possibility that our unit tests are not actually catching the bugs they intend to prevent from regressing and that there is a potential for latent bugs to be lurking, despite tests.
Based on voice with @jmkiley the next action we see on coverage tracking is to document all of the lines of code that we've seen flap in the coverage results in the last few days on this ticket. This will be good data to start getting a broad sense of scope of the non-deterministic test problem, which should be handed and prioritized separately from this coverage ticket.
@springmeyer is going to adjust the threshold for CodeCov failure until we can pinpoint what is causing this line to be skipped. Investigating on the ds-jk-line_bucket branch.
Yes, as a stopgap until the non-determinism is understood and solved I've put up #13364 in the hope that this will avoid confusing failures. But this is only a bandaid and not a real fix.
Tests that seem to have flapped in the last few days. The links below point to the lines CodeCov shows as changing.
@osana also experienced the weird flapping on https://github.com/mapbox/mapbox-gl-native/pull/13433
Seems that codecov not being reachable can also fail otherwise successful CI builds:
$ #!/bin/bash -eo pipefail
curl -sSfL -o codecov https://codecov.io/bash
chmod +x codecov
./codecov -c
curl: (22) The requested URL returned error: 503 Service Temporarily Unavailable
Exited with code 22
That makes sense given they just experienced a major outage: https://status.codecov.io/incidents/623n3mpxf6t1. But looks like it is resolved.
Once #13364 lands, the threshold for gl-native will be at 1%. We should keep an eye out for any PRs where code coverage unexpectedly changes by a greater amount, and reevaluate the threshold in 2019 Q2.
Most helpful comment
Looks like there are many different places in the codebase that are flapping. So, fundamentally the unit tests are non-determinative, which is very concerning. It may be easily explained by how threading works or some other feature of GL. But worst case this means there is a possibility that our unit tests are not actually catching the bugs they intend to prevent from regressing and that there is a potential for latent bugs to be lurking, despite tests.
Based on voice with @jmkiley the next action we see on coverage tracking is to document all of the lines of code that we've seen flap in the coverage results in the last few days on this ticket. This will be good data to start getting a broad sense of scope of the non-deterministic test problem, which should be handed and prioritized separately from this coverage ticket.
Yes, as a stopgap until the non-determinism is understood and solved I've put up #13364 in the hope that this will avoid confusing failures. But this is only a bandaid and not a real fix.