What would you like to be added:
A way to query and track how often /retest (and possibly other triggers) is called on k8s PRs.
Why is this needed:
As part of the effort to extend useful metrics around flake impact and to shed some light on the resource use caused by /retest, it would be nice to have some tracking.
/assign
/sig contributor-experience
/cc @nikhita
@MushuEE Did you already have an approach in mind for this (I see that you assigned yourself)? This looks like it could be a good candidate for devstats as well - https://github.com/cncf/devstats - and should be easier to handle there.
cc @LappleApple @mrbobbytables
Hi @nikhita, I've added this issue to my DevStats notes. Currently have a next-step in mind around devstats and am discussing it with a few folks.
@LappleApple @nikhita I was looking at devstats and reached out asking if we already have those metrics. I think devstats is the best place for webhook data -- keep me in the loop! I am working on flake reporting right now and would be happy to add whatever we need to start tracking retest metrics. I am not sure how to do this yet, however; I need a little bit of time to dig deeper. @cjwagner pointed me to devstats originally. I will start looking into how to push metrics there. The list of requested metrics looks long.
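As a rough starting point (purely a sketch -- the repo and PR number below are placeholders, and a real collector would need to paginate over all PRs and authenticate), counting /retest comments on a single PR via the GitHub REST API could look like:

```python
# Hypothetical sketch: count /retest comments on one PR via the GitHub REST
# API (PR conversation comments live on the issues endpoint). OWNER/REPO and
# PR_NUMBER are placeholders, not a real target.
import requests

OWNER, REPO, PR_NUMBER = "kubernetes", "kubernetes", 12345  # placeholder PR

def count_retest_comments(owner, repo, pr_number, token=None):
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # unauthenticated requests are heavily rate-limited
        headers["Authorization"] = f"token {token}"
    count, page = 0, 1
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments",
            headers=headers,
            params={"per_page": 100, "page": page},
        )
        resp.raise_for_status()
        comments = resp.json()
        if not comments:
            break
        # Prow trigger commands appear at the start of a comment line.
        count += sum(
            1 for c in comments
            if any(line.strip().startswith("/retest") for line in c["body"].splitlines())
        )
        page += 1
    return count

if __name__ == "__main__":
    print(count_retest_comments(OWNER, REPO, PR_NUMBER))
```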
> would be happy to add whatever we need to start tracking retest metrics. I am not sure how to do this yet, however; I need a little bit of time to dig deeper.
In the past, we've created an issue in the devstats repo asking for new dashboards/metrics and the maintainer has created them for us. @LappleApple has been representing kubernetes' feature requests and concerns to devstats these days, so I think she can probably point to next steps better. :)
Thank you @nikhita. Hi @MushuEE -- I've actually got a series of sessions on DevStats set up; the first one was yesterday (three more are scheduled). I'll send you the info (hopefully it's time-zone-compatible) and you might consider joining us. :)
@LappleApple Thank you! After speaking with @BenTheElder, it seems that raw /retest counts might be quite a noisy metric -- apparently a lot of the /retests are triggered automatically. A better metric might be whether tests change state after /retest is run on the same hash, which is a more direct indication of a flake.
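For illustration, a minimal sketch of that signal (with made-up job results and placeholder job names) might look like:

```python
# Hypothetical sketch of the "state change on the same hash" signal: a job
# that both failed and passed on the identical commit SHA changed state
# without any code change, which is a more direct flake indicator than a raw
# /retest count. The records below are made up for illustration.
from collections import defaultdict

results = [
    # (pr, sha, job, outcome) -- placeholder data
    (100, "abc123", "pull-kubernetes-e2e", "failure"),
    (100, "abc123", "pull-kubernetes-e2e", "success"),   # flipped after /retest
    (100, "abc123", "pull-kubernetes-unit", "success"),
    (101, "def456", "pull-kubernetes-e2e", "failure"),
    (101, "def456", "pull-kubernetes-e2e", "failure"),   # consistently failing, not a flake
]

def likely_flakes(records):
    outcomes = defaultdict(set)
    for pr, sha, job, outcome in records:
        outcomes[(pr, sha, job)].add(outcome)
    # Both a failure and a success on the same SHA = likely flake.
    return [key for key, seen in outcomes.items() if {"failure", "success"} <= seen]

print(likely_flakes(results))  # [(100, 'abc123', 'pull-kubernetes-e2e')]
```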
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Going to close this in favor of a future metrics effort.