I got, I think, 200+ GitHub notification emails (split into 3 Gmail threads) containing Prow updates for https://github.com/kubernetes/kubernetes/pull/77341, all at Sat, May 4, 12:43 AM. I'm not sure whether it's a GitHub problem or a Prow problem. I didn't see this on any other PRs.
What sort of updates were they?
It looks like the PR was updated ~25 times with force-pushes around that time, which would have kicked off a new round of testing each time and may have resulted in job results being posted back for each of those. If we can narrow down which actions the bot was taking at the time, we might be able to see whether there was some errant behavior on top of that.
They all appeared to be the same, reporting that all the tests had failed.
e.g.
Test name | Commit | Details | Rerun command
-- | -- | -- | --
pull-kubernetes-e2e-gce-csi-serial | 094de85 | link | /test pull-kubernetes-e2e-gce-csi-serial
pull-kubernetes-typecheck | 094de85 | link | /test pull-kubernetes-typecheck
pull-kubernetes-bazel-test | 094de85 | link | /test pull-kubernetes-bazel-test
pull-kubernetes-dependencies | 094de85 | link | /test pull-kubernetes-dependencies
pull-kubernetes-kubemark-e2e-gce-big | 094de85 | link | /test pull-kubernetes-kubemark-e2e-gce-big
pull-kubernetes-verify | 094de85 | link | /test pull-kubernetes-verify
pull-kubernetes-e2e-gce | e93bba7 | link | /test pull-kubernetes-e2e-gce
pull-kubernetes-e2e-gce-device-plugin-gpu | e93bba7 | link | /test pull-kubernetes-e2e-gce-device-plugin-gpu
pull-kubernetes-node-e2e | e93bba7 | link | /test pull-kubernetes-node-e2e
pull-kubernetes-e2e-gce-100-performance | e93bba7 | link | /test pull-kubernetes-e2e-gce-100-performance
pull-kubernetes-e2e-gce-storage-slow | e93bba7 | link | /test pull-kubernetes-e2e-gce-storage-slow
pull-kubernetes-integration | e93bba7 | link | /test pull-kubernetes-integration
pull-kubernetes-bazel-build | e93bba7 | link | /test pull-kubernetes-bazel-build
I do notice now that the hash is different on some of them. If we sent ~8 updates per force-push, then 25 pushes could easily turn into 200+ emails. Maybe this is WAI (working as intended)?
It would be nice if I had a way to automatically filter these notification messages, e.g. if some unique string appeared that I could write a filter for.
Yeah, I think this is all of the jobs failing as they try to clone commits that no longer exist because they were force-pushed over; the flood of emails is a symptom of Prow trying to be very clear about reporting test results.
I think it's yet another question about notification sending.
/cc @cjwagner @fejta @Katharine @BenTheElder
> It would be nice if I had a way to automatically filter these notification messages, e.g. if some unique string appeared that I could write a filter for.
This is a brilliant idea.
If we put specific fixed strings within different classes of prowbot comments, then users could filter them out client-side.
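A minimal sketch of how that could work (all names and marker values here are hypothetical, nothing Prow actually emits today): each class of bot comment appends a fixed, documented marker string, and users filter on the marker instead of on wording that may change over time.

```go
package main

import "fmt"

// Hypothetical marker strings, one per class of prow bot comment. These
// values are illustrative only; the real markers would need to be chosen
// and documented in Prow itself.
const (
	markerTestFailure = "[prow:test-failure]"
	markerLifecycle   = "[prow:lifecycle]"
)

// withMarker appends a fixed marker to a comment body, here inside an HTML
// comment so it stays invisible in the rendered view. Whether an HTML
// comment survives into the notification email (and is therefore
// filterable) would need to be verified; a visible footer line is the
// simple alternative.
func withMarker(body, marker string) string {
	return fmt.Sprintf("%s\n\n<!-- %s -->", body, marker)
}

func main() {
	comment := withMarker("Prow test report body goes here.", markerTestFailure)
	fmt.Println(comment)
	// A mail filter could then match on "[prow:test-failure]" rather than
	// on the human-readable sentence, which is free to change.
}
```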
FWIW this is one of my Gmail filters now:
Matches: from:(Kubernetes Prow Robot) The following test failed, say /retest to rerun them all:
Do this: Mark as read, Never mark it as important
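For anyone who wants the same filter, roughly equivalent criteria can be pasted into the Gmail search box and then saved as a filter (this relies on Gmail's `from:` operator matching the sender's display name):

```
from:(Kubernetes Prow Robot) "The following test failed, say /retest to rerun them all:"
```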
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Closing in favour of kubernetes/community#3621