Test-infra: Less e-mail spam from e2e test failures

Created on 26 Jan 2017 · 49 comments · Source: kubernetes/test-infra

At the moment, if a PR doesn't build or has some other major problem, it fails all of the e2e tests. The bot creates a new comment for every single failure. That new comment has a summary of all test failures for the commit, but the bot still creates a new comment and deletes the old one. This means I get ~8 emails, even though the GitHub history looks good.

Personally, I'd rather get one e-mail for each pull that failed, and have that first comment edited with all the info. If a new push is bad, the first failure should create a new comment (deleting any old comments), which generates an e-mail. If a subsequent test fails, the bot should update that comment rather than delete it and create a new one.

Doing so would cut way down on e-mail spam. Admittedly it would lose some e-mail fidelity, since you'd only get the first failure in your e-mail rather than all of them, but that seems like a good tradeoff to me.

1 email per push.
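
For illustration, a minimal Go sketch of that flow, assuming a hypothetical Commenter interface and BotComment type rather than the real test-infra report code: edit the existing report when the push hasn't changed, otherwise delete it and post a fresh comment, so only the first failure for a push triggers an e-mail.

```go
package report

// Commenter abstracts the few GitHub operations the bot would need
// (hypothetical interface, not the real test-infra client).
type Commenter interface {
	CreateComment(body string) error       // posting a new comment triggers a notification e-mail
	EditComment(id int, body string) error // editing an existing comment does not
	DeleteComment(id int) error
}

// BotComment is the bot's existing report comment on the PR, if any.
type BotComment struct {
	ID  int
	SHA string // head commit the report was written for
}

// Report posts or updates the failure table so that only the first failure
// of a given push creates (and therefore e-mails) a new comment; later
// failures for the same push just edit it in place.
func Report(c Commenter, existing *BotComment, headSHA, table string) error {
	if existing != nil && existing.SHA == headSHA {
		// Same push: fold the new failure into the existing comment, no new e-mail.
		return c.EditComment(existing.ID, table)
	}
	if existing != nil {
		// New push: drop the stale report before posting a fresh one.
		if err := c.DeleteComment(existing.ID); err != nil {
			return err
		}
	}
	// First failure for this push: one new comment, one e-mail.
	return c.CreateComment(table)
}
```

The key property is that creating a comment is the only step that notifies anyone, so at most one e-mail goes out per push.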

area/prow kind/bug lifecycle/rotten priority/important-longterm sig/contributor-experience ¯\_(ツ)_/¯

All 49 comments

@spxtr

I think it was @thockin who likes the email spam. Maybe he can weigh in? Personally, I've just turned off all emails from GitHub since it is way too noisy.

I can live with that. Maybe include "this comment may be updated if other tests fail"?

+1 for new comment on aggregate state change, updating comment otherwise

So what I am seeing now is that a PR is updated with over TWENTY new comments just from the test bot. Can we go back to just editing the comment? :)

Yep, I can do that. The reason you're seeing extra emails is because it removes the entry when the test passes.

Another idea: I could also make it send an email only when a new failure shows up, and edit the comment to remove passing tests.

I admit, I'm still big on one e-mail per push...

OK, right now it does one email per failure, the same behavior as before. I did it that way just because it was an easy improvement, but we can still make it approximate one email per push.

If the emails restarted on a new push, I would be OK. As it is, a WIP PR has something like 75 bot comments.

See issue #1962 by @resouer.

Yes, my two cents is maybe we can just send test failure emails to the PR author, not every participant. :)

I like the #1962 format

/assign @spxtr

Are we done/happy here for now?

I improved it, but I think we should leave this open for now. We still send an email for every failure, not for every commit with a failure.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

/remove-lifecycle stale
/lifecycle frozen

/remove-lifecycle frozen
/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

/remove-lifecycle rotten

@fejta I don't quite understand what should be changed in report.go. Do you want to keep creating/updating/removing the report on failure, and get an aggregated error at the end of Report()?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

cc @krzyzacy what about doing this with the new reporter tooling? less email spam sounds amazing :^)

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

cc @neolit123 who just asked today
/pony spam

@krzyzacy: Couldn't find a pony matching that query.

In response to this:

cc @neolit123 who just asked today
/pony spam

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/help
/unassign @spxtr

i can take a look at this.

/assign
/remove-help
/priority important-longterm

My $0.02 here:

Currently we call into reportlib from places like plank, jenkins-operator, external-plugins/refresh/server.go, and crier. I'm working on moving it out of plank to crier; it's unclear whether I want to handle the other ones at the same time yet.

I've implemented the aggregate report behavior for gerrit in https://github.com/kubernetes/test-infra/blob/master/prow/gerrit/reporter/reporter.go#L100, but that's pretty bespoke since we don't allow retest in gerrit. I think on top of that we also need to figure out the most recent job per PR.
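
Roughly, that aggregation could look like the following sketch (hypothetical Job type and aggregate helper, not the actual gerrit reporter code): find the most recently started commit among a PR's jobs and report only the failures recorded against it.

```go
package report

import "time"

// Job is one finished prow job for a PR (hypothetical shape).
type Job struct {
	Name    string
	SHA     string    // commit the job ran against
	Started time.Time
	Passed  bool
}

// aggregate returns the latest commit seen across the jobs and the failures
// recorded against that commit only, so older pushes don't leak into the report.
func aggregate(jobs []Job) (latestSHA string, failed []Job) {
	var newest time.Time
	for _, j := range jobs {
		if j.Started.After(newest) {
			newest, latestSHA = j.Started, j.SHA
		}
	}
	for _, j := range jobs {
		if j.SHA == latestSHA && !j.Passed {
			failed = append(failed, j)
		}
	}
	return latestSHA, failed
}
```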

Also we probably want to chat with contribex per https://github.com/kubernetes/community/blob/master/sig-testing/charter.md#cross-cutting-and-externally-facing-processes

gathering bits from some old comments in this thread and from our discussion today.

i think this is the culprit of the problem:
https://github.com/kubernetes/test-infra/issues/1723#issuecomment-292085934

We still send an email for every failure, not for every commit with a failure.

from my limited understanding of the infrastructure i can think of a solution for handling this with a generic GitHub bot client - here is what i think can be done:

  • when there is an initial failure make the bot write a comment (with the table, as it is doing now).
  • when a new failure happens verify if it's against the same commit (e.g. the commit for a failure is already known / written in the table):
    -- if yes, update the old comment.
    -- if not, delete the old comment and create a new one.

would that work?
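
for the "commit already known / written in the table" check in the second bullet, a tiny sketch, assuming the bot embeds the head SHA somewhere in its report comment (reportMentionsCommit is a hypothetical helper, not existing code):

```go
package report

import "strings"

// reportMentionsCommit reports whether the existing bot comment already covers
// the given head commit, i.e. the new failure belongs to the same push and the
// comment should be edited rather than deleted and recreated.
func reportMentionsCommit(commentBody, headSHA string) bool {
	return strings.Contains(commentBody, headSHA)
}
```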

one potential problem with the above:
if a discussion happens it can push the bot table report up the history. if that's the case we can delete the old comment and create a new one too.

we can also look for /retest and test comments after the bot report and always delete-old/create-new comment in such a case.

but honestly, the PR status at the bottom is also a pretty good source of truth. it turns orange / red if something fails. the bottom status and the table are duplicate GUIs, in a way.

Also we probably want to chat with contribex per https://github.com/kubernetes/community/blob/master/sig-testing/charter.md#cross-cutting-and-externally-facing-processes

cc @kubernetes/sig-contributor-experience-bugs
cc @tpepper

One issue with the current table is that when I push a new commit, it will still show me failures from the old commit while those jobs are still running.

Ideally when we detect a new commit we clear the table and only show failures for that commit.
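
Conceptually that's just filtering the table down to the current head commit; a minimal sketch with hypothetical Entry and pruneStale names (not the real crier/report types):

```go
package report

// Entry is one row of the failure table.
type Entry struct {
	Job string
	SHA string // commit the failure was reported against
}

// pruneStale keeps only the failures that belong to the current head commit,
// so a fresh push starts from an empty table instead of carrying over old failures.
func pruneStale(entries []Entry, headSHA string) []Entry {
	var kept []Entry
	for _, e := range entries {
		if e.SHA == headSHA {
			kept = append(kept, e)
		}
	}
	return kept
}
```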

See https://github.com/kubernetes/test-infra/issues/8667

The fix for this was reverted due to lack of consensus.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Any chance we can re-introduce the "skip updating comments when a ProwJob is for an old commit." part of #11341 together with a feature gate so users other than prow.k8s.io can disable this weirdness?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sad irony, fejta-bot :(


