Test-infra: Less e-mail spam from e2e test failures

Created on 26 Jan 2017 · 49 comments · Source: kubernetes/test-infra

At the moment, if a PR doesn't build or has some other major problem, it fails all of the e2e tests. The bot creates a new comment for every single failure. That new comment has a summary of all test failures for the commit, but the bot still creates a new comment and deletes the old one. This means I get ~8 emails, even though the GitHub history looks good.

Personally, I'd rather get one e-mail for each pull that failed, and have that first comment edited with all the info. If a new push is bad, the first failure should create a new comment (deleting any old comments), which generates an e-mail. If a subsequent test fails, the bot should update that comment rather than delete it and create a new one.

Doing so would cut way down on e-mail spam. Admittedly it would lose some e-mail fidelity, since you'd only get the first failure in your e-mail rather than all of them, but that seems like a good tradeoff to me.

1 email per push.
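
For illustration, a minimal Go sketch of that flow, assuming a hypothetical Commenter interface and BotComment type rather than the real test-infra report code: edit the existing report when the push hasn't changed, otherwise delete it and post a fresh comment, so only the first failure for a push triggers an e-mail.

```go
package report

// Commenter abstracts the few GitHub operations the bot would need
// (hypothetical interface, not the real test-infra client).
type Commenter interface {
	CreateComment(body string) error       // posting a new comment triggers a notification e-mail
	EditComment(id int, body string) error // editing an existing comment does not
	DeleteComment(id int) error
}

// BotComment is the bot's existing report comment on the PR, if any.
type BotComment struct {
	ID  int
	SHA string // head commit the report was written for
}

// Report posts or updates the failure table so that only the first failure
// of a given push creates (and therefore e-mails) a new comment; later
// failures for the same push just edit it in place.
func Report(c Commenter, existing *BotComment, headSHA, table string) error {
	if existing != nil && existing.SHA == headSHA {
		// Same push: fold the new failure into the existing comment, no new e-mail.
		return c.EditComment(existing.ID, table)
	}
	if existing != nil {
		// New push: drop the stale report before posting a fresh one.
		if err := c.DeleteComment(existing.ID); err != nil {
			return err
		}
	}
	// First failure for this push: one new comment, one e-mail.
	return c.CreateComment(table)
}
```

The key property is that creating a comment is the only step that notifies anyone, so at most one e-mail goes out per push.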

area/prow kind/bug lifecycle/rotten priority/important-longterm sig/contributor-experience ¯\_(ツ)_/¯

All 49 comments

@spxtr

I think it was @thockin who likes the email spam. Maybe he can weigh in? Personally, I've just turned off all emails from GitHub since it is way too noisy.

I can live with that. Maybe include "this comment may be updated if other tests fail"?

+1 for new comment on aggregate state change, updating comment otherwise

So what I am seeing now is that a PR is updated with over TWENTY new comments just from the test bot. Can we go back to just editing the comment? :)

Yep, I can do that. The reason you're seeing extra emails is because it removes the entry when the test passes.

Another idea: I could also make it send an email only when a new failure shows up, and edit the comment to remove passing tests.

I admit, I'm still big on one e-mail per push...

OK, right now it does one email per failure, the same behavior as before. I did it that way just because it was an easy improvement, but we can still make it approximate one email per push.

If the emails restarted on a new push, I would be OK. As it is, a WIP PR has something like 75 bot comments.

See issue #1962 by @resouer.

Yes, my two cents is maybe we can just send test failure emails to the PR author, not every participant. :)

I like the #1962 format

/assign @spxtr

Are we done/happy here for now?

I improved it, but I think we should leave this open for now. We still send an email for every failure, not for every commit with a failure.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

/remove-lifecycle stale
/lifecycle frozen

/remove-lifecycle frozen
/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

/remove-lifecycle rotten

@fejta I don't quite understand what should be changed in report.go. Do you want to keep creating/updating/removing the report on failure, and get an aggregated error at the end of Report()?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

cc @krzyzacy what about doing this with the new reporter tooling? less email spam sounds amazing :^)

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

cc @neolit123 who just asked today
/pony spam

@krzyzacy: Couldn't find a pony matching that query.

In response to this:

cc @neolit123 who just asked today
/pony spam

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/help
/unassign @spxtr

i can take a look at this.

/assign
/remove-help
/priority important-longterm

My $0.02 here:

Currently we call into reportlib from places like plank, jenkins-operator, external-plugins/refresh/server.go, and crier. I'm working on moving it out of plank to crier; it's unclear whether I want to handle the other ones at the same time yet.

I've implemented the aggregate report behavior for gerrit in https://github.com/kubernetes/test-infra/blob/master/prow/gerrit/reporter/reporter.go#L100, but that's pretty bespoke since we don't allow retest in gerrit. I think on top of that we also need to figure out the most recent job per PR.
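
Roughly, that aggregation could look like the following sketch (hypothetical Job type and aggregate helper, not the actual gerrit reporter code): find the most recently started commit among a PR's jobs and report only the failures recorded against it.

```go
package report

import "time"

// Job is one finished prow job for a PR (hypothetical shape).
type Job struct {
	Name    string
	SHA     string    // commit the job ran against
	Started time.Time
	Passed  bool
}

// aggregate returns the latest commit seen across the jobs and the failures
// recorded against that commit only, so older pushes don't leak into the report.
func aggregate(jobs []Job) (latestSHA string, failed []Job) {
	var newest time.Time
	for _, j := range jobs {
		if j.Started.After(newest) {
			newest, latestSHA = j.Started, j.SHA
		}
	}
	for _, j := range jobs {
		if j.SHA == latestSHA && !j.Passed {
			failed = append(failed, j)
		}
	}
	return latestSHA, failed
}
```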

Also we probably want to chat with contribex per https://github.com/kubernetes/community/blob/master/sig-testing/charter.md#cross-cutting-and-externally-facing-processes

gathering bits from some old comments in this thread and from our discussion today.

i think this is the culprit of the problem:
https://github.com/kubernetes/test-infra/issues/1723#issuecomment-292085934

We still send an email for every failure, not for every commit with a failure.

from my limited understanding of the infrastructure i can think of a solution for handling this with a generic GitHub bot client - here is what i think can be done:

  • when there is an initial failure make the bot write a comment (with the table, as it is doing now).
  • when a new failure happens verify if it's against the same commit (e.g. the commit for a failure is already known / written in the table):
    -- if yes, update the old comment.
    -- if not, delete the old comment and create a new one.

would that work?
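
for the "commit already known / written in the table" check in the second bullet, a tiny sketch, assuming the bot embeds the head SHA somewhere in its report comment (reportMentionsCommit is a hypothetical helper, not existing code):

```go
package report

import "strings"

// reportMentionsCommit reports whether the existing bot comment already covers
// the given head commit, i.e. the new failure belongs to the same push and the
// comment should be edited rather than deleted and recreated.
func reportMentionsCommit(commentBody, headSHA string) bool {
	return strings.Contains(commentBody, headSHA)
}
```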

one potential problem with the above:
if a discussion happens it can push the bot table report up the history. if that's the case we can delete the old comment and create a new one too.

we can also look for /retest and test comments after the bot report and always delete-old/create-new comment in such a case.

but honestly, the PR status at the bottom is also a pretty good source of truth. it turns orange / red if something fails. the bottom status and the table are duplicate GUIs, in a way.

Also we probably want to chat with contribex per https://github.com/kubernetes/community/blob/master/sig-testing/charter.md#cross-cutting-and-externally-facing-processes

cc @kubernetes/sig-contributor-experience-bugs
cc @tpepper

One issue with the current table is that when I push a new commit, it will still show me failures from the old commit while those jobs are still running.

Ideally when we detect a new commit we clear the table and only show failures for that commit.
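
Conceptually that's just filtering the table down to the current head commit; a minimal sketch with hypothetical Entry and pruneStale names (not the real crier/report types):

```go
package report

// Entry is one row of the failure table.
type Entry struct {
	Job string
	SHA string // commit the failure was reported against
}

// pruneStale keeps only the failures that belong to the current head commit,
// so a fresh push starts from an empty table instead of carrying over old failures.
func pruneStale(entries []Entry, headSHA string) []Entry {
	var kept []Entry
	for _, e := range entries {
		if e.SHA == headSHA {
			kept = append(kept, e)
		}
	}
	return kept
}
```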

See https://github.com/kubernetes/test-infra/issues/8667

The fix for this was reverted due to lack of consensus.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Any chance we can re-introduce the "skip updating comments when a ProwJob is for an old commit." part of #11341 together with a feature gate so users other than prow.k8s.io can disable this weirdness?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sad irony, fejta-bot :(


