Test-infra: improve sig-testing testgrid alert configuration

Created on 20 Apr 2018 · 19 Comments · Source: kubernetes/test-infra

We own a lot of jobs that don't have good alerting but very much could.
We should set up a sig-testing alerts alias and configure our TestGrid entries to send email alerts there, with reasonable num_failures settings, etc.

/area testgrid
/sig testing
/cc @spiffxp
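For a Prow job, this kind of TestGrid alert configuration can be expressed as job annotations. The minimal Python sketch below assumes the annotation keys testgrid-alert-email and testgrid-num-failures-to-alert (check the current test-infra documentation before relying on them); the job name, dashboard, group address, and default threshold of 3 are hypothetical placeholders.

```python
from typing import Any, Dict

# Hypothetical: address of the alerts group proposed in this issue.
SIG_TESTING_ALERTS = "kubernetes-sig-testing-alerts@googlegroups.com"


def with_alert_annotations(
    job: Dict[str, Any], email: str, num_failures: int = 3
) -> Dict[str, Any]:
    """Return a copy of a Prow job dict with TestGrid alert annotations set.

    The annotation keys are assumptions; annotation values are strings.
    The input job dict is not modified.
    """
    if num_failures < 1:
        raise ValueError("num_failures must be at least 1")
    annotations = dict(job.get("annotations", {}))  # shallow copy of the map
    annotations["testgrid-alert-email"] = email
    annotations["testgrid-num-failures-to-alert"] = str(num_failures)
    return {**job, "annotations": annotations}


# Hypothetical job used only for illustration.
job = {"name": "ci-example-job", "annotations": {"testgrid-dashboards": "sig-testing-misc"}}
updated = with_alert_annotations(job, SIG_TESTING_ALERTS)
print(updated["annotations"])
```

Returning a copy rather than mutating the input keeps the helper safe to use when generating many job entries from a shared template.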

area/testgrid lifecycle/rotten priority/important-soon sig/testing

All 19 comments

Similarly, @AishSundar and I will work on GCP alerts, at least for the conformance tests, to ping GKE EngProd. More broadly, we should encourage subscribing to test failure alerts, especially for, say, 10 failures in a row.
SIG owners really ought to have someone responding to these, or we should remove the tests.

/assign
/asssign @spiffxp
/assign @AishSundar
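The "10 failures in a row" rule above can be modeled as a streak check over a job's recent results. This is a hypothetical Python sketch, not TestGrid's implementation: results are assumed to be ordered oldest to newest, and any result other than "FAIL" (a pass, a flake, or anything else) ends the streak.

```python
from typing import Sequence


def should_alert(results: Sequence[str], threshold: int = 10) -> bool:
    """Return True if the most recent `threshold` runs all failed.

    `results` is ordered oldest to newest; each entry is a status string.
    Only "FAIL" extends the streak; any other status ends it.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    streak = 0
    for result in reversed(results):  # walk from the newest run backwards
        if result != "FAIL":
            break
        streak += 1
        if streak >= threshold:
            return True
    return False


history = ["PASS"] + ["FAIL"] * 10
print(should_alert(history))  # True
print(should_alert(history[:-1] + ["PASS"]))  # False: the newest run passed
```

Counting only the trailing streak means a single recovery resets the alert condition, which keeps the signal focused on jobs that are broken right now.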

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@spiffxp WDYT?
/remove-lifecycle stale

/milestone 1.12

/milestone v1.13
/unassign @AishSundar

I would like to set up a kubernetes-sig-testing-alerts Google Group and pilot this as a standard for all SIGs. The main open question is whether we want to allow anyone to post, or to restrict posting to keep it high-signal.

I think you actually have to allow anyone to post in order to let TestGrid post to it. cc @michelle192837

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/milestone v1.14
/remove-lifecycle stale

@BenTheElder: You must be a member of the kubernetes/kubernetes-milestone-maintainers github team to set the milestone.

In response to this:

/milestone v1.14
/remove-lifecycle stale

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Not true @k8s-ci-robot! I defy your power!

/milestone v1.15
I am creating kubernetes-sig-testing-alerts@ and will be configuring testgrid to send alerts there for release-blocking jobs that we own.

/priority important-soon
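Finding the release-blocking jobs we own that still lack an alert address is a simple filter over job configs. A minimal Python sketch, assuming that testgrid-dashboards is a comma-separated job annotation, that release-blocking jobs appear on a dashboard named sig-release-master-blocking, and (as a hypothetical ownership heuristic) that sig-testing jobs also appear on a dashboard whose name starts with sig-testing. The job names are made up.

```python
from typing import Any, Dict, List

RELEASE_BLOCKING = "sig-release-master-blocking"  # assumed dashboard name
OWNER_PREFIX = "sig-testing"  # hypothetical ownership heuristic


def dashboards(job: Dict[str, Any]) -> List[str]:
    """Split the assumed comma-separated testgrid-dashboards annotation."""
    raw = job.get("annotations", {}).get("testgrid-dashboards", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def missing_alerts(jobs: List[Dict[str, Any]]) -> List[str]:
    """Names of release-blocking, sig-testing-owned jobs with no alert email."""
    missing = []
    for job in jobs:
        boards = dashboards(job)
        release_blocking = RELEASE_BLOCKING in boards
        owned = any(board.startswith(OWNER_PREFIX) for board in boards)
        has_email = bool(job.get("annotations", {}).get("testgrid-alert-email"))
        if release_blocking and owned and not has_email:
            missing.append(job["name"])
    return missing


jobs = [
    {"name": "job-a", "annotations": {
        "testgrid-dashboards": "sig-release-master-blocking, sig-testing-misc"}},
    {"name": "job-b", "annotations": {
        "testgrid-dashboards": "sig-release-master-blocking, sig-testing-misc",
        "testgrid-alert-email": "kubernetes-sig-testing-alerts@googlegroups.com"}},
    {"name": "job-c", "annotations": {"testgrid-dashboards": "sig-testing-misc"}},
]
print(missing_alerts(jobs))  # ['job-a']
```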

I'm looking at https://groups.google.com/forum/#!forum/kubernetes-github-managment-alerts as a model: I'd like to allow TestGrid to post, but send everything else to moderation.

  • Created https://groups.google.com/forum/#!forum/kubernetes-sig-testing-alerts
  • Directly added kubernetes-sig-testing-leads@ members as Owners (with option: no email)
  • Settings -> Email options: [kubernetes-sig-testing-alerts] as prefix
  • Permissions -> Basic permissions:

    • anyone can post

  • Permissions -> Posting permissions

    • Attach Files: none selected

  • Permissions -> Access permissions

    • View Members: Owners of the group

    • View Email Addresses: Owners of the group

  • Settings -> Moderation:

    • Moderate all messages to the group

    • Spam messages: send them to the moderation queue, but do not send notifications to moderators

  • Information -> General Information

    • Group description: "Open mailing list for alerts related to jobs or infrastructure for which kubernetes-sig-testing is responsible. All other posts will be moderated. Please do not attempt to reply to this list"

    • Posting options: uncheck "Allow users to post to the group on the web"
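The intended posting policy (TestGrid may post, all other posts are moderated, and web posting is off) can be summarized as a routing function. This is only a model of the settings above, not how Google Groups implements them; the allowlisted sender is a hypothetical placeholder because the thread does not give TestGrid's sending address.

```python
from typing import FrozenSet

# Hypothetical placeholder; the real TestGrid sender address is not given here.
ALLOWED_SENDERS: FrozenSet[str] = frozenset({"testgrid-alerts@example.com"})


def route_post(sender: str, via_web: bool = False) -> str:
    """Return "reject", "deliver", or "moderate" for an incoming post."""
    if via_web:
        # "Allow users to post to the group on the web" is unchecked.
        return "reject"
    # Simplification: compare addresses case-insensitively.
    if sender.strip().lower() in ALLOWED_SENDERS:
        return "deliver"
    # Everything else lands in the moderation queue.
    return "moderate"


print(route_post("testgrid-alerts@example.com"))  # deliver
print(route_post("someone@example.org"))  # moderate
```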

Sample alert: https://groups.google.com/forum/#!topic/kubernetes-sig-testing-alerts/ts-oo7uTBug

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Related issues

stevekuznetsov · 3 Comments

spzala · 4 Comments

BenTheElder · 4 Comments

BenTheElder · 3 Comments

cjwagner · 3 Comments