Test-infra: Kubernetes CI Policy: all prow.k8s.io jobs must have testgrid alerts configured

Created on 1 Aug 2020 · 14 Comments · Source: kubernetes/test-infra

Part of https://github.com/kubernetes/test-infra/issues/18551

Why this is important:

  • In order to ensure effective use of community resources, we expect them to be spent on jobs that provide useful signal, and are actively maintained
  • Configuring testgrid alerts requires setting an e-mail address, which gives us a point of contact to escalate to if a job is deemed an ineffective use of community resources
  • We'll use this to implement a policy where we reserve the right to remove/disable jobs that are deemed an ineffective use of resources (e.g. perma-failing for O(weeks)) if the point of contact is unresponsive

TODO:

  • come up with test to enforce this policy (logging only)
  • come up with report to identify jobs and likely candidate sig owners

    • e.g. if a job is on "sig-foo"'s dashboard, it's likely they want to own it

    • who should we contact for jobs that don't belong to a SIG?

  • send out notice to all sigs, give a deadline of N weeks
  • any jobs that don't have e-mail addresses configured should be removed
  • flip test to failing once all jobs meet the policy
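
The "logging only first, failing later" rollout in the TODO list could be sketched roughly as follows. This is a minimal sketch, not the actual test-infra test: the job struct, missingAlerts, and enforce are hypothetical names, and real code would build the job list from the prow and testgrid configs instead of hard-coding it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// job is a hypothetical, flattened view of a prow job and its alert address.
type job struct {
	Name       string
	AlertEmail string // empty when no alert e-mail is configured
}

// missingAlerts returns the sorted names of jobs without an alert e-mail.
func missingAlerts(jobs []job) []string {
	var missing []string
	for _, j := range jobs {
		if strings.TrimSpace(j.AlertEmail) == "" {
			missing = append(missing, j.Name)
		}
	}
	sort.Strings(missing)
	return missing
}

// enforce logs every violation. It returns an error only when strict is true,
// which models flipping the test from logging-only to failing.
func enforce(jobs []job, strict bool) error {
	missing := missingAlerts(jobs)
	for _, name := range missing {
		fmt.Printf("job %q has no testgrid alert e-mail\n", name)
	}
	if strict && len(missing) > 0 {
		return fmt.Errorf("%d job(s) violate the alert policy", len(missing))
	}
	return nil
}

func main() {
	// Example data only; job names and addresses are made up.
	jobs := []job{
		{Name: "ci-example-a", AlertEmail: "sig-foo-alerts@example.com"},
		{Name: "ci-example-b"},
	}
	fmt.Println("logging only:", enforce(jobs, false))
	fmt.Println("strict:", enforce(jobs, true))
}
```

Keeping the strict switch as a single boolean means the last TODO item becomes a one-line change once every job conforms.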

Other thoughts / notes:

  • need to parse both prowjob config and testgrid config for this

    • there are some testgrids not populated by prow, we can probably ignore these?

    • there are some prowjobs that don't have all their testgrid config in annotations
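
Since the alert address may live in a prowjob annotation or only in the testgrid config, a check has to consult both sources. A minimal sketch of that lookup is below. It assumes the annotation key is testgrid-alert-email, and it assumes the testgrid config has already been reduced to a map from job name to alert address; that map and the function name resolveAlertEmail are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// alertEmailAnnotation is assumed to be the prowjob annotation that carries
// the alert address.
const alertEmailAnnotation = "testgrid-alert-email"

// resolveAlertEmail checks the prowjob annotations first and falls back to
// tabEmails, a hypothetical map from job name to the address configured
// directly in the testgrid config. The bool reports whether one was found.
func resolveAlertEmail(jobName string, annotations, tabEmails map[string]string) (string, bool) {
	if v := strings.TrimSpace(annotations[alertEmailAnnotation]); v != "" {
		return v, true
	}
	if v := strings.TrimSpace(tabEmails[jobName]); v != "" {
		return v, true
	}
	return "", false
}

func main() {
	// Example data only; names and addresses are made up.
	tabEmails := map[string]string{"ci-example-b": "sig-bar-alerts@example.com"}

	annotated := map[string]string{alertEmailAnnotation: "sig-foo-alerts@example.com"}
	fmt.Println(resolveAlertEmail("ci-example-a", annotated, tabEmails))
	fmt.Println(resolveAlertEmail("ci-example-b", nil, tabEmails))
	fmt.Println(resolveAlertEmail("ci-example-c", nil, tabEmails))
}
```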

/sig testing

area/jobs area/testgrid sig/testing

All 14 comments

Some prior art to serve as starting points:

  • go run ./experiment/prowjob-report/main.go --config ./config/prow/config.yaml --job-config ./config/jobs --format csv > jobs.csv, the output of which I periodically reimport into this spreadsheet

    • the owner_dash column is a guess on who should own the job based on which sig/wg-prefixed testgrid dashboard the job lives under

    • could run with --format json and pipe through to jq

    • there are some gaps; the big one I'm aware of is that it misses testgrid info that is solely in testgrid config

  • This test (which does more than its name lets on) should serve as a good starting point for how to fail or log jobs/tabs that don't have alerts configured: https://github.com/kubernetes/test-infra/blob/2293860ee4926f2c474188805007f71462966edc/config/tests/testgrids/config_test.go#L506-L544
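
The owner guess described above (pick the sig/wg-prefixed dashboard a job lives under) could look something like the sketch below. It is not how prowjob-report computes owner_dash; ownerFromDashboards and the knownOwners list are hypothetical, and the dashboard names in main are examples only. A list of known group names is assumed because group names can themselves contain hyphens, so splitting the dashboard name on "-" would be ambiguous.

```go
package main

import (
	"fmt"
	"strings"
)

// ownerFromDashboards guesses an owner by matching dashboard names that start
// with "sig-" or "wg-" against knownOwners, a hypothetical list of group
// names. The longest matching group name wins; "" means no guess.
func ownerFromDashboards(dashboards, knownOwners []string) string {
	best := ""
	for _, d := range dashboards {
		if !strings.HasPrefix(d, "sig-") && !strings.HasPrefix(d, "wg-") {
			continue
		}
		for _, o := range knownOwners {
			// Match the whole name or a "<owner>-" prefix so that
			// "sig-foo" does not match a dashboard named "sig-foobar".
			if (d == o || strings.HasPrefix(d, o+"-")) && len(o) > len(best) {
				best = o
			}
		}
	}
	return best
}

func main() {
	owners := []string{"sig-release", "sig-testing", "sig-cluster-lifecycle"}
	dashboards := []string{"google-gce", "sig-release-master-blocking"}
	fmt.Println(ownerFromDashboards(dashboards, owners))
}
```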

/help

@spiffxp: @ScrapCodes, @rayandas, and I will work together on the test. I'm coordinating with them both now.

So @rayandas, @ScrapCodes, and I met up to do exploratory work on this, and as a result we learned a little Bazel!

To run the test above, we need to invoke Bazel as follows:

cd TEST_INFRA_REPO_ROOT
bazel test //config/tests/testgrids:go_default_test --test_output=all

Note that this test must be run with Bazel, not the plain go test runner.

The Bazel build file is used to pull in the testgrid runtime configuration through the following dependencies:

        "@com_github_googlecloudplatform_testgrid//config:go_default_library",
        "@com_github_googlecloudplatform_testgrid//pb/config:go_default_library",

and also to pass in parameters to the test:

        "--config=$(location testconf.pb)",
        "--prow-config=$(location //config/prow:config.yaml)",
        "--job-config=config/jobs",

/assign
/remove help-wanted

To run a specific test, use:

bazel test //config/tests/testgrids:go_default_test \
--test_output=all \
--test_filter=TestReleaseBlockingJobsMustHaveTestgridDescriptions

/remove help-wanted

@RobertKielty the command is /remove help (not intuitive IMO)... but speaking of, are you still working on this?

Reviewing this now.

I want to talk to @spiffxp about this when he gets back.

I spoke with @spiffxp about this issue and proposed writing a helper function to decouple the selection of Kubernetes jobs from testing their policy conformance.

/remove help

/remove-help
