Sig-release: GCB job labels for easier debugging

Created on 2 Mar 2020 · 12 comments · Source: kubernetes/sig-release

What would you like to be added:

A set of label "tags" for GCB jobs. For example:

  • major version in {1.18, 1.19, 1.20, ...}
  • full version in {1.18.0, 1.18.1, 1.18.2, ..., 1.19.0, 1.19.1, ...}
  • subversion in {alpha.0, alpha.1, ..., beta.0, beta.1, ..., rc.0, rc.1, ...}
  • type in {alpha, beta, rc, official}

Perhaps most or all of the input state for a job should conform to a labeling taxonomy and be tagged on the job. Today this state is output in a non-searchable form (text echoed to the log), even though it exists as structured key:value data at runtime.
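As a rough sketch of how the proposed taxonomy could be derived from the version string a job already receives, the bash function below prints one tag per line: the "major version", the full version, the subversion (only for pre-releases), and the type. The name release_tags is hypothetical and not part of anago or gcbmgr, and the sketch assumes versions shaped like v1.18.0 or v1.18.0-rc.1; build metadata such as the "+721bfa751924da" suffix seen in --buildversion values is deliberately not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of anago/gcbmgr): print the proposed tags,
# one per line, for a version like v1.18.0 or v1.18.0-rc.1.
# Assumption: build metadata (e.g. "+721bfa751924da") is not present.
release_tags() {
  local version="${1#v}"  # drop an optional leading "v"
  local re='^([0-9]+)\.([0-9]+)\.([0-9]+)(-(alpha|beta|rc)\.([0-9]+))?$'
  if [[ ! $version =~ $re ]]; then
    echo "unrecognized version: $1" >&2
    return 1
  fi
  # Copy the capture groups into locals right away.
  local maj="${BASH_REMATCH[1]}" min="${BASH_REMATCH[2]}" patch="${BASH_REMATCH[3]}"
  local pre="${BASH_REMATCH[5]}" num="${BASH_REMATCH[6]}"

  echo "${maj}.${min}"           # "major version", e.g. 1.18
  echo "${maj}.${min}.${patch}"  # full version, e.g. 1.18.0
  if [[ -n $pre ]]; then
    echo "${pre}.${num}"         # subversion, e.g. rc.1
    echo "$pre"                  # type: alpha, beta or rc
  else
    echo "official"              # type: no pre-release suffix
  fi
}
```

For instance, release_tags v1.18.0-rc.1 prints 1.18, 1.18.0, rc.1 and rc on separate lines, and each line could then be attached to the job as a GCB tag. Deriving all tags from a single input keeps them consistent with each other, instead of relying on each caller to pass matching values.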

Why is this needed:

Our builds happen in Google Cloud Build (GCB). That tool uses a UUID to identify builds, and you can fetch a build log by UUID. A typical workflow with anago and gcbmgr today is to issue a few monolithic commands, e.g.:

./gcbmgr release --official --nomock release-1.13 --buildversion=v1.13.3-beta.0.37+721bfa751924da

which, once running, outputs the UUID of the job for reference. That UUID is usually used immediately in a subsequent command invocation, e.g.:

./gcbmgr tail bcd8809f-afd0-40fd-8498-561a596e7bbd

in order to track what happens. You can also run gcbmgr list to get the UUIDs of recent jobs, but that output doesn't show which UUIDs belong to which build versions. Instead you probably need to download a few dozen logs, grep them for the version strings you want, and build your own table mapping the relevant UUIDs to the relevant logs in order to compare runs. This is suboptimal.

But this points further at the manual aspects of our process. We don't have a pipeline of small functional units with clear inputs/outputs, plus events and triggers moving work through it. So a human tails the log and watches for success, or upon failure scrolls back to understand what happened, usually followed by asking "how did this ever work before?!"

As we shift to a pipeline, retroactive debugging will become more important, yet individuals will likely become less familiar with the typical inputs/outputs, since they're no longer twiddling them manually on the command line, and the subparts of a build will span multiple log files. Humans will need an easy way to get all the logs of a current build flow for something like "1.20.0-rc.0", and they will likely want to compare those with the logs from "1.19.0-rc.*". A taxonomy of build tags would help with this searching.

User stories are going to be scenarios like "As a branch manager I want to see:

  • ...beta build logs" (what's different between now and prior quarters)
  • ...official build logs" (what's different between now and recent stable)
  • ...1.18 build logs" (what's changed in this quarter's dev)
  • ...1.15.6 logs" (what's wrong with my current build attempts)
  • ...1.17.0 beta logs" (what's changed across a set of dev builds)
Labels: area/release-eng, kind/feature, priority/important-longterm, sig/release

All 12 comments

/area release-eng
/priority important-longterm
/help

@kubernetes/release-managers
@kubernetes/release-engineering
FYI

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten
/assign

/remove-help

The tags we currently set are (can check ):

Tags:

- ${_GCP_USER_TAG}
- ${_RELEASE_BRANCH}
- ${_BUILD_POINT}
- ${_NOMOCK_TAG}
- STAGE
- ${_GIT_TAG}
- ${_TYPE_TAG}

For example:

- email of the user who triggered the build: ctadeu-at-gmail-com
- branch: release-1.15
- build-version: v1.15.12-beta.0.33-5f400ccfa32aff
- type: official
- stage: STAGE
- git tag: v0.2.7-55-g191ddd0-20200416 (for the release repo)
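The example values hint at how raw inputs are turned into valid tags: the triggering user's email appears as ctadeu-at-gmail-com, and the build version's "+" (compare the --buildversion value used earlier) appears to have become "-". A minimal bash sketch of such a sanitizer follows. It assumes Cloud Build's documented tag format, which as we understand it is ^[\w][\w.-]{0,127}$, and it assumes ASCII input. The names sanitize_tag and user_tag are hypothetical; they are not the functions the release tooling actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn arbitrary strings into Cloud Build tag values.
# Assumption: tags must match ^[\w][\w.-]{0,127}$ (start with a word
# character; only word characters, "." or "-"; at most 128 characters).
# Assumes ASCII input.

sanitize_tag() {
  # Replace every character outside [alnum _ . -] with "-".
  local tag="${1//[^[:alnum:]_.-]/-}"
  # Drop leading "." or "-": the first character must be a word character.
  while [[ $tag == [.-]* ]]; do
    tag="${tag:1}"
  done
  # Truncate to the 128-character limit.
  printf '%s\n' "${tag:0:128}"
}

user_tag() {
  # Mirror the example above: "@" becomes "-at-" and "." becomes "-".
  local tag="${1//@/-at-}"
  sanitize_tag "${tag//./-}"
}
```

With these, user_tag ctadeu@gmail.com yields ctadeu-at-gmail-com, and sanitize_tag applied to a build version containing "+" yields the dash-separated form shown in the example. Keeping the sanitizing in one place means every tag a job emits stays valid and predictable to search for.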

phase 1:

  • add some new tags like the ones proposed in this issue

phase 2:

  • improve job listing, and maybe create a subcommand for finer control, because today we can only list the last X jobs, with no filtering.
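Once jobs carry tags, listing could filter on them instead of on recency. The bash sketch below builds a filter expression that requires every given tag; tag_filter is a hypothetical name, and the expression follows our understanding of gcloud's filter syntax (the ":" operator tested against the repeated tags field of a build, joined with AND). Check gcloud topic filters for the exact matching semantics before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a gcloud --filter expression that matches
# builds carrying *all* of the given tags.
tag_filter() {
  local filter="" tag
  for tag in "$@"; do
    # ${filter:+ AND } inserts the separator only after the first term.
    filter+="${filter:+ AND }tags:\"${tag}\""
  done
  printf '%s\n' "$filter"
}

# Example (requires gcloud and access to the project running the builds):
#   gcloud builds list --filter="$(tag_filter release-1.15 official)" --limit=20
```

A list subcommand could wrap exactly this: take tag values as arguments, build the filter, and show only the matching UUIDs, which removes the need to download and grep logs to find the right runs.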

What do you think, @tpepper @saschagrunert?

Sounds good to me. Maybe some docs on how to filter for such tags in the Google Cloud Console would be a valuable addition, too?

Addendum for phase 1 :)

  • add docs on filtering for such tags in the Google Cloud Console

thanks @saschagrunert

Aha, so most of what we'd likely want is there. Docs on querying sound really important then!

this is done
/close

@cpanato: Closing this issue.

