This is an overview of ideas I've been thinking about over the last 6 months as triage lead on the release team. Related initial discussion for point 1 can be found here: https://groups.google.com/forum/#!topic/kubernetes-sig-contribex/BvGmOQ0v5f0 , and the rest should be discussed further in a meeting; the 1.14 release retro is a good candidate.
Series of items and features that would be beneficial if implemented:
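- open bug/PR -> WAITING-ROOM: needs-sig, needs-sig-triage
- WAITING-ROOM -> (assign SIG) -> TRIAGE: needs-sig-triage, with a return path from TRIAGE back to WAITING-ROOM
- TRIAGE -> (close with reason) -> CLOSED, with a return path from CLOSED back to TRIAGE
- TRIAGE -> (verify) -> BACKLOG: kind/*, priority/*, with a return path from BACKLOG back to TRIAGE
- BACKLOG -> (assign or claim) -> IN-PROGRESS: assignee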
SIGs are tasked by definition to regularly search all issues and appropriately label them / categorize them. This is made much easier by implementing point 1.
Each SIG has a dedicated project/Kanban board, where current, upcoming, and milestoned work is visible at a quick glance - columns like Backlog, In Progress, Release-Blocking, etc. cc @parispittman @idvoretskyi on boards but for broader project usage
Case in point: https://github.com/orgs/kubernetes/projects/8 , the SIG-Windows board has worked great, both for them as a SIG and sig-release / release issue triage.
After a SIG reviews the new ticket (issue), it gets an appropriate category - either via direct labels or via Project Board automated labels. thockin suggested the use of `triage` labels, which are a bit legacy and should be reworked in tandem with project boards to have the desired workflow.
An example on a project board: issues moved from 'backlog' to 'in progress' automatically get a 'triage/inprogress' label (or something similar). Label automation, project boards, and search queries should all integrate seamlessly and complement each other in the final iteration of the new workflow.
Release team specific: Based on all of the above, incoming 'milestoned' work belongs to SIGs, and it should be a SIG's responsibility to control and estimate what can be done for each release cycle, with the release team stepping in only when needed (as the release approaches). Standard calendar checkpoints in release-readiness will further help. This is what the 'Enhancements Deadline' is for, but it doesn't cover work outside of new features, and the fate of that work is usually left for the release team to decide.
Therefore, a prototype flowchart is: New Ticket -> SIG -> Labeling or Deletion <-> Project Boards <-> Re-labeling based on current status <-> Release Team is able to view status at any time via project boards
For all above, mass rework of labels is needed.
'priority' labels are a subject of discussion in every release cycle as it's a fuzzy concept in itself, should be reworked with ideas such as 'impact' and 'importance' in mind,
'triage' labels are a bit old and currently mostly unused but can be very helpful if properly reworked and integrated into a standard system,
'kind' labels can be further reworked as there are many issues that do not belong in any current 'kind' (cc @BenTheElder)
deletion of unwanted labels or re-work into other ones,
addition of new labels like 'needs-sig-triage', 'release-blocking', 'wontfix' etc.
related initial issue for 'triage' labels: https://github.com/kubernetes/community/issues/3455
and with that all, rework of the old document located in https://github.com/kubernetes/community/blob/master/contributors/guide/issue-triage.md and possibly updating many others
Other generic improvements include:
Mechanism that auto-applies milestone in PRs that are merged out of code freeze, so the full list of PRs included in 1.14 is easily grepped
(issue is here https://github.com/kubernetes/test-infra/issues/11611)
Label that signifies an Issue/PR that is changing something outside of core k/k, whether it's testing/releng/automation/dependencies/external bundles like fluentd-gcp et cetera. Currently there's only `kind/cleanup`, which is rather vague. Label variety should be encouraged - with proper standardization, good rules and automation around them, they can be easily understood and utilized.
Related issue + doc on defining external dependencies in progress:
https://github.com/kubernetes/website/issues/12328
https://docs.google.com/document/d/1WA8N7C48nkJmme9a96DU0o9jBpeycPhht8WF-Eam9QQ/edit?usp=sharing
Labels that indicate whether a ticket is release-blocking or good-to-have, e.g. (kind/release-blocking | kind/good-to-have)
Label + mechanism that automatically shifts a Ticket to the next milestone a few days after Freeze hits - this automates punting of 'good-to-have' stuff to the next milestone
Any ticket in the release-blocking column of a board automatically gets a kind/release-blocking label - this way, anyone can search GitHub issues and PRs via a `label:kind/release-blocking+milestone:v1.14` query
Further improvements on how enhancements are handled here:
https://github.com/kubernetes/sig-release/issues/539
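The label-plus-milestone search mentioned in the list above could be scripted along these lines. This is only a minimal sketch: it assumes the proposed `kind/release-blocking` label exists (at this point it is just a proposal), and `build_search_url` is a hypothetical helper that builds a repository issue-search URL without calling any API.

```python
from urllib.parse import urlencode

def build_search_url(repo: str, label: str, milestone: str) -> str:
    """Build a GitHub issue/PR search URL for one label and one milestone."""
    # Search qualifiers are space-separated inside the single `q` parameter;
    # urlencode turns the spaces into '+' and escapes ':' and '/'.
    query = f"label:{label} milestone:{milestone}"
    return f"https://github.com/{repo}/issues?" + urlencode({"q": query})

# Hypothetical label name taken from the proposal above.
print(build_search_url("kubernetes/kubernetes", "kind/release-blocking", "v1.14"))
```

Keeping the query in one place like this would make it easy to reuse the same filter for every milestone.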
/sig release pm contributor-experience
tl;dr make ticket management easier for everyone
@kubernetes/sig-release @kubernetes/sig-contributor-experience-feature-requests
@thockin @guineveresaenger @nikhita @idvoretskyi @justaugustus @BenTheElder @neolit123
@kubernetes/sig-testing @fejta @cjwagner @BenTheElder
Summarized the points for discussion in the 1.14 retro doc
https://docs.google.com/document/d/1he2axf3adOIk3gA3vxFAewejtE2tm3Wl1NA1p-ooXpo/edit#
/assign
/assign
/milestone May
This is an umbrella issue, so moving out of the current milestone.
/milestone Next
@fejta @spiffxp there seems to be enough work here on the Prow side to have this fulfill an Epic for us.
Here's the state machine as a chart, based on what I have in my head:
| State | Description | Entry Criteria | Bot Actions | Human Actions | Exit Criteria |
|---|---|---|---|---|---|
| Open | Default state when an issue is opened | N/A | `needs/sig` and `needs/triage` are applied | One or more `sig/*` labels are applied | Has `sig/*` label |
| Triage | SIG triages issue to determine if it needs more info, should be closed, or moved to the backlog | Has `sig/*` and `needs/triage` label | N/A | Needs info: send `/needs info`, Closed: send `/close <reason>`, Backlog: apply `kind/*` and `priority/*` | Has `closed/*` OR `kind/*` and `priority/*` |
| Closed/Complete | SIG has determined that issue was completed or cannot be completed | Has `closed/*` label | `needs/triage`, `needs/info` are removed | Can send `/reopen` to reopen the issue | N/A, complete state |
| Backlog | SIG has determined that issue is relevant and should be picked up by a SIG member | Has `kind/*` and `priority/*` label | `needs/triage`, `needs/info` are removed | Assign the issue - self: `/lifecycle active`, `/assign` (applies `lifecycle/active`), org member: `/assign <org-member>` | Has `lifecycle/active` label |
| In Progress | SIG member has begun work on the issue | Has `lifecycle/active` label | N/A | Work the issue, send `/close [<reason>]` | Has `closed/*` OR stale labels |
| Stale | Issue has been open for some interval without an update | Issue has been open 30 days without an update | `lifecycle/{needs-attention,stale,rotten}` is applied, `lifecycle/active` is removed | Active: send `/lifecycle active`, Close: send `/close [<reason>]`, Freeze: send `/lifecycle frozen` | Has `lifecycle/active`, `closed/*`, or `lifecycle/frozen` |
| Frozen | Issue is a long-term priority for the SIG and should not be subject to stale labels | Has `lifecycle/frozen` label | `lifecycle/{needs-attention,stale,rotten}` is removed | Close: send `/close [<reason>]`, Unfreeze: send `/remove-lifecycle frozen` | Has `closed/*` label OR `lifecycle/frozen` is removed |
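
To make the entry criteria in the chart concrete, here is a rough sketch of how a state could be derived from an issue's labels. The function name `derive_state` and the precedence order between overlapping labels are my assumptions, not something the chart specifies, and none of this is existing Prow behavior.

```python
def derive_state(labels: set[str]) -> str:
    """Map an issue's labels to a state name from the chart (illustrative only)."""
    def has(prefix: str) -> bool:
        return any(label.startswith(prefix) for label in labels)

    # Assumed precedence: closed and lifecycle labels win over backlog labels,
    # because backlog labels (kind/*, priority/*) stay on the issue afterwards.
    if has("closed/"):
        return "Closed/Complete"
    if "lifecycle/frozen" in labels:
        return "Frozen"
    if labels & {"lifecycle/needs-attention", "lifecycle/stale", "lifecycle/rotten"}:
        return "Stale"
    if "lifecycle/active" in labels:
        return "In Progress"
    if has("kind/") and has("priority/"):
        return "Backlog"
    if has("sig/") and "needs/triage" in labels:
        return "Triage"
    return "Open"

print(derive_state({"sig/network", "needs/triage"}))
```

A dashboard or bot could call something like this on each issue to bucket it, instead of every consumer re-implementing the label logic.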
needs/sig
needs/triage
needs/more-info
closed/complete
closed/support
closed/duplicate|dupe
closed/not-reproducible|no-repro
closed/unresolved
lifecycle/active
lifecycle/needs-attention
lifecycle/stale
lifecycle/rotten
lifecycle/frozen
priority/critical-urgent
priority/important-soon
priority/important-longterm
- `needs-*` labels to `needs/` and allow for `/needs` commands
- `triage/needs-information` to `needs/[more-]info` and the remaining `triage/*` to `closed/*`
- `priority/*` labels

/sig testing
Thanks so much for putting this together, @nikopen!
Allow me to comment on a few of these items...
- All issues hitting K/K are auto-labeled as 'needs-sig-triage' or something similar.
addressed here: kubernetes/test-infra#11818
Agreed. This is a great first step with the immediate impact of being able to search by a single label, instead of an aggregate of them.
I'm in favor of `needs-triage` or `needs/triage`.
- SIGs are tasked by definition to regularly search all issues and appropriately label them / categorize them. This is made much easier by implementing point 1.
Are SIGs indeed tasked with this by definition or is it an undocumented expectation?
Each SIG has a dedicated project/Kanban board, where current, upcoming, and milestoned work is visible at a quick glance - columns like Backlog, In Progress, Release-Blocking, etc. cc @parispittman @idvoretskyi on boards but for broader project usage
Case in point: https://github.com/orgs/kubernetes/projects/8 , the SIG-Windows board has worked great, both for them as a SIG and sig-release / release issue triage.
Agreed that this would be beneficial at the SIG level, but the Release Team would still have to run through multiple boards to get an idea of what's happening. Perhaps a dashboard would be more useful?
- After a SIG reviews the new ticket (issue), it gets an appropriate category - either via direct labels or via Project Board automated labels. thockin suggested the use of `triage` labels, which are a bit legacy and should be reworked in tandem with project boards to have the desired workflow.
An example on a project board: issues moved from 'backlog' to 'in progress' automatically get a 'triage/inprogress' label (or something similar). Label automation, project boards, and search queries should all integrate seamlessly and complement each other in the final iteration of the new workflow.
A few things here...
`closed/*`. Issues assigned and in progress could instead be searched via `lifecycle/active`. Any other states seem to be covered by the state chart above.
- Release team specific: Based on all of the above, incoming 'milestoned' work belongs to SIGs, and it should be a SIG's responsibility to control and estimate what can be done for each release cycle, with the release team stepping in only when needed (as the release approaches). Standard calendar checkpoints in release-readiness will further help. This is what the 'Enhancements Deadline' is for, but it doesn't cover work outside of new features, and the fate of that work is usually left for the release team to decide.
What do you think we can do to improve this, without too much friction?
- Therefore, a prototype flowchart is: New Ticket -> SIG -> Labeling or Deletion <-> Project Boards <-> Re-labeling based on current status <-> Release Team is able to view status at any time via project boards
What are we trying to glean here? Completeness of the task? Last updated time?
Again, I think a dashboard would ultimately be more useful to the Release Team here.
- For all above, mass rework of labels is needed.
'priority' labels are a subject of discussion in every release cycle as it's a fuzzy concept in itself, should be reworked with ideas such as 'impact' and 'importance' in mind,
'triage' labels are a bit old and currently mostly unused but can be very helpful if properly reworked and integrated into a standard system,
'kind' labels can be further reworked as there are many issues that do not belong in any current 'kind' (cc @BenTheElder)
deletion of unwanted labels or re-work into other ones,
addition of new labels like 'needs-sig-triage', 'release-blocking', 'wontfix' etc.
related initial issue for 'triage' labels: #3455
Agreed on some of the rework (see above), but I think we should punt on doing anything with the `kind/*`, `priority/*` labels in the near term. I only say that because these labels lead to some bikeshedding and I don't think refactoring them is strictly necessary to move this forward.
- and with that all, rework of the old document located in https://github.com/kubernetes/community/blob/master/contributors/guide/issue-triage.md and possibly updating many others
+1.
Other generic improvements include:
- Mechanism that auto-applies milestone in PRs that are merged out of code freeze, so the full list of PRs included in 1.14 is easily grepped
(issue is here kubernetes/test-infra#11611)
+1.
- Label that signifies an Issue/PR that is changing something outside of core k/k, whether it's testing/releng/automation/dependencies/external bundles like fluentd-gcp et cetera. Currently there's only `kind/cleanup`, which is rather vague. Label variety should be encouraged - with proper standardization, good rules and automation around them, they can be easily understood and utilized.
Related issue + doc on defining external dependencies in progress:
kubernetes/website#12328
https://docs.google.com/document/d/1WA8N7C48nkJmme9a96DU0o9jBpeycPhht8WF-Eam9QQ/edit?usp=sharing
Let's land the standard and then reassess adding other label types.
- Labels that indicate whether a ticket is release-blocking or good-to-have, e.g. (kind/release-blocking | kind/good-to-have)
`release-blocking` would probably be a priority; `good-to-have` I'm not sure about. Same opinion around punting this until we land the workflow.
- Label + mechanism that automatically shifts a Ticket to the next milestone
a few days after Freeze hits - this automates punting of 'good-to-have' stuff to the next milestone
+1.
- Any ticket in the release-blocking column of a board automatically gets a kind/release-blocking label - this way, anyone can search GitHub issues and PRs via a `label:kind/release-blocking+milestone:v1.14` query
I need to ponder how the board interaction would work, as this functionality doesn't exist natively.
- Further improvements on how enhancements are handled here:
kubernetes/sig-release#539
I still owe a response for the enhancements tracking stuff. I'll add notes to that issue.
@nikopen -- Also, this is meaty and impactful enough now that it's deserving of a KEP.
Let's kick around thoughts on the state machine before moving forward with that.
Excellent, thanks for the lengthy responses!
more thoughts ----
_state machine / labels_
lifecycle _ready_ or similar - to indicate a ticket that is triaged and ready to be picked up - `lifecycle active` could mean more like `in progress`. As in, needs triage -> lifecycle ready -> lifecycle active,
or
needs triage -> lifecycle/backlog -> lifecycle/ready -> lifecycle/active
depending on how many columns of actions are decided. big list of items to-do, smaller list of prioritized, smaller list of in-progress?
as lifecycle will likely be used more often, stale and rotten timeouts could be increased to 50 days stale -> 120 days rotten or similar
agree with actions
on the rest:
1 _open needs-triage PR_
2 _written definitions of workflow_
it should be in a final doc as `guidelines` or `best practices`, replacing the older docs like this one; best to make sure it accommodates the needs of most teams from the get-go through lazy consensus etc.
3 _release team / many boards_
what would that dashboard be made of?
4 _project boards_
(if ticket has sig/XYZ + needs/triage, then put into "triage" column in "sig/XYZ" board) - then each SIG/team can adapt it to their needs. I'm looking into that with GitHub.
5 _automatic milestoned work_
7 _label rework_
agree it's hard, proposed ones as above are good to start with
+1 to rest
let's move forward with 1. and create a KEP^^
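
The board rule from item 4 above (sig/XYZ + needs/triage goes to the "triage" column of the sig/XYZ board) could look roughly like this. It's a sketch only: `board_placements` is a made-up name, boards are identified by the SIG label purely for illustration, and nothing here talks to the GitHub API.

```python
def board_placements(labels: set[str]) -> list[tuple[str, str]]:
    """Return (board, column) pairs for an issue that still needs triage."""
    if "needs/triage" not in labels:
        return []
    # One placement per SIG label, so a cross-SIG issue shows up on every board.
    return sorted((label, "triage") for label in labels if label.startswith("sig/"))

print(board_placements({"sig/windows", "sig/network", "needs/triage"}))
```

Each SIG could then layer its own extra columns on top of this shared default.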
relevant - issue bot https://github.com/kubernetes/community/issues/3672
A few thoughts:
`close/*` labels? These get stale if something is reopened. Is this something we need to or want to track via labels?
Will think and look at this more after code freeze.
@cblecker ^^
on the docs we can work with sig-docs folks like @zacharysarah @Bradamant3 to have them as simple as possible
To add to @cblecker's point: when I initially proposed the triage labels, they were actually close/*, along with auto-closing of issues. Some folks liked the name, but after further discussion with the community we voted to go with triage/* instead of close/* - https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kubernetes-dev/c8J8VOYeDB8/39kYEYkrBwAJ and https://groups.google.com/forum/#!searchin/kubernetes-sig-contribex/sahdev%7Csort:date/kubernetes-sig-contribex/IoENWO_2p2g/9_8TEHRVAQAJ (the second link is based on a search because kubernetes-wg-contribex was renamed and doesn't exist now). Thanks!
Good to know!
Ultimately in their current state they just confuse people and they're old/mostly unused, so it's better to switch them back to close/ with some extra automation or even remove them.
It's necessary to change/remove them in order to move forward.
I don't understand why it's necessary to change/remove them in order to move forward, but I am totally fine with the overall decision here. It may be helpful to keep all or some of them as they are, or with some renaming if they're confusing, and try to educate triage engineers about their usage. As @cblecker mentioned, close/* may be confusing as well. The idea of the triage/* labels was to use them mostly in conjunction with closing issues. More like:
- An issue was closed (or should be closed) because of a particular triage outcome (support, not-reproducible, duplicate, etc.), which could speed up closing, give a better understanding of the reason behind a close, and provide a better query to see how many support, not-reproducible, etc. issues we run into.
- Such a label can give SIGs a high-level understanding of an issue based on basic findings from a new or experienced contributor (e.g. someone already tried to reproduce the issue unsuccessfully, or found a probable duplicate and linked it in a comment).
- It can potentially be used for easier auto-closing (e.g. an issue identified as not-reproducible or needs-information could be closed sooner than the normal auto-close when the issue reporter doesn't provide the requested information within a certain number of days).

Also, when I search issues with `label:triage/support` or `label:triage/unresolved`, they seem to be widely used. Thanks!
@spzala There's a lot of scattered context throughout this and other tickets that you might have missed.
The first actionable item of this ticket is a continuation of the discussion in this link, which is about live issue triage by @thockin. 99% of the `triage` label usage you can see in your search queries is for `sig-network`, mostly by @thockin, and used as a provision because changing them needs consensus/communication, has a big impact, et cetera. It would be great if we can resolve this.
I think people would be very happy to reclaim triage/ labels by changing the name, have a way to categorize new incoming issues via `needs-triage`, and utilize the existing `lifecycle` labels (or other) for issue status.
Happy to hear other suggestions if closed/ is confusing, though I think it's pretty straightforward - focused on giving context to closed issues.
We could also move this specific discussion to this issue which is focused on this topic, so we can ideally resolve it soon:
https://github.com/kubernetes/community/issues/3455
I'll ping @thockin to weigh in on this.
Thanks @nikopen and sounds good. I have no objection with changes if that's making things simpler, some of those triage/* labels are probably pre-triage sort of findings with intention to help speed up triage by SMEs. closed/* is good, but it may not help identifying issues that are candidates for close. Agree that new issues always needs-triage.
With this triage label (similarly to having 1 global triage label now), how do we differentiate between different SIGs that need to look at an issue? Say, something is tagged apps, architecture, and apimachinery... if the first person (say in apimachinery) comes along and marks it as triaged, that's unhelpful for the other SIGs.
@vllry -- There would be an additional label (`lifecycle/ready`), which when applied, would remove `needs-triage`.
While `needs-triage` is applied, SIGs can search on that and assign to members of other SIGs as required.
Does that sound okay?
The alternative I see is having per-SIG triage labels, which I think would get messy quickly.
Indeed, if someone deems that an issue is triaged and is ready to be worked on, then applying the ready label would be enough to remove `needs-triage` and signal to any number of SIGs attached to the issue that it can move forward.
An issue might be cross-SIG and that's up to the participating SIGs to determine how they will work together or not, there's little Labels can do to help in this case - it's context for actual comments / comms.
Generally it's an edge case that can be handled in other ways.
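
For illustration, the label rule discussed above (applying `lifecycle/ready` removes the single global `needs-triage` label while every `sig/*` label stays) might be sketched like this. `apply_label` is a hypothetical helper, not an existing bot function.

```python
def apply_label(labels: set[str], new_label: str) -> set[str]:
    """Return the label set after adding new_label, per the rule described above."""
    updated = set(labels) | {new_label}
    if new_label == "lifecycle/ready":
        # Applying the ready label clears the global triage marker;
        # all sig/* labels stay, so every involved SIG still sees the issue.
        updated.discard("needs-triage")
    return updated

before = {"sig/apps", "sig/architecture", "sig/api-machinery", "needs-triage"}
print(sorted(apply_label(before, "lifecycle/ready")))
```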
I can create a dedicated DevStats dashboard around this if needed.
/priority important-soon
/cc
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
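
As a rough illustration of the timeline in the bot message above (stale after 90d, rotten after a further 30d, closed after a further 30d per the later bot messages in this thread), the thresholds can be modeled as below. This is a simplified sketch that assumes no activity at all on the issue, so the intervals simply add up; `lifecycle_for` and its parameter names are hypothetical and not how the bot is implemented.

```python
def lifecycle_for(days_inactive: int, stale_after: int = 90,
                  rot_after: int = 30, close_after: int = 30) -> str:
    """Return the lifecycle outcome for an issue with no activity at all."""
    # Thresholds are cumulative under the no-activity assumption: 90, 120, 150.
    if days_inactive >= stale_after + rot_after + close_after:
        return "closed"
    if days_inactive >= stale_after + rot_after:
        return "lifecycle/rotten"
    if days_inactive >= stale_after:
        return "lifecycle/stale"
    return "fresh"

for days in (30, 90, 120, 150):
    print(days, lifecycle_for(days))
```

The defaults are parameters so that alternative timeouts can be tried without changing the logic.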
@nikopen -- Are you coming back to this one?
/remove-lifecycle stale
/lifecycle frozen
I've opened the following PRs, to carry the work, as it has stalled out:
- `needs-triage`: https://github.com/kubernetes/test-infra/pull/16298

Both PRs are on _explicit hold_ until we/I produce a phase 0 KEP for issue triage.
(I hope to get that out to you all this cycle.)
Enhancement issue opened: https://github.com/kubernetes/enhancements/issues/1553
Provisional Issue Triage KEP opened: https://github.com/kubernetes/enhancements/pull/1554
/unassign @nikopen
/remove-sig pm
/area enhancements
Mislabeled:
/remove-area enhancements
/remove-lifecycle frozen
When can we hope to see this in action?
On Wed, May 20, 2020 at 10:27 AM Marky Jackson notifications@github.com
wrote:
/remove-lifecycle frozen
@thockin -- I had some Releng work to do with anago, but will be picking this up later in the week and next week.
The PR is already mostly complete here: https://github.com/kubernetes/test-infra/pull/16298
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.