Enhancements: PodDisruptionBudget and /eviction subresource

Created on 30 Aug 2016  ·  77 Comments  ·  Source: kubernetes/enhancements

Description

Various cluster management operations may "voluntarily" evict pods. By "voluntary" we mean the operation can be safely delayed for a reasonable period of time. The principal examples today are draining a node for maintenance or upgrade (kubectl drain) and the cluster autoscaler scaling down. In the future we will have the rescheduler and possibly other examples. (In contrast, something like evicting pods because a node has become unreachable or reports NotReady is not "voluntary.") For these "voluntary" evictions it can be useful for applications to be able to limit the number of pods that are down, or the rate at which pods are evicted. For example, a quorum-based application would like to ensure that the number of replicas running is never brought below the number needed for a quorum, even temporarily. Or a web front end might want to ensure that the number of replicas serving load never falls below a certain percentage of the total, even briefly. PodDisruptionBudget is an API object that specifies the minimum number or percentage of replicas of a collection that must be up at any given time, and/or the maximum eviction rate across the collection. Components that wish to evict a pod subject to a disruption budget use the /eviction subresource on the pod; unlike a regular pod deletion, this operation may be rejected by the API server if the eviction would cause a disruption budget to be violated.

kubernetes/kubernetes#12611
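To make the mechanism concrete, here is a minimal sketch of the two pieces involved. The resource names, labels, and the policy/v1beta1 API version are illustrative assumptions for the example, not taken from the proposal itself. First, a PodDisruptionBudget that requires at least two matching pods to stay up during voluntary disruptions:

```yaml
# A minimal PodDisruptionBudget sketch; names and labels are illustrative.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 2       # an absolute count; a percentage such as "50%" also works
  selector:
    matchLabels:
      app: example      # the pods covered by this budget
```

Second, a component requests a voluntary eviction by POSTing an Eviction object to the pod's /eviction subresource (e.g. POST /api/v1/namespaces/default/pods/example-pod/eviction). If granting the request would violate a budget, the API server rejects it (with a 429 Too Many Requests status) rather than deleting the pod:

```yaml
# The body of an eviction request; pod name and namespace are illustrative.
apiVersion: policy/v1beta1
kind: Eviction
metadata:
  name: example-pod
  namespace: default
```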

Progress Tracker

  • [ ] Before Alpha

    • [x] Design Approval

    • [x] Design Proposal: here, discussion was in kubernetes/kubernetes#22217

    • [x] Initial API review (if API). Maybe same PR as design doc. kubernetes/kubernetes#12611 kubernetes/kubernetes#24697



      • Any code that changes an API (/pkg/apis/...)


      • cc @kubernetes/api



    • [x] Identify shepherd (your SIG lead and/or [email protected] will be able to help you). My Shepherd is: _@davidopp_



      • A shepherd is an individual who will help acquaint you with the process of getting your feature into the repo, identify reviewers and provide feedback on the feature. They are _not_ (necessarily) the code reviewer of the feature, or tech lead for the area.


      • The shepherd is _not_ responsible for showing up to Kubernetes-PM meetings and/or communicating if the feature is on-track to make the release goals. That is still your responsibility.



    • [ ] Identify secondary/backup contact point. My Secondary Contact Point is: _replace.[email protected]_ (and/or GH Handle)

    • [x] Write (code + tests + docs) then get them merged. kubernetes/kubernetes#24697 kubernetes/kubernetes#25551 kubernetes/kubernetes#25288 kubernetes/kubernetes#25297 kubernetes/kubernetes#25921 kubernetes/kubernetes#30800 kubernetes/kubernetes#31033

    • [x] Code needs to be disabled by default. Verified by code OWNERS

    • [x] Minimal testing

    • [ ] Minimal docs



      • cc @kubernetes/docs on docs PR


      • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off


      • New apis: _Glossary Section Item_ in the docs repo: kubernetes/kubernetes.github.io



    • [ ] Update release notes

  • [ ] Before Beta

    • [ ] Testing is sufficient for beta

    • [ ] User docs with tutorials

    • _Updated walkthrough / tutorial_ in the docs repo: kubernetes/kubernetes.github.io

    • cc @kubernetes/docs on docs PR

    • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

    • [ ] Thorough API review

    • cc @kubernetes/api

  • [ ] Before Stable

    • [ ] docs/proposals/foo.md moved to docs/design/foo.md

    • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

    • [ ] Soak, load testing

    • [ ] detailed user docs and examples

    • cc @kubernetes/docs

    • cc @kubernetes/feature-reviewers on this issue to get approval before checking this off

_FEATURE_STATUS is used for feature tracking and to be updated by @kubernetes/feature-reviewers._
FEATURE_STATUS: IN_DEVELOPMENT

More advice:

Design

  • Once you get LGTM from a _@kubernetes/feature-reviewers_ member, you can check this checkbox, and the reviewer will apply the "design-complete" label.

Coding

  • Use as many PRs as you need. Write tests in the same or different PRs, as is convenient for you.
  • As each PR is merged, add a comment to this issue referencing the PRs. Code goes in the http://github.com/kubernetes/kubernetes repository,
    and sometimes http://github.com/kubernetes/contrib, or other repos.
  • When you are done with the code, apply the "code-complete" label.
  • When the feature has user docs, please add a comment mentioning @kubernetes/feature-reviewers and they will
    check that the code matches the proposed feature and design, and that everything is done, and that there is adequate
    testing. They won't do detailed code review: that already happened when your PRs were reviewed.
    When that is done, you can check this box and the reviewer will apply the "code-complete" label.

Docs

  • [ ] Write user docs and get them merged in.
  • User docs go into http://github.com/kubernetes/kubernetes.github.io.
  • When the feature has user docs, please add a comment mentioning @kubernetes/docs.
  • When you get LGTM, you can check this checkbox, and the reviewer will apply the "docs-complete" label.
Labels: kind/api-change, kind/feature, sig/apps, stage/stable, tracked/no


All 77 comments

cc/ @mml

@davidopp Are docs required for this feature?

Yes, @mml is starting to work on it.

@mml Another ping on docs. Any PRs you can point me to?

Planning to move this to Beta in 1.5, see kubernetes/kubernetes#25321

Please consider https://github.com/kubernetes/kubernetes/issues/34776 before beta, as addressing it might be a breaking API change (makes a required field optional).

Is it really not allowed to add a mutually exclusive field once something is already in Beta? I don't understand why that would be the case (I may be missing something though). I thought the "backward compatibility" requirement for Beta just meant that an object from the old API version has to be usable in the new binary version. It seems that would be the case, since the new binary version understands both fields.
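For concreteness, the mutually exclusive pair under discussion corresponds to the minAvailable/maxUnavailable fields from kubernetes/kubernetes#34776; a hedged sketch of a spec that sets exactly one of them (validation would reject a budget specifying both):

```yaml
# Sketch: a PDB spec sets exactly one of the two fields.
spec:
  minAvailable: 2       # previously required; becomes optional under the proposal
  # maxUnavailable: 1   # the new, mutually exclusive alternative
  selector:
    matchLabels:
      app: example      # illustrative label
```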

@davidopp can you confirm that this item targets beta in 1.6?

It was beta in 1.5. beta-in-1.5 label looks correct.

We don't have any short-term plan for additional work on this, so I'll close.

@davidopp when is this going to GA? should we keep this open until then?

cc @maisem

Yeah, let's reopen until it goes to GA. Not sure when that's happening, so I've removed the milestone. The sig-apps folks have had various ideas for changes we should make before GA but we haven't had bandwidth to discuss it. I assume the discussion will happen on kubernetes/kubernetes#25321

@davidopp thank you for the update. I'm adding the "next-milestone" label to make it easier to filter and track ongoing but later-scheduled features.

@davidopp is any progress expected on this feature? If yes, please update the feature description with the new template.

Responsibility for feature has been moved from sig-scheduling to sig-apps. I've updated the label and changed the assignee to @erictune.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

/assign

/reopen
@kubernetes/sig-apps-feature-requests
When (if at all) is this going stable?

/unassign

@davidopp @mattfarina @prydonius @kow3ns @kubernetes/sig-apps-feature-requests --
Do we have an owner for Pod Disruption Budgets?

/unassign @erictune
/assign @mattfarina @prydonius @kow3ns

sig-apps owns it (see https://github.com/kubernetes/features/issues/85#issuecomment-299398708)

Yep, saw that note, but since then it hasn't been updated in over a year and fell prey to fejta-bot.
Is there a new _human_ owner?

Hi
This enhancement has been tracked before, so we'd like to check in and see if there are any plans for this to graduate stages in Kubernetes 1.13. This release is targeted to be more ‘stable’ and will have an aggressive timeline. Please only include this enhancement if there is a high level of confidence it will meet the following deadlines:

  • Docs (open placeholder PRs): 11/8
  • Code Slush: 11/9
  • Code Freeze Begins: 11/15
  • Docs Complete and Reviewed: 11/27

Please take a moment to update the milestones on your original post for future tracking and ping @kacole2 if it needs to be included in the 1.13 Enhancements Tracking Sheet

Thanks!

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale – we're sorely affected by the issue described in https://github.com/kubernetes/kubernetes/issues/45398, connecting that here so that it's tracked with this feature's stable status...

/remove-lifecycle stale
per the above (everything on a line is interpreted as part of the command, so if you're going to comment it needs to be on a newline @rdsubhas)
/stage beta
/lifecycle frozen
I'm freezing this to make clear that it's currently stuck in beta

Enhancement issues opened in kubernetes/enhancements should never be marked as frozen.
Enhancement Owners can ensure that enhancements stay fresh by consistently updating their states across release cycles.

/remove-lifecycle frozen

Hello @rdsubhas @davidopp , I'm the Enhancement Lead for 1.15. Is this feature going to be graduating alpha/beta/stable stages in 1.15? Please let me know so it can be tracked properly and added to the spreadsheet. This will also require a KEP for inclusion into 1.15. Please work on that first.

Once coding begins, please list all relevant k/k PRs in this issue so they can be tracked properly.

(hi @kacole2 I'm not responsible for PDB feature, sorry)

Thread regarding v1 criteria:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kubernetes-sig-apps/PL5EY_XFj9Y/rtUFhwd_GwAJ

cc @bsalamat

IMHO kubernetes/kubernetes#66811 would be a great addition before going stable.

Hi @davidopp, I'm the 1.16 Enhancement Lead. Is this feature going to be graduating alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.

Once coding begins or if it already has, please list all relevant k/k PRs in this issue so they can be tracked properly.

As a reminder, every enhancement requires a KEP in an implementable state with Graduation Criteria explaining each alpha/beta/stable stages requirements.

Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.

Thank you.

@davidopp @mortent @bsalamat @liggitt is this going to be graduating to stable based on #904 being merged?

/milestone v1.16
/stage stable

Note that the design doc is here:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/rescheduling.md

The lack of a reasonable way to handle maxUnavailable=0 is a problem. My thoughts on that are here:
https://github.com/kubernetes/kubernetes/issues/66811#issuecomment-517520271
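For context, a sketch of the configuration in question (names and labels are illustrative): a budget with maxUnavailable: 0 instructs the API server to reject every voluntary eviction of the selected pods, which can block node drains indefinitely.

```yaml
# With maxUnavailable: 0, no voluntary evictions are ever permitted;
# names and labels are illustrative.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: no-voluntary-disruptions
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: singleton
```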

@davidopp @mortent @bsalamat @liggitt

I'm one of the v1.16 docs shadows.
Does this enhancement (or the work planned for v1.16) require any new docs (or modifications to existing docs)? If not, can you please update the 1.16 Enhancement Tracker Sheet (or let me know and I’ll do so)

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.16) due by Friday, August 23rd, it can just be a placeholder PR at this time. Let me know if you have any questions!

@daminisatya I have created a placeholder PR https://github.com/kubernetes/website/pull/15859

@mortent code freeze for 1.16 is on Thursday 8/29. Are there any outstanding k/k PRs that still need to be merged for this to go Stable?

There was a concern from @bgrant0607 above that needs to be addressed. All of the k/k PRs in the original post have been merged or closed. Please advise. Thanks!

@kacole2 I'm still working on this. The WIP PR is https://github.com/kubernetes/kubernetes/pull/81571. I am still hoping to get this completed and merged before the code freeze, but I'm working through some issues with the PR.

We discussed https://github.com/kubernetes/kubernetes/issues/66811 in sig-apps two weeks ago. There was agreement that since we don't have a good solution to the situation described, we should not let it hold up GA for PDBs.

@kacole2 I think PR https://github.com/kubernetes/kubernetes/pull/81571 should be ready now. Hopefully we can get it merged before the freeze.

Hi @mortent it looks as though https://github.com/kubernetes/kubernetes/pull/81571 didn't merge before code freeze and it's not in the Tide Merge Pool. This feature is going to be bumped from v1.16. If you would still like to have this be a part of the 1.16 release, please file an exception

@kacole2 I don't see any problems with bumping this from 1.16. I'll follow up and try to get this into 1.17.

/milestone clear
/milestone v1.17

edited to add: we'll have @kacole2 re-set the milestone then

@guineveresaenger: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact the kubernetes/milestone-maintainers team and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone clear
/milestone v1.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Hey there @mortent -- 1.17 Enhancements lead here. I wanted to check in and see if you think this Enhancement will be graduating to stable in 1.17?

The current release schedule is:

  • Monday, September 23 - Release Cycle Begins
  • Tuesday, October 15, EOD PST - Enhancements Freeze
  • Thursday, November 14, EOD PST - Code Freeze
  • Tuesday, November 19 - Docs must be completed and reviewed
  • Monday, December 9 - Kubernetes 1.17.0 Released

If you do, once coding begins please list any additional relevant k/k PRs in this issue so they can be tracked properly. 👍

Thanks!

The plan is to get this into 1.17. Code should hopefully be ready to merge soon https://github.com/kubernetes/kubernetes/pull/81571

Awesome. Thank you for the quick response @mortent! I'll go ahead and add it to the tracking sheet.

Hello @mortent @davidopp I'm one of the v1.17 docs shadows.
Does this enhancement (or the work planned for v1.17) require any new docs (or modifications to existing docs)? If not, can you please update the 1.17 Enhancement Tracker Sheet (or let me know and I'll do so)

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.17) due by Friday, November 8th, it can just be a placeholder PR at this time. Let me know if you have any questions!

@mortent @davidopp

Since we're approaching Docs placeholder PR deadline on Nov 8th. Please try to get one in against k/website dev-1.17 branch.

Hey @mortent 1.17 Enhancement Shadow here! 👋 I am reaching out to check in with you to see how this enhancement is going.

The Enhancement team is currently tracking PR kubernetes/kubernetes#81571 in the tracking sheet. Are there any other k/k PRs that need to be tracked as well?

Also, another friendly reminder that we're quickly approaching code freeze (Nov. 14th).

I hope the changes in https://github.com/kubernetes/kubernetes/pull/81571 will be ready to merge soon. In addition to that PR, there is also a separate PR to promote some of the e2e tests for PDBs to conformance tests: https://github.com/kubernetes/kubernetes/pull/84740

Thank you @mortent! I'll update the tracking sheet with the PRs above.
Also, as one of the docs shadows mentioned, you must have a docs placeholder PR to have this enhancement included in the release. Please update us on a doc PR by Nov. 8th. Thanks again!

Hi @mortent, tomorrow is code freeze for the 1.17 release cycle. It looks like the k/k PRs have not been merged. We're flagging this enhancement as At Risk in the 1.17 tracking sheet.

Do you think all necessary PRs will be merged by the EoD of the 14th (Thursday)? After that, only release-blocking issues and PRs will be allowed in the milestone with an exception.

Hi @mortent, code freeze is now in effect for the 1.17 release. Unfortunately, the k/k PRs for this enhancement were not merged in. The Enhancement team has removed this enhancement from the milestone. If you feel this is release blocking, please file an exception request.

/milestone clear

Hey there @mortent,

Jeremy from the 1.18 enhancements team here 👋

It looks like your PRs haven't been merged yet. Are you planning on landing this in 1.18? If so, we'll track this enhancement. Code Freeze will be March 5th.

@jeremyrickard Still working on this, so we should track it for 1.18.

/milestone 1.18

@jeremyrickard: The provided milestone is not valid for this repository. Milestones in this repository: [keps-beta, keps-ga, v1.17, v1.18, v1.19, v1.20, v1.21]

Use /milestone clear to clear the milestone.

In response to this:

/milestone 1.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/milestone v1.18

Hey @mortent

Seth here, Docs shadow on the 1.18 release team.

Does the enhancement work planned for 1.18 require any new docs or modifications to existing docs?

If not, can you please update the 1.18 Enhancement Tracker Sheet (or let me know and I'll do so)

If doc updates are required, reminder that the placeholder PRs against k/website (branch dev-1.18) are due by Friday, Feb 28th.

Let me know if you have any questions!

Hey @mortent

Code freeze is just over a week away. Could you please list any k/k PRs you are working on, in addition to the two above, so that we can track this issue better?

@jeremyrickard This will not make it in the 1.18 release. There are still open issues that we need to address.

@mortent Thank you for the update. :)

/milestone clear

Hi @mortent -- 1.19 Enhancements Lead here, do you plan to graduate this Enhancement to stable this release?

The current release schedule is:

  • Monday, April 13: Week 1 - Release cycle begins
  • Tuesday, May 19: Week 6 - Enhancements Freeze
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released

@palnabarun There are still some open issues that need to be resolved for PDBs. I hope we can resolve most of them and then target the 1.20 release for graduating PDBs to GA.

Thanks @mortent for the updates. Deferring the enhancement then. 🙂

/milestone v1.20

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Hi @mortent !

Enhancements Lead here, is this still intended to graduate in 1.20?

Thanks!
Kirsten

Following up on this, are there any plans for 1.20? As an FYI, Enhancements Freeze is October 6th.

Thanks,
Kirsten

Another ping @mortent and everyone

This Issue doesn't seem to have a KEP and the link to the Design Proposal 404s. Can someone please clarify the status, whether any work will be done 1.20 and whether a KEP is forthcoming?

1.20 Enhancements Freeze is October 6th. To be included in the milestone:

  • The KEP must be merged in an implementable state
  • The KEP must have test plans
  • The KEP must have graduation criteria

The KEP format can be found here: https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template

Thanks
Kirsten

Enhancements Freeze is now in effect. If you wish to be included in the 1.20 Release, please submit an Exception Request as soon as possible.

Best,
Kirsten
1.20 Enhancements Lead
