Enhancements: Pod Overhead: account resources tied to the pod sandbox, but not specific containers

Created on 15 Jan 2019  ·  64 Comments  ·  Source: kubernetes/enhancements

Enhancement Description

_Please keep this description up to date. This will help the Enhancement Team efficiently track the evolution of the enhancement._

priority/important-longterm sig/node stage/beta tracked/no

All 64 comments

/help
I'm looking for someone who is interested in picking back up this proposal. Specifically, the design proposal needs to be reworked in the context of RuntimeClass, and the details need to be worked through with the sig-node community before we can move to implementation.

/cc @egernst

Thanks @tallclair. I may have questions on the process, but am happy to pick this up.

\o/
/remove-help
/assign @egernst

@tallclair: GitHub didn't allow me to assign the following users: egernst.

Note that only kubernetes members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

\o/
/remove-help
/assign @egernst

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/assign @egernst

Hey @tallclair et al, I made several suggestions for the WIP RFC @ https://docs.google.com/document/d/1EJKT4gyl58-kzt2bnwkv08MIUZ6lkDpXcxkHqCvvAp4/edit?usp=sharing

I added a section for updating the _runtimeClass_ CRD, and explained how the values would be obtained from the new suggested _runtimeClass_ fields rather than configured in the _runtimeController_. The suggested _runtimeController_ scope is also greatly reduced, allowing this first iteration to just handle adding the pod overhead.
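
To make that reduced scope concrete, here is a minimal sketch of the split: the overhead lives on the RuntimeClass, and the controller only copies it onto pods. For illustration it is written against the node.k8s.io/v1beta1 and core/v1 Go types that eventually shipped with this feature; the runtime class name, handler, and overhead quantities are made up.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	nodev1beta1 "k8s.io/api/node/v1beta1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A RuntimeClass that declares a fixed per-pod sandbox overhead.
	rc := nodev1beta1.RuntimeClass{
		ObjectMeta: metav1.ObjectMeta{Name: "kata"},
		Handler:    "kata",
		Overhead: &nodev1beta1.Overhead{
			PodFixed: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("250m"),
				corev1.ResourceMemory: resource.MustParse("120Mi"),
			},
		},
	}

	// The reduced-scope controller only copies the overhead onto pods that
	// reference this RuntimeClass; it no longer owns any configuration.
	pod := corev1.Pod{Spec: corev1.PodSpec{RuntimeClassName: &rc.Name}}
	if rc.Overhead != nil && pod.Spec.Overhead == nil {
		pod.Spec.Overhead = rc.Overhead.PodFixed.DeepCopy()
	}
	fmt.Printf("pod overhead: cpu=%s memory=%s\n",
		pod.Spec.Overhead.Cpu(), pod.Spec.Overhead.Memory())
}
```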

PTAL.

@tallclair @egernst I quickly went through the design proposal. We are working on In-Place Vertical Scaling for Pods, and I want to clarify a couple of things.

In the context of in-place resize, do you see PodSpec.Overhead as something VPA should be aware of via metrics reporting? And perhaps, in the future, VPA could track this field and make recommendations for it. It would be a good idea to nail down early on whether this field should be mutable by an external entity.

CC: @kgolab @bskiba @schylek

Hello @tallclair, I'm the Enhancement Lead for 1.15. It looks like there is no KEP accepted yet. Is it safe to assume this will not make the enhancement freeze deadline for 1.15?

We're still planning on getting this into 1.15. Don't we still have close to 3 weeks to get the KEP in? KEP is here, btw: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/20190226-pod-overhead.md

A piece that is missing from the KEP is a discussion of how overhead interacts with pod QoS.

In my opinion, the application of RuntimeClass overhead should maintain the pod's QoS class, which I think means the overhead and requests need to match for Guaranteed pods. If they need to match for guaranteed, does it make sense to allow them to be different for burstable pods? Furthermore, pod limit is meaningless as currently defined unless all containers in the pod also have limits.

Given this, I'm wondering whether it makes sense to simplify overhead to be a single number (per resource), not differentiating between requests and limits. I.e. change the type from *ResourceRequirements to ResourceList. How the overhead is used would depend on the container requests and limits:

  • No requests or limits (BestEffort) - overhead is entirely ignored. Should RuntimeClass controller leave it out, or still set it on the pod, and have it be ignored by the scheduler & kubelet?
  • Some requests or limits (Burstable) - overhead is included with requests when scheduling
  • All containers have limits, no requests (Burstable) - overhead is included in limits set on the pod cgroup. Overhead is not included in request (0)?
  • All containers have limits (Burstable or Guaranteed) - overhead is added to total requests and limits for scheduling and limiting the pod cgroup

@egernst @derekwaynecarr @dchen1107 @bsalamat WDYT?
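
For illustration, here is a rough sketch of the ResourceList variant above as it might feed into scheduler-side accounting. The package and function names are made up, and the BestEffort check is simplified to "no container declares any requests" (ignoring limits), so this is a sketch of the idea rather than the actual implementation.

```go
package overheadsketch

import (
	corev1 "k8s.io/api/core/v1"
)

// effectiveRequests sketches how a single-number-per-resource overhead
// (a plain ResourceList) could be folded into the scheduler's view of a
// pod's requests.
func effectiveRequests(pod *corev1.Pod, overhead corev1.ResourceList) corev1.ResourceList {
	total := corev1.ResourceList{}
	for _, c := range pod.Spec.Containers {
		for name, q := range c.Resources.Requests {
			sum := total[name]
			sum.Add(q)
			total[name] = sum
		}
	}
	if len(total) == 0 {
		// Simplified BestEffort case: no container requests at all, so the
		// overhead is ignored entirely.
		return total
	}
	// Burstable/Guaranteed: add the overhead on top of container requests.
	for name, q := range overhead {
		sum := total[name]
		sum.Add(q)
		total[name] = sum
	}
	return total
}
```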

An alternative implementation of the above suggestion would be to keep overhead as-is on the PodSpec, and apply the logic in the RuntimeClass controller when the overhead is set. This makes the actual overhead settings explicit, but makes the admission ordering more important (e.g. for interplay between overhead and LimitRanger, or other things that manipulate requests & limits).
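
If the logic moved into the RuntimeClass controller instead, the admission step might resolve a full requests/limits pair at the time the overhead is set, based on the pod's QoS class. The helper below is hypothetical (it is not the shipped API, which uses a plain ResourceList) and takes the already-computed QoS class as an input to keep the sketch short.

```go
package overheadsketch

import (
	corev1 "k8s.io/api/core/v1"
)

// resolveOverhead is a hypothetical admission-time helper for the
// alternative above: overhead stays a full requests/limits pair, and the
// controller picks values that preserve the pod's QoS class.
func resolveOverhead(qos corev1.PodQOSClass, podFixed corev1.ResourceList) *corev1.ResourceRequirements {
	switch qos {
	case corev1.PodQOSBestEffort:
		// No overhead applied; the pod has no requests or limits to preserve.
		return nil
	case corev1.PodQOSGuaranteed:
		// Requests must equal limits to keep the pod Guaranteed.
		return &corev1.ResourceRequirements{Requests: podFixed, Limits: podFixed}
	default: // Burstable
		// Count overhead against requests only; limits stay unconstrained.
		return &corev1.ResourceRequirements{Requests: podFixed}
	}
}
```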

Hey, @tallclair @egernst I'm the v1.15 docs release shadow.

I see that you are targeting this enhancement for the 1.15 release. Does this require any new docs (or modifications)?

Just a friendly reminder we're looking for a PR against k/website (branch dev-1.15) due by Thursday, May 30th. It would be great if it's the start of the full documentation, but even a placeholder PR is acceptable. Let me know if you have any questions! 😄

Cheers! ✨

Yes, this should get docs (tagged alpha). Thanks for the reminder.

@tallclair Thank you, can you share a PR for the documentation if there is any?

Hi @egernst @tallclair . Code Freeze is Thursday, May 30th 2019 @ EOD PST. All enhancements going into the release must be code-complete, including tests, and have docs PRs open.

Please list all current k/k PRs so they can be tracked going into freeze. If the PRs aren't merged by freeze, this feature will slip for the 1.15 release cycle. Only release-blocking issues and PRs will be allowed in the milestone.

If you know this will slip, please reply back and let us know. Thanks!

Hi @egernst @tallclair, today is code freeze for the 1.15 release cycle. I do not see a reply for any k/k PRs to track for this merge. It's now being marked as At Risk in the 1.15 Enhancement Tracking Sheet. If there is no response, or you respond with PRs to track and they are not merged by EOD PST, this will be dropped from the 1.15 Milestone. After this point, only release-blocking issues and PRs will be allowed in the milestone with an exception.

The API changes are all approved now, but need a rebase based on what merged last night. Unfortunately I am flying for the next several hours, so I cannot do this until midday PST.

Can we extend the time period for this feature, or at least for the API changes? Is there a formal process for filing for an extension? Help?

@egernst code freeze has been extended to EOD today. If the issues have LGTM labels then you are all set

@egernst we are in code freeze and this didn't make it in time. I'm going to remove it from the 1.15 list. Please file an exception if you think it still needs to be added.

/milestone clear

Hi @egernst @tallclair , I'm the 1.16 Enhancement Lead. Is this feature going to be graduating alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.

Once coding begins or if it already has, please list all relevant k/k PRs in this issue so they can be tracked properly.

Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.

Thank you.

Hi @kacole2 - This will be added as an alpha feature in 1.16. As I cannot edit the issue description, I'm updating details on applicable k/k PRs below:

  • kubernetes/kubernetes#76968
  • kubernetes/kubernetes#78484
  • kubernetes/kubernetes#79247
  • kubernetes/kubernetes#78319
  • (placeholder for resourceQuota changes)
  • (placeholder for CRI API change)

Hi @tallclair, I'm the v1.16 docs release shadow.

Does this enhancement require any new docs (or modifications)?

Just a friendly reminder we're looking for a PR against k/website (branch dev-1.16) due by Friday, August 23rd. It would be great if it's the start of the full documentation, but even a placeholder PR is acceptable. Let me know if you have any questions!

Thanks!

No changes in 1.16

EDIT: see below.

Whoops, I think I mixed this up with the PodSecurityPolicy issue. We are planning on adding PodOverhead as an alpha feature in v1.16.

@egernst Will you be able to handle the docs PR?

Hi @tallclair @egernst , the deadline to open a placeholder PR for docs is August 23rd.
This is a friendly reminder that we are awaiting a docs PR against k/website.

Thanks!

I will send a placeholder today. Is there a particular branch to push against, or just master, @VineethReddy02 ?

we're looking for a PR against k/website (branch dev-1.16). Update the PR here once it's done.

Thanks @egernst

@egernst it looks like all PRs mentioned in this thread have been merged. Code freeze for 1.16 is on Thursday 8/29. Are there any outstanding k/k PRs that still need to be merged for this to go Alpha?

Hey there @egernst @tallclair -- 1.17 Enhancements lead here. I wanted to check in and see if you think this Enhancement will be graduating to alpha/beta/stable in 1.17?

The current release schedule is:

  • Monday, September 23 - Release Cycle Begins
  • Tuesday, October 15, EOD PST - Enhancements Freeze
  • Thursday, November 14, EOD PST - Code Freeze
  • Tuesday, November 19 - Docs must be completed and reviewed
  • Monday, December 9 - Kubernetes 1.17.0 Released

If you do, please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

/milestone clear

I'd love to see this reach beta in v1.17, but that depends on @egernst 's schedule (or anyone else who is interested in helping out)

Just a quick ack, @tallclair @egernst: with the Enhancement freeze tomorrow, do you think this should be tracked?

No changes planned for v1.17. Maybe add some e2e tests, but the feature will remain in alpha.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Hey @tallclair -- 1.18 Enhancements shadow here. I wanted to check in and see if you think this Enhancement will be graduating to beta in 1.18 or having a major change in its current level?

The current release schedule is:

Tuesday, January 28th EOD PST - Enhancements Freeze
Thursday, March 5th, EOD PST - Code Freeze
Monday, March 16th - Docs must be completed and reviewed
Tuesday, March 24th - Kubernetes 1.18.0 Released

The KEP must also have graduation criteria and a Test Plan defined.

If you would like to include this enhancement, once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly.

Thanks!

@egernst is working on bringing the feature to beta. Eric, do you think it will make the deadlines this cycle?

Thanks @tallclair and @helayoty - this is my intention and plan.

Thanks @tallclair , @egernst for the update.

If you could please list all relevant k/k PRs along with the updated KEP in this issue so they can be tracked properly, that would be great.

/milestone v1.18

Hey @egernst @tallclair, sorry for the late notice on this, but as we were reviewing the KEPs associated with all our tracked items, we noticed that the KEP for this doesn't include any Test Plans and the graduation criteria are not very concrete. We're going to need to remove it from the milestone and have you submit an Exception Request.

/milestone clear

@jeremyrickard @tallclair - I'll send a PR to the actual KEP to clarify graduation criteria based on prior discussions (e2e tests, monitoring/metrics) and file an exception today.

Make sense?

Sure, ping me on slack when it's ready to go, since this is time-sensitive.

Thanks @egernst and @tallclair, appreciate the effort!

@jeremyrickard -- KEP has been updated.

Can we track graduation of this feature from alpha to beta for 1.18 now? Anything more needed?

/milestone v1.18

Exception request was granted, added back to milestone.

Hello @egernst @tallclair I'm one of the v1.18 docs shadows.
Does this enhancement (or the work planned for v1.18) require any new docs (or modifications to existing docs)? If not, can you please update the 1.18 Enhancement Tracker Sheet (or let me know and I'll do so)?

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.18) due by Friday, Feb 28th. It can just be a placeholder PR at this time. Let me know if you have any questions!

Thanks @irvifa - if we graduate this feature to beta, we will indeed need docs updates accordingly. I'll open a placeholder PR: https://github.com/kubernetes/website/pull/19059

@egernst Hi 👋 Thanks for your response, I've updated the tracking sheet as well.

Hey @egernst @tallclair

We are about a week out from code freeze, which is on 05 March 2020. Can you please share the k/k PRs that you have for this enhancement so we can better track the enhancement as we get close to the freeze? Thanks.

Thanks @jeremyrickard.

Details below (I can't edit description of this issue, so I've been tracking here)

/milestone clear

(removing this enhancement issue from the v1.18 milestone as the milestone is complete)

@egernst Enhancements shadow for 1.19 here. Any plans for this in 1.19?

/stage beta

I think PodOverhead GA is probably blocked on RuntimeClass GA. That means it might be an option for v1.20. @egernst are there any changes or features you think are missing before going GA? Or are we just waiting for more usage and feedback?

One piece of feedback I have heard a few times is the need for containerFixed overhead.

Ok, I am going to defer to v1.20, let me know if anything changes.

/milestone v1.20

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Hi @tallclair

Enhancements Lead here, do you still intend to go GA in 1.20?

Thanks!
Kirsten

Looks like this won't make GA in v1.20. Punting to v1.21.
/milestone v1.21
