Enhancements: Dynamic Maximum volume count

Created on 29 Mar 2018 · 73 comments · Source: kubernetes/enhancements

Feature Description

  • Add support for a dynamic, generic mechanism for the maximum number of volumes per node.
  • Primary contact (assignee): gnufied
  • Responsible SIGs: sig-storage
  • Design proposal link (community repo): https://github.com/kubernetes/community/pull/2051
  • KEP: 20190408-volume-scheduling-limits
  • Link to e2e and/or unit tests:
  • Reviewer(s) - (for LGTM) @saad-ali @jsafrane
  • Approver (likely from SIG/area to which feature belongs): @saad-ali @childsb
  • Feature target (which target equals to which milestone):

    • Alpha release target (x.y) 1.11

    • Beta release target (x.y) 1.12

    • Major Redesign 1.15

    • Stable release target (x.y)

kind/feature sig/storage stage/stable

All 73 comments

/assign

Is it intended for this to work with flexvolume plugins? We still lack a solution for that use-case.

It is intended to work with all volume types, including FlexVolume and CSI.

@kubernetes/sig-storage-feature-requests

/assign @gnufied

@gnufied please fill out the appropriate line item of the
1.11 feature tracking spreadsheet
and open a placeholder docs PR against the
release-1.11 branch
by 5/25/2018 (tomorrow as I write this) if new docs or docs changes are
needed and a relevant PR has not yet been opened.

@gnufied -- What's the current status of this feature?
As we haven't heard from you regarding some items, this feature has been moved to the Milestone risks sheet within the 1.11 Features tracking spreadsheet.

Please update the line item for this feature on the Milestone risks sheet ASAP and ping me and @idvoretskyi so we can assess the feature status, or we will need to officially remove it from the milestone.

The PR is still on track for 1.11. The implementation PR is here - https://github.com/kubernetes/kubernetes/pull/64154

We have approval from @saad-ali. We are waiting on approvals from @liggitt and @bsalamat. I just had to rebase the PR because upstream changed and Jordan requested some naming changes.

@gnufied -- there needs to be a Docs PR issued as well, as Misty mentioned above.
Please update the Features tracking sheet with that information, so that we can remove this feature from the Milestone risks tab.

@gnufied thanks for the update! I've moved this feature back into the main sheet.

@gnufied This feature was worked on in the previous milestone, so we'd like to check in and see if there are any plans for this to graduate stages in Kubernetes 1.12 as mentioned in your original post. This still has the alpha tag as well so we need to update it accordingly.

If there are any updates, please explicitly ping @justaugustus, @kacole2, @robertsandoval, @rajendar38 to note that it is ready to be included in the Features Tracking Spreadsheet for Kubernetes 1.12.


Please note that the Features Freeze is July 31st, after which any incomplete Feature issues will require an Exception request to be accepted into the milestone.

In addition, please be aware of the following relevant deadlines:

  • Docs deadline (open placeholder PRs): 8/21
  • Test case freeze: 8/28

Please make sure all PRs for features have relevant release notes included as well.

Happy shipping!

@kacole2 For 1.12 the plan is to further expand this feature to cover more volume types and add CSI support. The decision on whether to move to beta in 1.12 will be made in a day or two.

@gnufied -- just following up... are we graduating this one to Beta in 1.12?

@gnufied @saad-ali --
Feature Freeze is today. Are we planning on graduating this to Beta in Kubernetes 1.12?
If so, can you make sure everything is up-to-date, so I can include it on the 1.12 Feature tracking spreadsheet?

Yeah, we are targeting this feature to move to beta for 1.12.

Thanks for the update. I've added this to the 1.12 tracking sheet.

cc: @kacole2 @wadadli @robertsandoval @rajendar38

Hey there! @gnufied I'm the wrangler for the Docs this release. Is there any chance I could have you open up a docs PR against the release-1.12 branch as a placeholder? That gives us more confidence in the feature shipping in this release and gives me something to work with when we start doing reviews/edits. Thanks! If this feature does not require docs, could you please update the features tracking spreadsheet to reflect it?

@gnufied --
Any update on docs status for this feature? Are we still planning to land it for 1.12?
At this point, code freeze is upon us, and docs are due on 9/7 (2 days).
If we don't hear anything back regarding this feature ASAP, we'll need to remove it from the milestone.

cc: @zparnold @jimangel @tfogo

We are moving this feature to beta, adding support for CSI and Azure. Will open a docs PR soonish.

@andyzhangx I may need your help in documenting stuff from Azure side of things.

@gnufied no problem, just assign the task to me.

@gnufied @andyzhangx -- please keep us posted. Docs PR needs to be opened ASAP. It's overdue at this point.

Hi @gnufied and @andyzhangx, do you have an update on the Docs PR? Please let us know as soon as you have a PR open.

Hi folks,
Kubernetes 1.13 is going to be a 'stable' release since the cycle is only 10 weeks. We encourage no big alpha features and only consider adding this feature if you have a high level of confidence it will make code slush by 11/09. Are there plans for this enhancement to graduate to beta/stable within the 1.13 release cycle? If not, can you please remove it from the 1.12 milestone or add it to 1.13?

We are also now encouraging that every new enhancement aligns with a KEP. If a KEP has been created, please link to it in the original post. Please take the opportunity to develop a KEP.

hi @gnufied. I'm following up on @ameukam's comment to see if there are any plans for this to graduate stages for 1.13.

This release is targeted to be more ‘stable’ and will have an aggressive timeline. Please only include this enhancement if there is a high level of confidence it will meet the following deadlines:
Docs (open placeholder PRs): 11/8
Code Slush: 11/9
Code Freeze Begins: 11/15
Docs Complete and Reviewed: 11/27

/milestone clear

err - I deleted my earlier comment. For 1.13 we are going to keep this feature in beta and, apart from bug fixes, there won't be any change to the feature itself.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

@gnufied Hello - I'm the enhancements lead for 1.14 and I'm checking in on this issue to see what work (if any) is being planned for the 1.14 release. Enhancements freeze is Jan 29th and I want to remind you that all enhancements must have a KEP.

This issue may see some internal refactoring changes, but we are not sure yet. The feature will remain in beta in 1.14.

@gnufied any work here planned for 1.15? If not, we will stay in beta. Thanks

This will remain beta in 1.15

Hey all, I was just wondering if there was a good workaround for this issue? Currently working in AKS and not able to consistently start up pods because they are being assigned to nodes with no more room to attach disks.

By AKS you mean the Azure container offering? Azure Disks support dynamic limits via this KEP. Which issue are you specifically referring to? Maybe open a GitHub issue?

Sorry for the ambiguity @gnufied. There are several issues already open, not by me, but this is probably the most relevant: https://github.com/Azure/AKS/issues/670

Yes, I'm referring to the Azure Kubernetes Service offering. We have a cluster built of small VMs (4 disks max) for dev purposes. If I have a node with 3 disks attached and submit a new request to have k8s stand up a pod that requires 2 disks, it may assign that work to the node with only 1 disk slot available. This causes the pod to crash because it cannot attach the number of disks I need, and I am never able to successfully get that pod up.

As far as I understand it, it's all related to the primary bullet for this issue: add support for a dynamic, generic mechanism for the maximum number of volumes per node.

I will gladly open a new issue if that is a better forum to discuss what I'm seeing and potential workarounds, but I'm not sure if that's on this repo or one for AKS since both are in the mix.

Thanks for your help.

@ammills01 which version of k8s? Dynamic volume limits for Azure Disks were fixed a while back (1.12) - https://github.com/kubernetes/kubernetes/pull/67772 - and should correctly take into account the number of volumes that can be attached to a node.

For small VMs, I wonder whether the problem is the root disk consuming one disk slot, so k8s incorrectly assumes the node still has 2 disks available. Can you open a GitHub issue? cc @andyzhangx - does Azure need logic to reserve one disk for the root disk?

k8s v1.12 should already have dynamic volume limits for Azure Disks (counting data disks only). @ammills01's question is: if there are already 3 data disks attached and a pod needs 2 data disks, will it mount one data disk first and only discover the max disk count has been hit when mounting the second, or will the volume scheduler first check whether the node can take both disks? I am not sure about the logic here.

That is exactly my question @andyzhangx. I'm typing up a new GitHub issue now that explains what I'm using when it comes to VMs for my AKS cluster, the AKS version, the files I'm using for k8s deployment, what I see when I query the nodes (kubectl get nodes -o json), the error I see, etc.

According to the Azure portal, this is using AKS 1.13.5, so it should have whatever fix was in for 1.12. Do you know if we need to have --feature-gates=CSINodeInfo enabled or something for this to pick up the right values? When I query a node and spit out the JSON, I see 2 values for 'attachable-volumes-azure-disk'. One value is under 'allocatable', the other under 'capacity'. In my case, both say 4. I'm not sure why, but I expected capacity to show the maximum for the VM, not taking into account what is currently allocated, and allocatable to indicate what is left to be allocated, taking into account the number of disks already in use.
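For reference, the fragment below sketches roughly what those two fields look like on the node object; the values shown are illustrative, not taken from an actual AKS node in this thread. As far as I understand it, allocatable for attachable-volumes-azure-disk is a static per-node limit rather than a live remaining count: the scheduler itself subtracts the disks already in use when filtering nodes, so seeing the same number under both capacity and allocatable is expected.

```yaml
# Illustrative fragment of `kubectl get node <node-name> -o yaml` output for a
# small VM size with a 4-disk attach limit (values are hypothetical).
status:
  capacity:
    attachable-volumes-azure-disk: "4"   # maximum disks the VM size supports
  allocatable:
    attachable-volumes-azure-disk: "4"   # static limit the scheduler counts against;
                                         # disks already attached are not subtracted here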

@andyzhangx no the scheduler will look for "viability" of both volumes in a pod at once, not one at a time.
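As a concrete illustration, a minimal pod spec with two disk-backed PVCs might look like the sketch below (the claim names and image are hypothetical). The scheduler's volume-count check evaluates both volumes together against the node's remaining attach capacity, so the pod is only placed on a node that can accommodate both disks.

```yaml
# Hypothetical pod requesting two Azure Disk-backed PVCs (claim names are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: two-disk-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data-a
      mountPath: /mnt/data-a
    - name: data-b
      mountPath: /mnt/data-b
  volumes:
  - name: data-a
    persistentVolumeClaim:
      claimName: disk-a
  - name: data-b
    persistentVolumeClaim:
      claimName: disk-b
```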

Even though this feature is remaining as beta, it's going to go under significant redesign. The KEP is here: https://github.com/kubernetes/enhancements/pull/942

/milestone v1.15

Hey @gnufied Just a friendly reminder we're looking for a PR against k/website (branch dev-1.15) due by Thursday, May 30. It would be great if it's the start of the full documentation, but even a placeholder PR is acceptable. Let me know if you have any questions!

@gnufied The placeholder PR is due Thursday May 30th.

Hi @gnufied @msau42 @saad-ali . Code Freeze is Thursday, May 30th 2019 @ EOD PST. All enhancements going into the release must be code-complete, including tests, and have docs PRs open.

Please list all current k/k PRs so they can be tracked going into freeze. If the PRs aren't merged by freeze, this feature will slip for the 1.15 release cycle. Only release-blocking issues and PRs will be allowed in the milestone.

If you know this will slip, please reply back and let us know. Thanks!

Hi @gnufied @msau42 @saad-ali, today is code freeze for the 1.15 release cycle. I do not see a reply for any k/k PRs to track for this merge. It's now being marked as At Risk in the 1.15 Enhancement Tracking Sheet. If there is no response, or you respond with PRs to track and they are not merged by EOD PST, this will be dropped from the 1.15 Milestone. After this point, only release-blocking issues and PRs will be allowed in the milestone with an exception.

/milestone clear

This enhancement is going to miss 1.15 release window. There is a PR open but it wasn't merged during the given timeline - https://github.com/kubernetes/kubernetes/pull/77595

/milestone clear

Further update - we have requested feature exception for this. The feature exception is still being discussed and needs approval from sig-scheduling folks. cc @bsalamat @ravisantoshgudimetla

/milestone v1.15

/milestone clear

This feature will remain in beta in 1.16

We recently discovered that this is even more dynamic than it used to be, because ENIs now count against volume attachments on some AWS instance types: https://github.com/kubernetes/kubernetes/issues/80967

Hello @gnufied -- 1.17 Enhancement Shadow here! 🙂

I wanted to reach out to see if this enhancement will be graduating to stable in 1.17?


Please let me know so that this enhancement can be added to 1.17 tracking sheet.

Thank you!

🔔Friendly Reminder

The current release schedule is

  • Monday, September 23 - Release Cycle Begins
  • Tuesday, October 15, EOD PST - Enhancements Freeze
  • Thursday, November 14, EOD PST - Code Freeze
  • Tuesday, November 19 - Docs must be completed and reviewed
  • Monday, December 9 - Kubernetes 1.17.0 Released

/assign

Hi @kcmartin; we intend to graduate this feature to GA in v1.17.

Thanks @bertinatto !
/milestone v1.17

/stage stable

Hello @gnufied I'm one of the v1.17 docs shadows.
Does this enhancement (or the work planned for v1.17) require any new docs (or modifications to existing docs)? If not, can you please update the 1.17 Enhancement Tracker Sheet (or let me know and I'll do so)?

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.17) due by Friday, November 8th, it can just be a placeholder PR at this time. Let me know if you have any questions!

@gnufied

Since we're approaching Docs placeholder PR deadline on Nov 8th. Please try to get one in against k/website dev-1.17 branch.

@irvifa: created placeholder PR here: https://github.com/kubernetes/website/pull/17432

Hi @gnufied
I am one of the Enhancements Shadows for the 1.17 Release Team. We are very near to Code Freeze (Nov 14th) for this release cycle. Just checking in about the progress of this enhancement. I see that https://github.com/kubernetes/kubernetes/pull/77595 was filed in relation to this.

Are there any other PRs related to this enhancement? If yes, can you please link them here?

Thank you in advance 😄

Hi @kcmartin.
These are the open PRs related to this feature:
Docs: https://github.com/kubernetes/website/pull/17432
Kubernetes: https://github.com/kubernetes/kubernetes/pull/83568

Thank you @bertinatto !

Hey @gnufied @bertinatto , Happy New Year! 1.18 Enhancements lead here 👋 Thanks for getting this across the line in 1.17!!

I'm going through and doing some cleanup for the milestone and checking on things that graduated in the last release. Since this graduated to GA in 1.17, I'd like to close this issue out, but the KEP is still marked as implementable. Could you submit a PR to update the KEP to implemented, and then we can close this issue out?

Thanks so much!

Hi @jeremyrickard,

I just created https://github.com/kubernetes/enhancements/pull/1433 to update the KEP status.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Hi @bertinatto, thank you so much for updating the status. :)

Closing this enhancement issue since the KEP has been implemented.

/close

@palnabarun: Closing this issue.

In response to this:

Closing this enhancement issue since the KEP has been implemented.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
