Autoscaler: Cluster-autoscaler support for OpenStack cloud provider

Created on 21 Mar 2018 · 26 Comments · Source: kubernetes/autoscaler

There are numerous potential ways to support cluster autoscaling in OpenStack. Services like Heat and Senlin are not guaranteed to be supported in an OpenStack deployment, so base support using only core OpenStack services should be targeted. The eventual path could be to optionally support other services, but that is outside the scope of this issue, other than driving the implementation to be abstract around ties to underlying services.

I found #230, but it has been closed as stale.

cluster-autoscaler lifecycle/rotten

dklyle

👍17 ❤1

Most helpful comment

Might be a little revolutionary here, but why not use the logic from cluster-api-provider-openstack(CAP-OS) instead of reimplementing? Anything we want to support in autoscaler we also want to support in CAP-OS, and we can prevent duplication of the logic. Also, the long-term intent is that cluster-api would directly be responsible for the creation and destruction of machines, and the autoscaler would be cloud-agnostic.

chaosaffe on 25 Oct 2018

👍4

All 26 comments

@dklyle

We are also struggling with the cluster autoscaler. Our requirement is to provision and deprovision nodes (VMs) automatically based on the resource usage of each node.

We are planning a "VM farm" of n VMs that any k8s cluster in our on-prem environment can draw on when it needs a node. This VM farm would be maintained/managed by OpenStack.

Currently we cannot find any implementation/solution where the K8s CA communicates properly with OpenStack and provisions/deprovisions nodes automatically.

Could you please advise when you expect to release a solution for this?

Thanks,
Varun T

varuntalus on 27 Mar 2018

@varuntalus This is very much a work in progress. I'm working on code in spurts. I will update here when I have made enough progress to be useful. I also wanted to solicit input on use cases and any discussion of desired criteria to help shape the implementation.

dklyle on 3 Apr 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 2 Jul 2018

/remove-lifecycle stale

ricolin on 18 Jul 2018

We are very interested in getting this to work. @dklyle would you be able to provide an update for this? Anything we can do to help you move forward with this?

timstoop on 22 Aug 2018

Could you implement this in kops as well? :)

zetaab on 23 Aug 2018

@zetaab It appears I have to fix gophercloud as well (some fixes are already merged, but more are needed). I will also implement this in kops once we have full tests here.

ricolin on 6 Sep 2018

@timstoop Since this has been pending for a long time, I will take over if that's fine with everyone. Any help or comments are welcome.

ricolin on 6 Sep 2018

👍3

The current patch is still missing tests and some of the implementation. I need to fix some issues in gophercloud (the OpenStack Golang SDK) first, before I can push the full implementation and tests.

ricolin on 6 Sep 2018

I'm no Go programmer, but let me know if and how I can help!

timstoop on 7 Sep 2018

👍1

After our discussion at the OpenStack PTG [1], it's time to start on the next steps listed below. I'm working on item 2 (build a common lib) now; I believe we will keep adding to it as we implement the three backends. After that, I assume I will help with item 4 (add Heat support). If you find anything you would like to help with, feel free to leave a message. Any ideas, suggestions, or reviews are more than welcome!

1. Check gophercloud support for Nova and Heat. This is a good chance to help gophercloud make sure everything is up to date. IMO we need to think about a fix for something like [2], either before or after we keep this moving.
2. In autoscaler, add a common lib for the OpenStack provider. We discussed supporting only Heat for now, but that might not be the best way. We should build up a lib (and maybe consider moving it to the OpenStack provider repo later); then we can implement Nova, Heat, and Senlin on top of it. I'm also working on integrating Heat and Senlin at some level as a long-term plan [3] for both teams, but that's another story.
3. Add Nova support for autoscaler (including documentation).
4. Add Heat support for autoscaler (including documentation).
5. Add Senlin support for autoscaler (including documentation).
6. Add Magnum support for autoscaling. This will be a separate plan, but I'm bringing it up so we might get more attention or volunteers.

[1] https://etherpad.openstack.org/p/sig-k8s-2018-denver-ptg
[2] https://github.com/gophercloud/gophercloud/issues/1157
[3] https://etherpad.openstack.org/p/autoscaling-integration-and-feedback

ricolin on 22 Oct 2018

👍2

@ricolin I am also interested in implementing this (as discussed on Slack). Let me know how I can help.

Rajat-0 on 22 Oct 2018

👍1

chaosaffe on 25 Oct 2018

👍4

@ricolin why support Nova, Heat, and Senlin from the OpenStack projects in the cluster autoscaler (CA)? As I understand it, the CA adjusts the number of k8s nodes using the OpenStack cloud, and k8s is the initiator: if k8s needs more nodes, it should notify OpenStack to create them, and the new nodes then join k8s. The same applies to removing nodes.

adsl123gg on 8 Nov 2018

CA adds VMs on cloudprovider side (for example it resizes GCP MIG or AWS ASG). Basically it needs to be able to ask cloudprovider to either add new node or remove existing one.
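To make the "add a node or remove an existing one" contract concrete, here is a minimal Go sketch. It is only an illustration: `scalableGroup`, its fields, and its methods are hypothetical names invented for this example and are not the autoscaler's real types. The group is held in memory, where a real provider would call the cloud's API.

```go
package main

import (
	"errors"
	"fmt"
)

// scalableGroup is a hypothetical stand-in for a provider-side group of VMs
// (for example a GCP MIG or an AWS ASG). It is not the autoscaler's real type.
type scalableGroup struct {
	min, max int      // size bounds the group must stay within
	nodes    []string // names of the nodes currently in the group
	next     int      // counter used to name newly created nodes
}

var errOutOfBounds = errors.New("requested size is outside the group's bounds")

// addNodes asks the (simulated) provider for delta more nodes.
func (g *scalableGroup) addNodes(delta int) error {
	if delta <= 0 || len(g.nodes)+delta > g.max {
		return errOutOfBounds
	}
	for i := 0; i < delta; i++ {
		g.nodes = append(g.nodes, fmt.Sprintf("node-%d", g.next))
		g.next++
	}
	return nil
}

// removeNode asks the (simulated) provider to delete one specific node.
func (g *scalableGroup) removeNode(name string) error {
	if len(g.nodes)-1 < g.min {
		return errOutOfBounds
	}
	for i, n := range g.nodes {
		if n == name {
			g.nodes = append(g.nodes[:i], g.nodes[i+1:]...)
			return nil
		}
	}
	return fmt.Errorf("node %q is not in the group", name)
}

func main() {
	g := &scalableGroup{min: 1, max: 3, nodes: []string{"node-0"}, next: 1}
	fmt.Println(g.addNodes(2), g.nodes)
	fmt.Println(g.removeNode("node-1"), g.nodes)
}
```

Note that scale-down removes a named node rather than just lowering a count, since the autoscaler decides which node is safe to drain.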

MaciekPytel on 8 Nov 2018

@adsl123gg I guess the reason to add OpenStack is the same as when AWS, GCP, etc. were added, which @MaciekPytel just mentioned.

ricolin on 8 Nov 2018

@ricolin the Cluster Autoscaler is only responsible for managing compute (i.e. Nova) resources. All other resources are managed by the Cloud Provider OpenStack, where contributions would be greatly appreciated.

chaosaffe on 8 Nov 2018

@chaosaffe currently we're building a common way (which I will push to GitHub soon) to treat all of them the same, as libraries that we can move to cloud-provider-openstack later.

Just to clarify:
Cluster Autoscaler manages a cluster-level group such as an ASG in AWS, which is equivalent to an ASG in OpenStack Heat or a Cluster in OpenStack Senlin. In that sense, I don't think it's fair to say Nova is the only thing Cluster Autoscaler is responsible for. We will still leave room to implement Nova as one of the backends anyway. IMO all libraries (OpenStack resources) should move to the OpenStack provider, which is what I intend (I guess that's what you are asking for too).

I like the idea of adding them to cluster API, but some more things will still need to be added even after this work in Autoscaler is done. I will try to help with that too.

ricolin on 8 Nov 2018

@chaosaffe cloud-provider-openstack works from the k8s side: if a k8s service/pod needs a LoadBalancer/PersistentVolume, k8s asks the cloud provider to create the corresponding resources, so k8s is the initiator. The Cluster Autoscaler (CA), I think, looks more like a component in OpenStack: the CA is the initiator, and it scales k8s nodes according to k8s resource utilization. Do I understand correctly, @ricolin? So there are two ways to implement autoscaling of a k8s cluster: from the k8s side or from the cloud provider side. @ricolin, which do you think is better, and what are their advantages and disadvantages?

adsl123gg on 11 Nov 2018

@ricolin I know CA should support OpenStack, but I want to know how. From your comments I can't tell how the k8s cluster would be autoscaled; you only mention Nova, Heat, or other components, which I find hard to reconcile with the FAQ in Cluster Autoscaler. Could you explain your design for autoscaling a k8s cluster with OpenStack?
According to the Cluster Autoscaler READMEs for Azure/AWS/AliCloud, they scale worker nodes, and according to the architecture diagram, CA runs inside the k8s cluster, so I think this design is different from the other providers in cluster-autoscaler.

adsl123gg on 17 Nov 2018

@adsl123gg fear not my friend, the design is actually very simple: it has the same structure as the other providers (Azure, AWS, etc.). The difference is that, instead of supporting only a single method to talk to the provider, we intend to allow multiple ways in the future. As for the OpenStack components supported here, the only difference is the implementation of some fundamental functions like FetchASGTargetSize or DeleteInstances.
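One way to picture this is a small interface that every backend (Nova, Heat, Senlin) would implement, so only the method bodies differ. The Go sketch below is illustrative only and is not the actual patch: `groupBackend`, `memoryBackend`, and `newBackend` are hypothetical names, the method names merely echo the functions mentioned above, and the single in-memory backend stands in for real OpenStack calls.

```go
package main

import "fmt"

// groupBackend is a hypothetical interface for this sketch. Each OpenStack
// service would provide its own implementation of these two methods.
type groupBackend interface {
	FetchTargetSize(group string) (int, error)
	DeleteInstances(group string, ids []string) error
}

// memoryBackend is an in-memory stand-in for a real backend.
type memoryBackend struct {
	groups map[string]map[string]bool // group name -> set of instance IDs
}

func (b *memoryBackend) FetchTargetSize(group string) (int, error) {
	members, ok := b.groups[group]
	if !ok {
		return 0, fmt.Errorf("unknown group %q", group)
	}
	return len(members), nil
}

func (b *memoryBackend) DeleteInstances(group string, ids []string) error {
	members, ok := b.groups[group]
	if !ok {
		return fmt.Errorf("unknown group %q", group)
	}
	// Validate first so a bad ID does not leave the group half-modified.
	for _, id := range ids {
		if !members[id] {
			return fmt.Errorf("instance %q is not in group %q", id, group)
		}
	}
	for _, id := range ids {
		delete(members, id)
	}
	return nil
}

// newBackend picks an implementation by name. Only "memory" exists here;
// real backends would be added as further cases.
func newBackend(name string) (groupBackend, error) {
	switch name {
	case "memory":
		return &memoryBackend{groups: map[string]map[string]bool{
			"workers": {"vm-1": true, "vm-2": true, "vm-3": true},
		}}, nil
	default:
		return nil, fmt.Errorf("backend %q is not implemented in this sketch", name)
	}
}

func main() {
	b, err := newBackend("memory")
	if err != nil {
		panic(err)
	}
	if err := b.DeleteInstances("workers", []string{"vm-2"}); err != nil {
		panic(err)
	}
	size, _ := b.FetchTargetSize("workers")
	fmt.Println("workers target size:", size)
}
```

Because the caller only sees the interface, the rest of the provider code would not need to change when a new backend is added.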

ricolin on 27 Nov 2018

@ricolin, Thanks for working on this.

Wouldn't Nova be pretty straightforward to implement? I'd imagine some value would need to be passed to enable joining the cluster, perhaps via kubeadm, delivered through cloud-init and new node metadata?

Can you please provide more details of the implementation?

ElanHasson on 3 Feb 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 4 May 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot on 3 Jun 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot on 3 Jul 2019

@fejta-bot: Closing this issue.
