Autoscaler: Does expander=priority work for AWS?

Created on 15 May 2019 · 16 Comments · Source: kubernetes/autoscaler

My goal is for the cluster autoscaler to select the cheapest instance type when scaling up. Unfortunately, expander=price currently works only on GKE and GCE. I am therefore looking into the priority expander, and plan to give 1-GPU EC2 instances a higher priority than 4-GPU instances.

However, the following doc does not list expander=priority as one of the options:
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#common-notes-and-gotchas

Could you confirm whether the priority expander works on AWS?

xinxingliu90

All 16 comments

A follow-up question, if expander=priority does work on AWS:

My node groups carry different node labels, for example:

node-group-1-label-A nodes have label "label_A"
node-group-2-label-B nodes have label "label_B"
node-group-3-label-A nodes have label "label_A"

My jobs use a label selector so that they run ONLY on nodes with "label_A".
Suppose the priority ConfigMap looks like this:

    priorities: |-
      10:
        - .*node-group-1-label-A.*
      50:
        - .*node-group-3-label-A.*
        - .*node-group-2-label-B.*

When my job triggers the autoscaler, will node-group-3-label-A scale up first? And will node-group-2-label-B never be scaled up (as I expect)?

Thanks!

xinxingliu90 on 15 May 2019

Kindly pinging @mwielgus because I read your answer to a similar request: https://github.com/kubernetes/autoscaler/issues/771

xinxingliu90 on 15 May 2019

@xinxingliu90 Please check with @Jeffwan (AWS maintainer). As a Google employee I may lack some context specific to AWS.

mwielgus on 15 May 2019

@xinxingliu90
The price model interface has not been implemented for AWS yet. The priority expander was added to CA recently, and it is not specific to any cloud provider, so you can use it on AWS.

For your follow-up question, you are right.

Predicate checks run first and filter the node groups down to those that qualify.
The expander then chooses the right one based on its strategy.

node-group-2-label-B will never be picked since it fails the label selector check.
node-group-3-label-A will scale up first since it has the higher priority.
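The two steps above can be sketched roughly as follows. This is a simplified illustration, not the actual Cluster Autoscaler code: the function names are hypothetical, labels are modelled as plain sets of strings, and the patterns are written in the `.*name.*` form.

```python
import re

# Node groups from the question: name -> labels carried by the group's nodes.
NODE_GROUPS = {
    "node-group-1-label-A": {"label_A"},
    "node-group-2-label-B": {"label_B"},
    "node-group-3-label-A": {"label_A"},
}

# Priorities from the question's ConfigMap: a higher number means higher priority.
PRIORITIES = {
    10: [r".*node-group-1-label-A.*"],
    50: [r".*node-group-3-label-A.*", r".*node-group-2-label-B.*"],
}


def feasible_groups(node_groups, required_label):
    """Step 1 (hypothetical helper): keep groups whose nodes satisfy the selector."""
    return [name for name, labels in node_groups.items() if required_label in labels]


def pick_by_priority(candidates, priorities):
    """Step 2 (hypothetical helper): return the feasible groups at the highest matching priority."""
    for priority in sorted(priorities, reverse=True):
        matched = [
            group for group in candidates
            if any(re.search(pattern, group) for pattern in priorities[priority])
        ]
        if matched:
            return priority, matched
    # No pattern matched any feasible group: this sketch leaves the choice open.
    return None, candidates


candidates = feasible_groups(NODE_GROUPS, "label_A")
priority, chosen = pick_by_priority(candidates, PRIORITIES)
print(candidates)        # groups that pass the label selector
print(priority, chosen)  # highest priority with a match, and the matching groups
```

In this sketch node-group-2-label-B is dropped in step 1, so its priority of 50 never comes into play, and node-group-3-label-A wins over node-group-1-label-A in step 2.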

Jeffwan on 16 May 2019

@Jeffwan
Thanks for your confirmation!

xinxingliu90 on 16 May 2019

@xinxingliu90 BTW, do you have a Slack name? I am trying to reach some GPU users, and I have a proposal I want to share with you later. My Slack name is Jiaxin Shan; you can ping me there.

Jeffwan on 16 May 2019

@Jeffwan That will be great, please ping me with your proposal! Our Slack is internal only. You can reach me via [email protected]. Thank you!

xinxingliu90 on 16 May 2019

Hi @Jeffwan, I'm adding a comment because I'm in exactly the same situation as @xinxingliu90.
In particular, I have an EKS cluster running several workloads, including ML ones, so I had to integrate some GPU ASGs.
Is there any news regarding a price-based expansion policy? I'd be glad to help!

Luke035 on 10 Jul 2019

Does the priority strategy work for you? If you are using P2 and P3 instances, the price difference is very large. Either priority or node affinity can be used here. Could you explain your use case?

Jeffwan on 11 Jul 2019

Hi @Jeffwan,
currently I've been able to make it work with just a priority strategy. A price-based expander would be really nice when using Spot instances: in my case I'm using Kubeflow, and when data scientists spin up a notebook they don't really care which GPU they get (K80, V100, etc. are all fine for some experiments), so being able to choose the node based on the current price would be awesome!

Luke035 on 12 Jul 2019

@Luke035 Thanks for sharing the use case. I am curious what you would expect from using price. If we always spin up the cheapest node, it will always be a K80 (P2 instances). Is there any case where a V100 instance would be spun up?

Jeffwan on 15 Jul 2019

@Luke035 you can let the EC2 Auto Scaling group choose the lowest-priced instance type for you, instead of relying on the price expander, which doesn't work on AWS. Just specify the different instance types in an ASG with a Mixed instances policy, 100% Spot Instances, and the lowest-price allocation strategy, and point CA to it.

ranshn on 13 Nov 2019

@ranshn If you use one ASG with different instance types, that's true. We talked about the problem here: it makes simulation challenging, because the number of nodes CA brings up depends on the spec of the template node.

If you create multiple ASGs, each with a single Spot instance type, a price expander is still needed to make the decision, since the lowest price may vary between instance types.
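To illustrate the simulation concern with rough arithmetic: the sketch below assumes CA sizes a scale-up from a 1-GPU template node while the ASG may actually deliver 4-GPU instances. The `nodes_requested` helper is hypothetical and replaces CA's real bin-packing simulation with a simple ceiling division.

```python
import math


def nodes_requested(pending_gpus, template_gpus_per_node):
    """Hypothetical stand-in for CA's estimate: nodes needed per the template node."""
    return math.ceil(pending_gpus / template_gpus_per_node)


pending_gpus = 4       # GPUs requested by pending pods (made-up number)
template_gpus = 1      # the template node describes a 1-GPU instance
fulfilled_gpus = 4     # the ASG happens to launch 4-GPU instances instead

requested = nodes_requested(pending_gpus, template_gpus)
delivered_gpus = requested * fulfilled_gpus
surplus_gpus = delivered_gpus - pending_gpus

print(requested)       # nodes CA asks the ASG for
print(delivered_gpus)  # GPUs that actually arrive
print(surplus_gpus)    # GPUs beyond what the pending pods needed
```

Under these assumptions CA asks for four nodes where one 4-GPU node would have been enough, which is the mismatch a mixed-instance ASG can introduce.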

Jeffwan on 18 Nov 2019

@Jeffwan agreed, but if we look at the OP's use-case here:

"plan to set 1-GPU EC2 instance to have the higher priority than 4-GPU instances."

they want to use the cheaper GPU instance type. So I believe that an ASG with a Mixed instances policy that contains the different GPU instance possibilities (e.g. p3.2xlarge, p3.8xlarge) and a Spot allocation strategy of lowest-price (with one Spot pool) should do the trick in this use case. Even if the Launch Template only contains p3.2xlarge, when CA calls the scale-up API (does it take the number of GPUs into account?), the ASG will fulfill with the cheapest available instance type, resulting in p3.2xlarge if available and p3.8xlarge if not.

ranshn on 18 Nov 2019

@ranshn If you use two override instance types in one LaunchTemplate, there's no guarantee which instance will come up. The price strategy is a CA-level concept, and it doesn't have any control over the ASG.

Jeffwan on 24 Nov 2019

Exactly, it's not a guarantee, because this is Spot capacity after all, and I'm not asking CA to be aware of the lowest-priced instance type decision that the ASG is going to make. However, since the ASG will be configured with SpotAllocationStrategy = lowest-price and SpotInstancePools = 1, this should achieve what @xinxingliu90 is trying to achieve.
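A minimal sketch of the ASG settings described above, expressed as the `MixedInstancesPolicy` structure of the EC2 Auto Scaling API. The helper name and the launch template name `gpu-workers` are hypothetical; the sketch only builds the dictionary and does not call AWS.

```python
def build_mixed_instances_policy(launch_template_name, instance_types):
    """Hypothetical helper: build a 100% Spot, lowest-price MixedInstancesPolicy."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type the ASG is allowed to launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # No On-Demand capacity at all, i.e. 100% Spot Instances.
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,
            # Launch from the single cheapest Spot pool.
            "SpotAllocationStrategy": "lowest-price",
            "SpotInstancePools": 1,
        },
    }


policy = build_mixed_instances_policy("gpu-workers", ["p3.2xlarge", "p3.8xlarge"])
print(policy["InstancesDistribution"])
```

With boto3, a dictionary like this would be passed as the `MixedInstancesPolicy` argument of the autoscaling client's `create_auto_scaling_group` call, alongside the group name, size limits, and subnets.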

ranshn on 24 Nov 2019
