Tell us about your request
Please improve the scale-out behavior of managed scaling for multi-AZ Auto Scaling groups with Amazon ECS Cluster Auto Scaling (CAS). The current scaling behavior is conservative and not ideal.
Which service(s) is this request for?
ECS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
According to the deep dive article on Amazon ECS Cluster Auto Scaling (CAS), the calculation of M (i.e. the number of container instances the Auto Scaling group (ASG) needs) is not ideal for multi-AZ. This matters because M determines how CAS controls the ASG.
The formula for calculation of M for multi-AZ is:
M = N + minimumScalingStepSize
Where:\
M = Number of required container instances \
N = Number of running container instances
Let's assume:
1) A capacity provider with managed scaling is configured with a multi-AZ ASG and has the default minimumScalingStepSize and maximumScalingStepSize.
    "managedScaling": {
        "status": "ENABLED",
        "targetCapacity": 100,
        "maximumScalingStepSize": 10000,
        "minimumScalingStepSize": 1
    }
2) A container instance can only fit one task.
3) Initially, one task is RUNNING.
4) Six additional tasks are now added (i.e. six tasks will be in the PROVISIONING state).
In theory, we would expect six additional container instances to be added at once. However, with the above assumptions and formula, M increases by only 1 at a time.
i.e. \
M = 1 + 1 = 2 \
M = 2 + 1 = 3 \
etc...
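To make the progression concrete, here is a minimal sketch of the stepping behavior described above. It is only an illustration under the assumptions listed in this issue (one task per instance, each scaling round finishes before the next one starts). The function name `simulate_scale_out` is hypothetical and is not part of any ECS API.

```python
def simulate_scale_out(running, pending, min_step=1):
    """Return the successive values of M until all pending tasks fit.

    Simplified model (assumption): one task per container instance, and
    the ASG reaches M instances before the next round is calculated.
    """
    target = running + pending   # instances needed once every task is placed
    n = running                  # N: container instances currently running
    values_of_m = []
    while n < target:
        m = n + min_step         # M = N + minimumScalingStepSize
        values_of_m.append(m)
        n = m                    # the ASG scales out to M instances
    return values_of_m


# One RUNNING task, six PROVISIONING tasks, default minimumScalingStepSize of 1
print(simulate_scale_out(running=1, pending=6))  # [2, 3, 4, 5, 6, 7]
```

In this model, six separate scaling rounds are needed before all six tasks can be placed.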
The incremental increase of M subsequently impacts the calculation of CapacityProviderReservation (CPR), which controls the scaling of the ASG. It can take as long as 20 minutes for all six tasks to reach RUNNING, which is not ideal.
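As a rough illustration of the effect on CPR, the sketch below assumes the formula given in the deep dive article, CPR = M / N * 100, and only handles the case where N is greater than zero. The helper name `capacity_provider_reservation` is hypothetical.

```python
def capacity_provider_reservation(m, n):
    """CPR as a percentage, assuming CPR = M / N * 100 (valid for N > 0 only)."""
    if n <= 0:
        raise ValueError("this sketch only covers N > 0")
    return m / n * 100


# With N = 1 running instance and M stepped up by one: CPR is 200
print(capacity_provider_reservation(m=2, n=1))  # 200.0

# Next round, N = 2 and M = 3: CPR drops to 150
print(capacity_provider_reservation(m=3, n=2))  # 150.0

# If M reflected all seven required instances at once: CPR would be 700
print(capacity_provider_reservation(m=7, n=1))  # 700.0
```

With targetCapacity set to 100, each round only reports a CPR slightly above target, so the ASG is asked for one more instance per round instead of all six at once.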
It is true that increasing minimumScalingStepSize allows for a more aggressive scale-up, but it incurs additional cost because the extra instances will sit idle.
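The trade-off can be sketched with some simple arithmetic. This uses the same simplified model as the rest of this issue (one task per instance, every scaling round adds exactly minimumScalingStepSize instances), and `rounds_and_idle` is a hypothetical helper used only for illustration.

```python
import math


def rounds_and_idle(pending, min_step):
    """Return (scaling rounds needed, idle instances left over).

    Assumption: each round adds exactly min_step instances and each
    instance fits exactly one task.
    """
    rounds = math.ceil(pending / min_step)
    idle = rounds * min_step - pending
    return rounds, idle


# Six pending tasks: a larger step size means fewer rounds
print(rounds_and_idle(pending=6, min_step=1))  # (6, 0)
print(rounds_and_idle(pending=6, min_step=6))  # (1, 0)

# The same step size with a single pending task leaves five instances idle
print(rounds_and_idle(pending=1, min_step=6))  # (1, 5)
```

A step size tuned for a burst of six tasks over-provisions whenever fewer tasks are pending, which is the cost concern described above.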
An alternative approach is to have multiple capacity providers, each with an ASG confined to a single Availability Zone, so that predictive scaling can be used instead. However, that doesn't meet our capacity needs at the moment.
We would like to see improvements made to the calculation of M and CPR for a multi-AZ ASG.
Are you currently working around this issue?
No, we are currently testing CAS with a multi-AZ ASG.
Hi! This is now resolved with this release: https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-ecs-cluster-auto-scaling-now-offers-more-responsive-scaling/