Containers-roadmap: [ECS] : Capacity Strategy to Fall back to OD only When No More Spot Capacity Available

Created on 26 Feb 2020  ·  8 Comments  ·  Source: aws/containers-roadmap

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Ability to create a capacity strategy that allows you to use spot instances as long as the spot capacity is available, and fall back to on-demand instances only when there is no capacity available for spot.

Which service(s) is this request for?
ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I was hoping that "Base" in a capacity strategy would act as more of a "strategy", but it seems to be a "constraint". In my use case, I wanted to set the base to 5 (which is also the total number of tasks in my service) for capprovider1, which consists entirely of spot instances, and use a 1:1 weight. The base would be met as long as spot instances are available; otherwise, I was hoping ECS would ignore the base and fall back to capprovider2, which has OD instances. But even when capprovider2 has instances, the service fails to place tasks because it is still trying to satisfy the base.
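To make the "constraint" reading concrete, here is a minimal sketch of how a base and weights could divide a task count between the two providers. It is a simplified model, not the ECS scheduler: `ProviderRule` and `split_tasks` are hypothetical names, and the rounding rule for uneven splits is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProviderRule:
    """One entry of a capacity provider strategy (hypothetical helper type)."""
    name: str
    base: int = 0
    weight: int = 0


def split_tasks(desired: int, strategy: list) -> dict:
    """Return how many tasks each provider is asked to run."""
    targets = {rule.name: 0 for rule in strategy}
    remaining = desired

    # Step 1: the base is filled first, before any weight is considered.
    for rule in strategy:
        take = min(rule.base, remaining)
        targets[rule.name] += take
        remaining -= take

    # Step 2: whatever is left is shared in proportion to the weights.
    total_weight = sum(rule.weight for rule in strategy)
    if remaining > 0 and total_weight > 0:
        scaled = [(remaining * rule.weight, rule.name) for rule in strategy]
        for value, name in scaled:
            targets[name] += value // total_weight
        # Assumed rounding rule: leftover tasks go to the largest remainders.
        leftover = remaining - sum(value // total_weight for value, _ in scaled)
        by_remainder = sorted(scaled, key=lambda item: item[0] % total_weight, reverse=True)
        for _, name in by_remainder[:leftover]:
            targets[name] += 1
    return targets


strategy = [
    ProviderRule("capprovider1", base=5, weight=1),  # spot instances
    ProviderRule("capprovider2", weight=1),          # OD instances
]
print(split_tasks(5, strategy))  # {'capprovider1': 5, 'capprovider2': 0}
```

With 5 tasks and a base of 5, every task is targeted at capprovider1 in this model, so nothing is ever assigned to capprovider2, which matches the behavior described above when spot capacity runs out.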

Are you currently working around this issue?
Using a Lambda function.
Please let me know if more information is required or in case there is a better alternative.

All 8 comments

We also observe a similar problem that I will describe below. If it sounds like a separate issue please let me know.

We run our ECS cluster with the following default providers:
FARGATE_SPOT base=0 weight=50
FARGATE base=0 weight=50

Now let's say we run a service that uses the default providers and uses autoscaling.

If the service has a desired_count=10 and the fargate_spot capacity is not available, ECS will not use the available fargate capacity to honour desired_count. The service will run with only 5 tasks instead.

I consider this almost a bug, as it is very counterintuitive that ECS will allocate by providers first and consistently ignore desired_count.
We would prefer an integrated spot/non-spot scaling approach like EC2 Fleet does.

I consider this almost a bug, as it is very counterintuitive that ECS will allocate by providers first and consistently ignore desired_count.

I fully agree with this. There should be an option to prioritize the desired count over the capacity provider. It would open the door to more flexible use of spot capacity, including for long-running services.

Couldn't agree more. I asked about this when SPOT was launched and had a chat with our TAM and also the service team. I don't think it was on the agenda for any time soon back then. Personally, I doubt this will be a priority for AWS, as it makes SPOT just too easy: everyone will choose to use SPOT instead of FARGATE, and where is the fun in that...

@dactp I'm confused, does this setting:

FARGATE_SPOT base=0 weight=50
FARGATE base=0 weight=50

Allow OD to be used only if SPOT is not available?

It just means ECS will run 50% of the tasks on FARGATE and 50% on FARGATE_SPOT; there is no failover if one of them is not available.
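The difference between this fixed split and the failover being requested can be illustrated with a small sketch. Both functions are hypothetical models, not ECS code; the capacity numbers are made up, and the split assumes the desired count divides evenly across the weights.

```python
def placed_without_failover(desired, weights, available):
    """Model of the behavior described in this thread: each provider gets a
    fixed share, and a shortfall on one provider is not moved to another."""
    total = sum(weights.values())
    placed = {}
    for name, weight in weights.items():
        target = desired * weight // total  # assumes an even split
        placed[name] = min(target, available[name])
    return placed


def placed_with_fallback(desired, priority, available):
    """Model of the requested behavior: fill providers in priority order
    until desired_count is met."""
    placed = {}
    remaining = desired
    for name in priority:
        take = min(remaining, available[name])
        placed[name] = take
        remaining -= take
    return placed


weights = {"FARGATE_SPOT": 50, "FARGATE": 50}
available = {"FARGATE_SPOT": 0, "FARGATE": 100}  # hypothetical capacity

print(placed_without_failover(10, weights, available))
# {'FARGATE_SPOT': 0, 'FARGATE': 5}
print(placed_with_fallback(10, ["FARGATE_SPOT", "FARGATE"], available))
# {'FARGATE_SPOT': 0, 'FARGATE': 10}
```

With desired_count=10 and no spot capacity, the first model ends up with 5 running tasks, as reported earlier in the thread, while the second reaches all 10 on FARGATE.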

+1

+1

How would one handle this with Lambda?

Trigger a Lambda function on the spot allocation failure event, and have it make a run-task API call against the FARGATE capacity provider?
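A rough sketch of what such a function might look like with boto3 follows. Several things here are assumptions rather than anything confirmed in this thread: that the function is triggered by an EventBridge rule on ECS service action events, that the failure arrives with `eventName` set to `SERVICE_TASK_PLACEMENT_FAILURE` and a `clusterArn` in `detail`, and the environment variable names `FALLBACK_TASK_DEFINITION` and `FALLBACK_SUBNETS`, which are hypothetical.

```python
import os


def handler(event, context, ecs_client=None):
    """Start one task on FARGATE when a task placement failure is reported."""
    detail = event.get("detail", {})
    # Assumed event shape; ignore anything that is not a placement failure.
    if detail.get("eventName") != "SERVICE_TASK_PLACEMENT_FAILURE":
        return {"started": 0}

    if ecs_client is None:
        import boto3  # imported lazily so the handler can be tested with a stub
        ecs_client = boto3.client("ecs")

    response = ecs_client.run_task(
        cluster=detail["clusterArn"],
        taskDefinition=os.environ["FALLBACK_TASK_DEFINITION"],  # hypothetical variable
        count=1,
        # launchType is omitted on purpose: it cannot be combined with a
        # capacity provider strategy.
        capacityProviderStrategy=[{"capacityProvider": "FARGATE", "weight": 1}],
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": os.environ["FALLBACK_SUBNETS"].split(","),  # hypothetical variable
                "assignPublicIp": "DISABLED",
            }
        },
    )
    return {"started": len(response.get("tasks", []))}
```

One caveat with this approach: tasks started through run-task are standalone and are not managed by the service, so they do not count toward desired_count and need their own cleanup once spot capacity returns.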
