Containers-roadmap: [ECS] Full support for Capacity Providers in CloudFormation.

Created on 5 Dec 2019  ·  89 Comments  ·  Source: aws/containers-roadmap

CloudFormation does not currently have support for capacity providers in any of the ECS resource types. We will be adding this support in the near future.

ECS Work in Progress

Most helpful comment

Any ETA on this?

All 89 comments

Related to this, in order to support capacity providers with managedTerminationProtection, we also need to be able to set the new-instances-protected-from-scale-in property when creating the ASG via CloudFormation. This latter property was added 4 years ago to the AWS SDK / AWS CLI, but is still not supported in CF -- hopefully full support for CP in CF is added a bit faster.

Has there been any progress made on this?

Add support for Capacity providers #1

We are working on it and will provide updates as soon as more information is available.

Related to this, in order to support capacity providers with managedTerminationProtection, we also need to be able to set the new-instances-protected-from-scale-in property when creating the ASG via CloudFormation. This latter property was added 4 years ago to the AWS SDK / AWS CLI, but is still not supported in CF -- hopefully full support for CP in CF is added a bit faster.

Additionally, when the new-instances-protected-from-scale-in property is set on the ASG, scheduled actions that scale in instances cannot be executed. A feature like force-scale-in for scheduled actions would be useful if, for example, we have a dev environment and want to turn instances off for the night and back on in the morning.

+1

When this is implemented, will it be possible to do a rolling update to the launch template under autoscaling and a change to a service in ecs, such that the new tasks run on instances from the new launch template while the old ones stay on the old instances as they roll over?

I'm struggling to achieve this with custom resources at the moment, partly as the dependencies are all in funny directions. Would be great to have it all defined declaratively in cfn.

Any ETA on this?

Does this depend on #632?

Does this depend on #632?

I don't think so.

Sadly, that's the reason why using CloudFormation is becoming more and more frustrating.

FWIW, Terraform has supported this since shortly after the API was released: https://github.com/terraform-providers/terraform-provider-aws/pull/11151

Of course, it can't delete capacity providers since there's no API:
https://www.terraform.io/docs/providers/aws/r/ecs_capacity_provider.html

I don't want to use, rely on and support third-party software if I have a chance to use the official product.

any update?

same here, any updates?

any update?

The lack of CFN support for this, six months in, is really disappointing. It puts the burden on anyone building CI/CD with CFN to add extra, silly custom CLI/SDK pieces to actually tie in capacity providers, which then have to be ripped out once support that should have shipped in a point release is in place.
You can do better. Communicating timeframes would help as well.

Have you had a deeper look into Capacity Providers and Cluster Auto Scaling? Does not match with my requirements at all. Does not scale down properly. Does not work with CloudFormation rolling updates for the ASG. So missing CloudFormation support is not the only problem here. :)

Have you had a deeper look into Capacity Providers and Cluster Auto Scaling? Does not match with my requirements at all. Does not scale down properly. Does not work with CloudFormation rolling updates for the ASG. So missing CloudFormation support is not the only problem here. :)

Thanks for the feedback - can you explain more what you mean by "does not scale down properly"?

coultn: Here's what I think is a common use case: a CI/CD pipeline where services are spun up on an ASG-backed EC2 cluster.
Services do not pre-exist; the CI/CD pipeline creates them.
Currently, you cannot use CFN to create a capacity-provider-enabled service.
If the underlying cluster doesn't have the memory or CPU, I would expect that when a new service is deployed, it would add another EC2 instance and deploy the new service... but there's no way to do that currently. I suppose what might work right now is: deploy the service with no capacity provider, perhaps with a desired count of 0 so it stabilizes; then, via the CLI, update the service to use a capacity provider; then make another CLI call to increase the count to 1... but that seems like hoop-jumping.
With regards to scaling down, the documentation seems a bit unclear on exactly how this is meant to work: if the goal is to optimize resources, I would actually want the CP to be intelligent enough to (a) determine that the cluster is currently overprovisioned and, (b) if so, drain EC2 instances accordingly and have the ASG terminate the drained instances, all with standard, appropriate cooldown periods, etc.
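For what it's worth, a rough sketch of that CLI hoop-jumping (the cluster, service, and capacity provider names are hypothetical, and this assumes the installed CLI supports `--capacity-provider-strategy` on `update-service`):

```shell
# Step 1: attach a capacity provider strategy to the (desired-count 0) service.
aws ecs update-service \
    --cluster my-cluster \
    --service my-service \
    --capacity-provider-strategy capacityProvider=my-capacity-provider,weight=1 \
    --force-new-deployment

# Step 2: scale the service up so the capacity provider can add instances.
aws ecs update-service \
    --cluster my-cluster \
    --service my-service \
    --desired-count 1
```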

Currently, you cannot use CFN to create a capacity-provider-enabled service.

Thanks for the feedback! We are working on full support for capacity providers in CloudFormation, and we definitely understand the need for that. However, I do want to point out that you can actually create a capacity-provider enabled service in CloudFormation today. You can accomplish this by first configuring a default capacity provider strategy for the cluster. This default capacity provider strategy will be used by any service you create that does not specify a launch type. Next, when you create your service in CloudFormation, do not include the LaunchType parameter. The service will use the capacity provider strategy defined by the cluster, and will auto-scale from zero instances if necessary.

With regards to scaling down, the documentation seems a bit unclear on exactly how this is meant to work: if the goal is to optimize resources, I would actually want the CP to be intelligent enough to (a) determine that the cluster is currently overprovisioned and, (b) if so, drain EC2 instances accordingly and have the ASG terminate the drained instances, all with standard, appropriate cooldown periods, etc.

Understood. In the first version of ECS cluster auto scaling, we took a more conservative route where instances would not scale in unless no tasks are running on them. We are looking at the idea of automating an "instance drainer" that will automatically find underutilized instances and set them to draining. With ECS cluster auto scaling, those instances would automatically shut down once no tasks are running on them. It's possible to do this already today, but you would need to implement your own Lambda function (or similar) to do the evaluation of the instance and call the ECS API to set the instance to the DRAINING state.
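A rough sketch of what that Lambda (or cron job) would call; picking which instance is "underutilized" is the part you'd have to implement yourself, so the instance ARN below is a placeholder:

```shell
# List the container instances in the cluster, then set a chosen
# underutilized one to DRAINING; with ECS cluster auto scaling enabled,
# the instance terminates once no tasks remain on it.
aws ecs list-container-instances --cluster my-cluster

aws ecs update-container-instances-state \
    --cluster my-cluster \
    --container-instances <CONTAINER_INSTANCE_ARN> \
    --status DRAINING
```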

Really awesome feedback, thank you. As far as the workaround of setting it at cluster creation, I'll take a look at that... easy enough to implement for QA/dev, a little trickier for existing prod environments.

Trying to avoid custom tooling since...this seems sooo close to being a solid solution.

Any timing on better cfn support? I know that's a different, probably very overwhelmed team, but would be nice to see some improvements here. ECS rocks, and once this is dialed in, it's going to really round out the offering.

Will keep checking for ECS updates!

Dear colleagues,
Please provide, in CF, the ability to fine-tune some of the Capacity Provider's auto-generated parameters. Currently, in addition to the existing parameters, we need to adjust the Cooldown in the Auto Scaling Plan manually, as well as the alarm datapoints, all after the Capacity Provider is created. It would be great to put all of this together in the CF template. This is a must for us. Thank you very much!

Regarding timeline - we can't share specific timelines but we will share updates here as soon as they are available.

coultn:
Because this is such a useful feature for so many of my clients, I decided to re-tool things today.
Unfortunately, capacity providers still don't seem to work.
The cluster default CP is in place.
I re-created services without the LaunchType reference, and it clearly shows the services are using the capacity provider strategy.
However, when I deploy services and exhaust the memory, it throws the usual message saying it can't find a container instance with the resources.
Interestingly, and probably to the point: the CloudWatch metric for the CP that is assigned to this cluster (CapacityProviderReservation) isn't reporting any metrics at all.
I have seen this metric chart more appropriately in previous tests a few weeks ago with another client... no idea why it's not reporting anything. I spun up about 5-8 services today on this cluster using the CP strategy.
I'll just keep checking back for updates...hopefully some good changes coming soon.

+1

This is definitely a showstopper for our CDK-powered automation workflows. Setting the capacity provider at the cluster level is something the CloudFormation team is looking into: https://github.com/aws-cloudformation/aws-cloudformation-coverage-roadmap/issues/301

In the meantime our workaround is to run the following aws-cli command in our CI/CD workflow:

aws ecs put-cluster-capacity-providers \
    --cluster CLUSTER_NAME \
    --capacity-providers FARGATE \
    --default-capacity-provider-strategy capacityProvider=FARGATE

I really hope this ships soon. 🤞

+2

Deletion is now supported by the API. Will this accelerate the implementation of this feature addition?

https://aws.amazon.com/jp/about-aws/whats-new/2020/06/amazon-ecs-capacity-providers-support-delete-functionality/

+1

Saw this earlier today, but the resources don’t seem to have been updated yet: https://twitter.com/aws_doc/status/1273943424849383424?s=21

I have implemented the new CloudFormation resources in one of my stacks and can confirm it works 👍

There's still a missing link, though, which might be (part of) the reason why it was not announced yet:

AWS::ECS::CapacityProvider AutoScalingGroupProvider requires the parameter AutoScalingGroupArn which accepts only an ARN (which contains a UUID part so you cannot "guess" it).

Unfortunately AWS::AutoScaling::AutoScalingGroup does not expose its ARN so there's no way to reference this in the AutoScalingGroupProvider for now.

Either hardcoding an existing ARN or, once more, hacking around with a Custom Resource to get the ARN works.

Ah AWS, where just the C is an acceptable MVP for CRUD. Oh well, glad it's finally getting released.

I have implemented the new CloudFormation resources in one of my stacks and can confirm it works

There's still a missing link, though, which might be (part of) the reason why it was not announced yet:

AWS::ECS::CapacityProvider AutoScalingGroupProvider requires the parameter AutoScalingGroupArn which accepts only an ARN (which contains a UUID part so you cannot "guess" it).

Unfortunately AWS::AutoScaling::AutoScalingGroup does not expose its ARN so there's no way to reference this in the AutoScalingGroupProvider for now.

Either hardcoding an existing ARN or, once more, hacking around with a Custom Resource to get the ARN works.

What about termination protection on Auto Scaling and managed termination on the CapacityProvider? I believe the AutoScaling resource needs to be updated to support that.

A typical scenario of having a template with an ASG and a capacity provider defined in the same template (which rhlarora84 alluded to) is not possible, because the AWS::AutoScaling::AutoScalingGroup resource only returns the name, but the capacity provider requires an ARN. That's kind of a miss on the ASG resource as well (why does it not have an Arn attribute?).
At the least, it would be nice if the capacity provider could accept either the name or the ARN. A number of other resources support that.

@coultn Hello, is there a way to do an ASG rolling update (for an AMI refresh or the like) while using a capacity provider with managed termination? At present the combination (ECS cluster, capacity provider, ASG, CloudFormation) does not support rolling updates, since the ASG's termination protection must be on for the CP's managed termination to work, so for now we are sacrificing the CP's managed termination in favor of rolling updates. It would be great if all of these could be accommodated together.

@manokaran3529 be careful, on scale down we saw container instances being terminated with managed termination protection off, when there was a better choice available (instance not running any container). You raise a good point regarding the rolling update though, I'm intending on using that and haven't tested yet...

How do you manage the circular dependency?

The ECS cluster _needs_ the capacity provider.
The capacity provider needs the ASG (because of the ARN).

On stack deletion, the ECS cluster will get deleted first and fail because its ASG is still alive.

Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: Ecs, Status Code: 400, Request ID: 5751e46b-d3d4-4f0c-ad2f-ca7e072184c7, Extended Request ID: null)'.

Hello! We are actively working on a few things to provide more comprehensive capacity provider support in CloudFormation.

  1. Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource
  2. Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource
  3. Ability to enable scale-in protection in the AWS::AutoScaling::AutoScalingGroup resource

Hello! We are actively working on a few things to provide more comprehensive capacity provider support in CloudFormation.

  1. Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource
  2. Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource
  3. Ability to enable scale-in protection in the AWS::AutoScaling::AutoScalingGroup resource

ETA?

@manokaran3529 be careful, on scale down we saw container instances being terminated with managed termination protection off, when there was a better choice available (instance not running any container). You raise a good point regarding the rolling update though, I'm intending on using that and haven't tested yet...

Yes, it terminated an instance which had most of the tasks. As a hack, we changed the termination policy of the ASG to 'Newest', so on termination it picked the newest instance, where we only had the scaled-up tasks.
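In template form, that hack is just the ASG's TerminationPolicies property (a fragment; all other ASG properties are omitted):

```yaml
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    # Terminate the most recently launched instances first on scale-in,
    # as a workaround while managed termination protection is off.
    TerminationPolicies:
      - NewestInstance
    # ... MinSize, MaxSize, LaunchTemplate, etc. as usual ...
```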

Capacity Provider for Cloudformation is now available: https://d201a2mn26r7lk.cloudfront.net/latest/gzip/CloudFormationResourceSpecification.json

(or more friendly changelog: https://github.com/aws/aws-cdk/commit/4ce27f4195c70bd9e365ec0e0df5c0ede863bc8a)

Capacity Provider for Cloudformation is now available: https://d201a2mn26r7lk.cloudfront.net/latest/gzip/CloudFormationResourceSpecification.json

(or more friendly changelog: aws/aws-cdk@4ce27f4)

What does this mean? This is old news here; it looks like the same thing to me. I still check here every day to see whether the fixes are done or not.

Sorry, I missed that this was released 12 days ago. Will wait for the fixes above.

I was doing some testing today, and I noticed that I could pass the AutoScalingGroup name as the autoScalingGroupArn in the CreateCapacityProvider API call, when previously it would error out.

Armed with this knowledge I tried this:

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 0
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MaxSize: 2
      MinSize: 0
      VPCZoneIdentifier:
        - !Ref SubnetId

  CapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref AutoScalingGroup
        ManagedScaling:
          Status: DISABLED
        ManagedTerminationProtection: DISABLED

And it worked! I only tested this in the ap-southeast-2 region. So I assume the reason this change wasn't announced is because it isn't live everywhere yet?

Good news for everyone tracking this issue, though. I'll wait for this to be confirmed here before I use it in production, but it saves me from using a rather ugly custom resource to extract the ARN like I was planning to do.

Indeed, documentation has been updated to "The Amazon Resource Name (ARN) or short name that identifies the Auto Scaling group."

Hi all, confirming that

  1. Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource

is now available in all regions.

@anoopkapoor any ETA on (3), scale-in protection on Auto Scaling?

@anoopkapoor any ETA on (2), the ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource?

I was doing some testing today, and I noticed that I could pass the AutoScalingGroup name as the autoScalingGroupArn in the CreateCapacityProvider API call, when previously it would error out.

Armed with this knowledge I tried this:

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 0
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MaxSize: 2
      MinSize: 0
      VPCZoneIdentifier:
        - !Ref SubnetId

  CapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref AutoScalingGroup
        ManagedScaling:
          Status: DISABLED
        ManagedTerminationProtection: DISABLED

And it worked! I only tested this in the ap-southeast-2 region. So I assume the reason this change wasn't announced is because it isn't live everywhere yet?

Good news for everyone tracking this issue, though. I'll wait for this to be confirmed here before I use it in production, but it saves me from using a rather ugly custom resource to extract the ARN like I was planning to do.

How do you still manage the circular dependency?

The ECS cluster needs the capacity provider.
The capacity provider needs the ASG (because of the Ref).

On stack deletion, the ECS cluster will get deleted first and fail because its ASG is still alive.

Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: Ecs, Status Code: 400, Request ID: 5751e46b-d3d4-4f0c-ad2f-ca7e072184c7, Extended Request ID: null)'.

How do you still manage the circular dependency?

The ECS cluster needs the capacity provider.
The capacity provider needs the ASG (because of the Ref).

On stack deletion, the ECS cluster will get deleted first and fail because its ASG is still alive.

Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: Ecs, Status Code: 400, Request ID: 5751e46b-d3d4-4f0c-ad2f-ca7e072184c7, Extended Request ID: null)'.

To be honest, I didn't manage deletion back then. Now, we can either leave this as is and just deal with it like the "non-empty bucket" problem for AWS::S3::Bucket, or we can add logic into the Delete workflow for AWS::ECS::Cluster to drain all existing services and tasks.

I'm a fan of the second approach because, unlike with an AWS::S3::Bucket, we typically don't have permanent data loss if the cluster is deleted. Unfortunately it'll take time to get a teardown process that works for most circumstances.

I can probably make a generic custom resource/resource provider that can be used for clean-up if you're running ephemeral workloads and need a solution to this right now?
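Until then, a bare-bones teardown for ephemeral workloads might look like this (cluster name hypothetical, error handling omitted):

```shell
# Scale each service to zero and delete it, then delete the cluster
# once no container instances are active or draining.
for svc in $(aws ecs list-services --cluster my-cluster \
                 --query 'serviceArns[]' --output text); do
  aws ecs update-service --cluster my-cluster --service "$svc" --desired-count 0
  aws ecs delete-service --cluster my-cluster --service "$svc"
done

aws ecs delete-cluster --cluster my-cluster
```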

Without specifying a capacity provider on a service, is it actually possible to deploy an ECS service into a Fargate-only ECS cluster (with the out-of-the-box capacity providers plus a default capacity provider set)?

Would assume it would "just work", but in the current implementation we're seeing errors along the lines of:

There are no capacity providers in the capacity provider strategy with a weight value greater than zero. Specify a weight value greater than zero for at least one capacity provider and try again.

I presume adding the ability to set them (with weighting) on the service (similar to the command line) fixes that, but from my understanding it appears it's impossible to do that at the moment?

Without specifying a capacity provider on a service, is it actually possible to deploy an ECS service into a Fargate-only ECS cluster (with the out-of-the-box capacity providers plus a default capacity provider set)?

Would assume it would "just work", but in the current implementation we're seeing errors along the lines of:

There are no capacity providers in the capacity provider strategy with a weight value greater than zero. Specify a weight value greater than zero for at least one capacity provider and try again.

I presume adding the ability to set them (with weighting) on the service (similar to the command line) fixes that, but from my understanding it appears it's impossible to do that at the moment?

How did you define the ECS Cluster to make a Fargate only cluster? Can you either describe it using DescribeClusters or provide the CloudFormation template snippet you used?

@taylorb-syd

We haven't updated to use the new capacity provider support in CFN, but here are the steps we followed to make the cluster (which we have been running tasks in fine, without specifying anything).

Cluster itself is just a raw AWS::ECS::Cluster in CloudFormation (tags set, nothing else), and after creation we ran:

aws ecs put-cluster-capacity-providers \
    --cluster "my-cluster-name" \
    --capacity-providers FARGATE FARGATE_SPOT \
    --default-capacity-provider-strategy capacityProvider=FARGATE_SPOT \
    --region ap-southeast-2

If that makes sense?

So there are no ECS instances / other capacity providers, just the above configuration - and attempting to deploy a service into that causes the given issue.

I think the root of the issue is that a service ignores the default-capacity-provider-strategy.

Personally, in CFn, I used

ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      CapacityProviders:
        - FARGATE_SPOT
      DefaultCapacityProviderStrategy:
        - Base: 0
          CapacityProvider: FARGATE_SPOT
          Weight: 1

And my ECS cluster is pure FARGATE_SPOT.

And before the CFn support, I did exactly like Sutto above.

Ah, interesting - odd that it works without that; sounds like I've just gone down the wrong path (thanks @PierreKiwi). The above statement previously created it (a few months ago) with no weight set, and that value was hidden in the UI.

Turns out the default configuration above ignores giving it a weight, which seems counterintuitive; the CFN version is much more explicit, and much nicer.

To be perfectly correct, actually before CFn support, I was doing this

aws ecs put-cluster-capacity-providers \
    --cluster <CLUSTER_NAME> \
    --capacity-providers FARGATE_SPOT \
    --default-capacity-provider-strategy capacityProvider=FARGATE_SPOT,weight=1,base=0 \
    --region <REGION>

So it was an easy translation to CFn :)

Ah, interesting - odd that it works without that; sounds like I've just gone down the wrong path (thanks @PierreKiwi)

Turns out the default configuration above ignores giving it a weight, which seems counterintuitive; the CFN version is much more explicit, and much nicer.

Ahh yes, it defaults to weight : 0, which I think is not desirable. I'll report this up the chain internally. For now, make sure you set a non-zero weight if you're only specifying one capacity provider.

aws ecs create-cluster --cluster-name testing --capacity-providers FARGATE FARGATE_SPOT --default-capacity-provider-strategy capacityProvider=FARGATE_SPOT
{
    "cluster": {
        "clusterArn": "arn:aws:ecs:ap-southeast-2:<redacted>:cluster/testing",
        "clusterName": "testing",
        "status": "PROVISIONING",
        "registeredContainerInstancesCount": 0,
        "runningTasksCount": 0,
        "pendingTasksCount": 0,
        "activeServicesCount": 0,
        "statistics": [],
        "tags": [],
        "settings": [
            {
                "name": "containerInsights",
                "value": "enabled"
            }
        ],
        "capacityProviders": [
            "FARGATE",
            "FARGATE_SPOT"
        ],
        "defaultCapacityProviderStrategy": [
            {
                "capacityProvider": "FARGATE_SPOT",
                "weight": 0,
                "base": 0
            }
        ],
        "attachmentsStatus": "UPDATE_IN_PROGRESS"
    }
}

@taylorb-syd @PierreKiwi thanks for the help with this - that wasn't a fun one to try to work around. Might also be worth pushing up the chain in the ECS team to show the weight when looking in the console - at the moment it's hidden until you click edit/add new, and there appears to be no way to modify the weight of an existing entry (short of adding and removing in the same request).

@taylorb-syd @PierreKiwi thanks for the help with this - that wasn't a fun one to try to work around. Might also be worth pushing up the chain in the ECS team to show the weight when looking in the console - at the moment it's hidden until you click edit/add new, and there appears to be no way to modify the weight of an existing entry (short of adding and removing in the same request).

I have noted this in my internal request. Thanks.

Is launching a task on Fargate Spot using CloudFormation supported at the moment? We are trying to do this, and it seems like there is some initial support for capacity providers, but not for attaching tasks to those capacity providers.

You don't need to do anything special to attach tasks to the capacity providers. If you have this line in your Service definition, you'll want to remove it:

LaunchType: FARGATE

Then you'll see the Capacity provider listed for your task:

Task definition console

You don't need to do anything special to attach tasks to the capacity providers. If you have this line in your Service definition, you'll want to remove it:

LaunchType: FARGATE

Then you'll see the Capacity provider listed for your task:

Task definition console

The assumption here is that a DCPS (default capacity provider strategy) has been set on your cluster, similar to this:

ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      CapacityProviders:
        - FARGATE_SPOT
      DefaultCapacityProviderStrategy:
        - Base: 0
          CapacityProvider: FARGATE_SPOT
          Weight: 1

Without a DCPS, removing the LaunchType will default to attempting to launch the task under EC2 instead of FARGATE_SPOT. Other than that @adamkeim-pwr it should work as expected.
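For completeness, a minimal sketch of such a service, with no LaunchType so it inherits the cluster's DCPS (the task definition and subnet parameter are assumed to exist elsewhere in the template):

```yaml
Service:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref ECSCluster
    TaskDefinition: !Ref TaskDefinition   # assumed to be defined elsewhere
    DesiredCount: 1
    # No LaunchType: the task falls back to the cluster's default
    # capacity provider strategy (FARGATE_SPOT in the snippet above).
    NetworkConfiguration:
      AwsvpcConfiguration:
        Subnets:
          - !Ref SubnetId                  # assumed parameter
```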

If I have a cluster with FARGATE and FARGATE_SPOT as capacity providers, how can I select which capacity provider a task uses? I know I can set weights, but I would like to be able to select the capacity provider at the task level. Do I just need to use two clusters?

I was doing some testing today, and I noticed that I could pass the AutoScalingGroup name as the autoScalingGroupArn in the CreateCapacityProvider API call, when previously it would error out.

Armed with this knowledge I tried this:

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 0
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MaxSize: 2
      MinSize: 0
      VPCZoneIdentifier:
        - !Ref SubnetId

  CapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref AutoScalingGroup
        ManagedScaling:
          Status: DISABLED
        ManagedTerminationProtection: DISABLED

And it worked! I only tested this in the ap-southeast-2 region. So I assume the reason this change wasn't announced is because it isn't live everywhere yet?

Good news for everyone tracking this issue, though. I'll wait for this to be confirmed here before I use it in production, but it saves me from using a rather ugly custom resource to extract the ARN like I was planning to do.

This is still not going to work. In a non-trivial deployment of ECS you will want to pass the ECS cluster ID into the launch configuration/template so that the agent knows which cluster to join. This means that the dependency chain goes like this:

ECSCluster -> Capacity Provider -> Autoscaling Group -> Launch Config -> ECSCluster

What is needed is a resource to manage the attachment of the Capacity provider to the ECSCluster and break the dependency loop. If that is not provided by AWS you are going to need a custom resource to add the capacity provider.

What I'm doing is passing the cluster name as a parameter of the stack. This way you can use the parameter both in the ClusterName property of the cluster and in the UserData of the LaunchTemplate. Circular dependency gone!
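A sketch of that trick (resource and parameter names are made up):

```yaml
Parameters:
  ClusterName:
    Type: String

Resources:
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Ref ClusterName   # same value the instances join below

  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        # The agent joins the cluster by name, so nothing here
        # references the cluster resource itself -- no cycle.
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash
            echo ECS_CLUSTER=${ClusterName} >> /etc/ecs/ecs.config
```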

What I'm doing is passing the cluster name as a parameter of the stack. This way you can use the parameter both in the ClusterName property of the cluster and in the UserData of the LaunchTemplate. Circular dependency gone!

Thanks, but that is not a solution. At best it is a workaround that solves the circular dependency but not the operational aspects.

It will only work if you set your capacity to 0 at create time so the stack completes, then run a second pass to update the minimum or desired count. If you start the Auto Scaling group with anything but zero, your nodes will not start the ECS agent because the cluster will not exist yet, and you will need to replace the instances. If you use a CreationPolicy and cfn-signal, it will prevent the ASG from starting at all. You should not need two passes to make this work.

Any instances that launch ahead of the cluster creation do end up getting registered with the cluster as well, once it's created.

That is only because systemd restarts the service that spins up the Docker container. It will eventually back off if it does not see it run successfully. I guess it will eventually join, but that is not the point. We follow the same pattern with a service controlling a Docker container on our Ubuntu AMI using Ansible on bootstrap; the bootstrap will fail if the service does not start, though, and we get a non-zero return and a failure signal is sent from cfn-signal. We want to know that the instance is ready to serve when it signals.

We are talking about operating this in a Cloudformation environment are we not? If you do rolling AMI updates for your cluster you surely want to know that the instance has joined the cluster before you signal for Cloudformation to move to the next instance (along with a lifecycle hook to drain instances before termination)?

We might be running in a slightly niche way, sure... but do you think a new user should have to find this thread to discover this workaround to get this working?
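For reference, the termination lifecycle hook mentioned above might be declared like this (the hook only pauses termination; something, e.g. a Lambda, still has to do the actual draining and then complete the lifecycle action):

```yaml
DrainHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref AutoScalingGroup   # assumed ASG resource
    LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
    HeartbeatTimeout: 900     # seconds the instance is held for draining
    DefaultResult: CONTINUE   # proceed with termination if nothing responds
```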

Hey, sorry if I'm being a bit daft: I've been following the thread but I'm not 100% sure what the status of this is. Is the functionality out, but somewhat experimental and not yet documented?

I Googled "cloudformation" capacity provider and this was the only thing that came back:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-cluster-capacityproviderstrategyitem.html

Is this enough to get up and running with Capacity Providers + CloudFormation?

Thanks!

@EdwardIII I found the same thing; Google is not returning the CloudFormation documentation for this yet, but it is there. Some of it is working with workarounds, but IMO it is currently half-baked and needs work.

I was doing some testing today, and I noticed that I could pass the AutoScalingGroup name as the autoScalingGroupArn in the CreateCapacityProvider API call, when previously it would error out.
Armed with this knowledge I tried this:

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 0
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MaxSize: 2
      MinSize: 0
      VPCZoneIdentifier:
        - !Ref SubnetId

  CapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref AutoScalingGroup
        ManagedScaling:
          Status: DISABLED
        ManagedTerminationProtection: DISABLED

And it worked! I have only tested this in the ap-southeast-2 region, so I assume the reason this change wasn't announced is that it isn't live everywhere yet?
Good news for everyone tracking this issue, though. I'll wait for this to be confirmed here before I use it in production, but it saves me from using a rather ugly custom resource to extract the ARN like I was planning to do.

This is still not going to work. In a non-trivial deployment of ECS you will want to pass the ECS cluster name into the launch configuration/template so that the agent knows which cluster to join. This means the dependency chain goes like this:

ECSCluster -> Capacity Provider -> Autoscaling Group -> Launch Config -> ECSCluster

What is needed is a resource to manage the attachment of the Capacity provider to the ECSCluster and break the dependency loop. If that is not provided by AWS you are going to need a custom resource to add the capacity provider.

This is exactly the issue that I’m currently having, I’ve no idea how to break that dependency chain.


After getting tired of waiting for this to be released and following up, I ended up doing exactly the same.
Inside CFN I create everything, but I don't attach the Capacity Provider to the ECS Cluster:

  • LaunchConfiguration -> ECS Cluster (implicit dependency, since the instance joins the cluster)
  • AutoScalingGroup -> LaunchConfiguration (implicit dependency)
  • CapacityProvider -> AutoScalingGroup (implicit dependency)
  • CustomResourceToAttachCP -> ECS Cluster & Capacity Provider (explicit DependsOn)

Inside the Lambda behind CustomResourceToAttachCP, I call put_cluster_capacity_providers on create, and again with empty lists on delete.
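A minimal sketch of that custom-resource Lambda, assuming the cluster and capacity provider names are passed in as `ClusterName` and `CapacityProviderName` resource properties (these names are my own, not from the original comment). A real handler must also send a success/failure signal back to CloudFormation (e.g. via the cfnresponse module), which is omitted here for brevity.

```python
def attach_request(cluster_name, capacity_provider, request_type):
    """Build the put_cluster_capacity_providers arguments for one request type.

    On Delete we pass empty lists, which detaches every capacity
    provider from the cluster."""
    if request_type == "Delete":
        return {
            "cluster": cluster_name,
            "capacityProviders": [],
            "defaultCapacityProviderStrategy": [],
        }
    return {
        "cluster": cluster_name,
        "capacityProviders": [capacity_provider],
        "defaultCapacityProviderStrategy": [
            {"capacityProvider": capacity_provider, "weight": 1}
        ],
    }


def handler(event, context):
    # boto3 is available in the Lambda runtime
    import boto3

    props = event["ResourceProperties"]
    boto3.client("ecs").put_cluster_capacity_providers(
        **attach_request(
            props["ClusterName"],
            props["CapacityProviderName"],
            event["RequestType"],
        )
    )
```

Keeping the argument-building logic in a pure function makes it easy to unit-test the attach/detach shapes without touching AWS.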


I really hope AWS can release an update to solve this problem; if I have to create a Lambda just to attach a Capacity Provider to an ECS Cluster, that seems like such a hack!

At the moment, in order to work around the circular dependency, you unfortunately need to name the Cluster and cannot use an auto-generated name. That way you can specify the name of the cluster as a string in your Launch Configuration / Launch Template. I recommend the following naming convention:

ECSCluster:
  Type: AWS::ECS::Cluster
  Properties:
    ClusterName: !Sub ${AWS::StackName}-ECSCluster

LaunchConfiguration:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    UserData:
      Fn::Base64: !Sub |
          #!/bin/bash
          echo ECS_CLUSTER=${AWS::StackName}-ECSCluster >> /etc/ecs/ecs.config

I will work internally to see if we can get a separate resource to break the dependency in CloudFormation. However, naming the cluster seems like the best solution for now. Fortunately, every property of an AWS::ECS::Cluster resource is mutable apart from ClusterName, which means that a static name will not have the usual consequences that discourage the use of static names.


Sounds great!

This issue has had the "coming soon" state for 4 months now, but we still don't have a release that fixes it, nor are the dependency problems solved. It's a must-have for production ECS deployments, since the ECS agent needs to know which cluster to join. @srrengar

Hi!
Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource is now available.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ecs-service.html
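For anyone looking for the shape of the newly supported property, a minimal sketch (resource names are placeholders; LaunchType must be omitted when a capacity provider strategy is set):

```yaml
Service:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref ECSCluster
    TaskDefinition: !Ref TaskDefinition
    DesiredCount: 2
    # Omit LaunchType when using a capacity provider strategy
    CapacityProviderStrategy:
      - CapacityProvider: !Ref CapacityProvider
        Weight: 1
        Base: 0
```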

@anoopkapoor, the circular dependency issue (ECS Cluster -> Capacity Provider -> ASG -> Launch Config -> ECS Cluster) remains. An option to !Ref the ECS Cluster from the Capacity Provider via a separate attachment resource might help, or any other proper way to solve it. Please close this soon...

@belangovan ack. This is the list I'm tracking now:

  1. [complete] Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource
  2. [complete] Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource
  3. [coming soon] Ability to enable scale-in protection in the AWS::AutoScaling::AutoScalingGroup resource
  4. Break circular dependency so that unnamed clusters can be created
  5. Stack deletion fails since the cluster deletion comes ahead of ASG deletion. Cluster cannot be deleted if instances in ASG are still active.
  6. Ability to update parameters in the AWS::ECS::CapacityProvider resource without interruption including ASG warm-up time.

@anoopkapoor, thanks for the status; it really helps us track this. If possible, please share a tentative ETA?

Yeah, bumping the issue. We've been stuck on this for a while.

Hi!
Ability to enable scale-in protection (NewInstancesProtectedFromScaleIn) in the AWS::AutoScaling::AutoScalingGroup resource is now available:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html#cfn-as-group-newinstancesprotectedfromscalein

  1. [complete] Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource
  2. [complete] Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource
  3. [complete] Ability to enable scale-in protection in the AWS::AutoScaling::AutoScalingGroup resource
  4. Break circular dependency so that unnamed clusters can be created
  5. Stack deletion fails since the cluster deletion comes ahead of ASG deletion. Cluster cannot be deleted if instances in ASG are still active.
  6. Ability to update parameters in the AWS::ECS::CapacityProvider resource without interruption including ASG warm-up time.
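With items 1–3 complete, the managed pieces can now be wired together in a template, e.g. (a sketch; resource names are placeholders, and ManagedTerminationProtection ENABLED requires scale-in protection on the ASG):

```yaml
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    NewInstancesProtectedFromScaleIn: true  # newly supported in CloudFormation
    MinSize: 0
    MaxSize: 4
    LaunchTemplate:
      LaunchTemplateId: !Ref LaunchTemplate
      Version: !GetAtt LaunchTemplate.LatestVersionNumber
    VPCZoneIdentifier:
      - !Ref SubnetId

CapacityProvider:
  Type: AWS::ECS::CapacityProvider
  Properties:
    AutoScalingGroupProvider:
      AutoScalingGroupArn: !Ref AutoScalingGroup
      ManagedScaling:
        Status: ENABLED
        TargetCapacity: 100
      ManagedTerminationProtection: ENABLED
```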

By clicking the link I saw "Not currently supported by AWS CloudFormation."

So is the document not up-to-date?



Thanks for pointing out. Document has now been updated.

Until this is implemented, I've been using Spot instances in ECS with Docker Compose by specifying an existing ECS cluster that is already configured with SPOT as the provider (x-aws-cluster).

@srrengar, @anoopkapoor, any update on:

  1. Break circular dependency so that unnamed clusters can be created
  2. Stack deletion fails since the cluster deletion comes ahead of ASG deletion. Cluster cannot be deleted if instances in ASG are still active.
  3. Ability to update parameters in the AWS::ECS::CapacityProvider resource without interruption, including ASG warm-up time.

We are blocked on these for our production release.

Any news on this issue? More than a year has passed since this problem was raised. Frameworks like Terraform have been able to deal with Capacity Providers since day 0. I really like many CloudFormation / AWS CDK features, but the time AWS takes to support its own resources is really frustrating.
