Containers-roadmap: [ECS] Add the ability to delete an ASG capacity provider.

Created on 5 Dec 2019 · 71 Comments · Source: aws/containers-roadmap

Currently, ECS Capacity Providers are immutable. Once created, they cannot be deleted. The change proposed here is to support deleting capacity providers.

ECS

All 71 comments

Is there any way to associate a new capacity provider with an ASG that is already associated with another capacity provider? I get the following error if I try to create a new capacity provider for the same ASG:

botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the CreateCapacityProvider operation: The specified Auto Scaling group ARN is already being used by another capacity provider. Specify a unique Auto Scaling group ARN and try again.

It looks like you currently have to create a new capacity provider and ASG pair every time you want to adjust capacity provider settings.

+1 on this behaviour. Once you create a capacity provider and assign to ASG, they're married (til death do us part)... You can "deactivate" and remove the "link" between them, but you can't re-establish this again.

In my case, I am not even able to "deactivate" my capacity provider. When I look from Chrome's dev tools, aws seems to be returning 500 for this request. I thought it could be a temporary issue, but it's been over a week and the behavior is still the same. Is there any bug or is this expected?

For more details, I created the capacity provider with both Managed scaling and Managed termination protection disabled. Also I created it in ap-south-1 region if that helps.

I'm stuck in this same scenario. This feels less like a feature request and more like a bug.

I've also run into this bug.

After deactivating, there appears to be no visible association with the autoscaling group, yet the autoscaling group cannot be used for any other capacity provider, even after removing the ASG tag.

Better get it right the first time !!!

I'm excited to start using capacity providers, but this is clearly a bug. Please let us know where this issue fits on the roadmap.

Thanks everyone for your feedback. This is not a bug - it is functioning as designed. However, we do understand that the current design has some limitations, which is why we are working on the ability to delete capacity providers (this issue) as well as update the configuration of an existing capacity provider.

I also ran into the same issue when trying out the Capacity Provider feature. Once I deactivate the capacity provider, I can't create a new one linked to the ASG any more. Please help fix this asap.

+1 stuck on this as well, kinda disappointing and frustrating experience.

I'm trying to understand something. If this isn't a "bug," then what is the "proper" way of using this? I feel like I'm stuck on this and in order to move forward I have to recreate my ASG.

I ran into the problem of a cluster not getting destroyed because aws_ecs_capacity_provider cannot be deleted (whether by bug or by design). I have to work around it by deleting the ASG directly in order to delete the cluster. Clearly this gets in the way of managing clusters from Terraform or similar tools.

This clearly isn't working as provided, not being able to do one of:

  • Delete a capacity provider
  • Delete the link between a capacity provider and its associated ASG
  • Reassign an ASG to another capacity provider

Is clearly a very wrong assumption and shows you have not thought out/designed the API with enough foresight.

+1 to getting the design and implementation of capacity providers built and deployed correctly

This is a problem for me too. Causing a lot of issues with my terraform scripts and making it impossible to actually use ECS on EC2 in an automated way.

I came here from AWS support. The workaround is to create a new capacity provider with a different name. I hope the team will raise the priority of this request, because Capacity Provider is a really needed feature.

@begetan - It's not only the capacity provider that must be recreated. Since the capacity provider and the ASG are inextricably linked, a new ASG must be created for each new capacity provider.

@rametta - As a workaround for this when using Terraform, the name of the capacity provider can be dependent upon the name of the ASG. Then, if you need to make an update to the capacity provider, the ASG name can be updated. This feels super janky in practice but it's been working for me for the time being.

resource "aws_autoscaling_group" "ecs_autoscaling_group" {
  name_prefix               = join("-", [var.app_name, var.environment, "recreated"])
  max_size                  = 4
  min_size                  = 0
  health_check_grace_period = 180
  health_check_type         = "ELB"
  desired_capacity          = 0
  force_delete              = false
  protect_from_scale_in     = true
  launch_configuration      = aws_launch_configuration.ecs_launch_config.id
  vpc_zone_identifier       = data.terraform_remote_state.vpc.outputs.vpc_private_subnet_ids

  lifecycle {
    ignore_changes        = [desired_capacity]
    create_before_destroy = true
  }
}

resource "aws_ecs_capacity_provider" "ecs_capacity_provider" {
  /* The immutable dependency of an ECS capacity provider on a specific ASG is a major pain point.
   * Since this relationship is enforced as 1-to-1, the name of the capacity provider should reflect
   * the name of the ASG.
   * See  https://github.com/aws/containers-roadmap/issues/632
   * Also https://github.com/aws/containers-roadmap/issues/633
   */
  name = join("-", [var.app_name, var.environment, "asg", aws_autoscaling_group.ecs_autoscaling_group.name])

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.ecs_autoscaling_group.arn
    managed_termination_protection = "ENABLED"

    managed_scaling {
      maximum_scaling_step_size = 2
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      // Although poorly documented by AWS, target_capacity is a percentage (%).
      target_capacity           = 100
    }
  }

  tags = {
    App         = var.app_name
    Environment = var.environment
  }
}

Good lord... I even deleted the entire cluster the thing was associated to, so it's not in any way even _accessible_ through the console. And _still_ it lingers and prevents me from making a new correctly configured capacity provider after I re-created the cluster. What a nightmare. It's kinda messed up my entire account.

Update: And cool... I deleted the scaling group (I'm creating a whole setup for new company, so nuking and starting over is not entirely insane) and _now_ I've got a capacity provider pointing at a scaling group that no longer exists. I had my fingers crossed that removing the scaling group might clear things up... nope.

Yep this is a bit insane .. especially with the inability to "Update" a capacity provider .. even its basic attributes .. so if you create a capacity provider with managed_termination_protection off initially and want to use it now .. you're kind of screwed ..

AWS guys .. can you give us a better idea what's going on here and a better ETA than "we're working on it" .. this just seems straight up broken ..

> Thanks everyone for your feedback. This is not a bug - it is functioning as designed.

K, so this design flaw? Sorry to have to pile on but this is rough.

I agree, this is crazy. Having to delete an entire ASG is a ridiculous requirement to simply tweak a capacity provider parameter. This is a complete non-starter for any established environment where SLAs need to be adhered to.

> This is not a bug - it is functioning as designed.

Somehow that reminds me of this: https://dilbert.com/strip/2010-03-14

Sorry, couldn't resist.

It would be very helpful if the current API could be improved so that tools like terraform could do their job as intended.

Hi @coultn , thanks for confirming that this is being worked on. We're keen to migrate over to using capacity providers, however are rather hesitant until this issue has been resolved, due to the operational headaches that come with the current status quo. Are you able to provide any further details as to when we might expect this work to hit GA? Thanks.

Hi @kgyovai, thanks for posting your https://github.com/aws/containers-roadmap/issues/632#issuecomment-584285121 workaround. I'm trying to replicate the approach you have taken, however I still get the "capacity provider already exists" error when I try to apply an update to a capacity provider attribute.

I noticed that you mentioned:

> if you need to make an update to the capacity provider, the ASG name can be updated

Are you doing something specific to trigger an update of the ASG name? I'm wondering if I need to change an attribute of the ASG at the same time as I change the attribute of the capacity provider?

@eddgrant - In the approach that I described above, the name of the capacity provider is dependent on the name of the ASG
name = join("-", [var.app_name, var.environment, "asg", aws_autoscaling_group.ecs_autoscaling_group.name])

Since the capacity provider cannot be updated and an ASG cannot be associated w/ multiple capacity providers, any change to the capacity provider requires creating a new capacity provider (which is the subject of the current issue) and a new ASG. In order to ensure that a one-to-one relationship exists between the ASG and capacity provider, I have been simply incrementing the name prefix of the ASG anytime I need to change the capacity provider.

For example, if the name of my ASG was:
name_prefix = join("-", [var.app_name, var.environment, "1"])
It would then become:
name_prefix = join("-", [var.app_name, var.environment, "2"])

Note that since I'm using Terraform, I went with a name_prefix so that the name also includes a timestamp of when the ASG was created. This isn't a requirement for the approach that I'm describing but it does help to simplify the "book keeping" of numerous ASG -> capacity provider pairs.

The bad thing about this approach (believe me, I realize how many hoops are being jumped through by doing this as opposed to just having an update option for the capacity provider) is that a simple change to the ASG name prefix results in a cascade of dependent resources being recreated.

aws_autoscaling_attachment.autoscaling_attachment must be replaced
aws_autoscaling_group.ecs_autoscaling_group must be replaced
aws_ecs_capacity_provider.ecs_capacity_provider must be replaced
aws_ecs_service.ecs_service must be replaced (because of the dependency on the capacity provider strategy)

So all of that just to create a new copy of the capacity provider.

The other "gotcha" to be aware of when using Terraform for this approach is resource lifecycle management. As shown below, specifying create_before_destroy on the ASG is required in order to ensure that the new ASG->capacity-provider pair are provisioned prior to deleting the old ASG.

  lifecycle {
    ignore_changes        = [desired_capacity]
    create_before_destroy = true
  }

Hello, any progress on this? The capacity providers feature is unusable without the ability to "go back" - either to have an API to restore the provider, or delete entirely. Cannot use it now, I cannot just delete my prod cluster!

Also, there won't be any CloudFormation support for CapacityProviders until a delete operation is implemented. PLEASE make this a priority!

So I created a Capacity Provider linking my Cluster to my Auto Scaling Group.
Then I wanted to change the Capacity Provider, realized I couldn't. So I wanted to delete it, which I couldn't. So I disassociated it and tried to create a new one between the same cluster and ASG, which I couldn't. So I wanted my old one back, but I can't, it is gone.
So now I have to either delete the ASG or the Cluster in order to use Capacity Providers? O_o

You can get your old one back through the CLI, but yea, the rest is true.

Also seeing this, and there's absolutely no way for us to recreate the ASG as it's heavily used in production. @coultn is a fix for this behavior planned? In addition, this is now in a weird state where I assume the capacity provider still exists but is not visible in the UI.

Yes, that's what the "we're working on it" status means on the containers roadmap. We will share updates here and in the AWS docs once we have more to share.

So am I right that there is currently no workaround available? I also "deleted" my capacity provider, now I'm stuck.. I cannot restore it, and I also cannot create a new one as it says that the autoscaling group is already linked?

Same here.

You can get it back with CLI.

How? Couldn't find a suitable command, there is only create-capacity-provider and describe-capacity-providers?

https://docs.aws.amazon.com/cli/latest/reference/ecs/put-cluster-capacity-providers.html
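
For readers wondering what that command looks like in practice, here is a minimal sketch. It assumes the "deactivated" capacity provider still exists in the account and only needs to be associated with the cluster again. The function name and the example cluster and provider names are placeholders, not taken from this thread.

```shell
#!/usr/bin/env bash
# Sketch only: re-associate an existing capacity provider with a cluster.

reattach_capacity_provider() {
  local cluster="$1" provider="$2"
  # put-cluster-capacity-providers overwrites the cluster's provider list and
  # its default strategy, so any other providers that should stay attached
  # would have to be listed here too (this sketch passes only one).
  aws ecs put-cluster-capacity-providers \
    --cluster "$cluster" \
    --capacity-providers "$provider" \
    --default-capacity-provider-strategy "capacityProvider=${provider},weight=1"
}

# Example call (hypothetical names):
#   reattach_capacity_provider my-cluster my-capacity-provider
```

Note that this only changes which providers a cluster can use. It does not change any setting of the capacity provider itself.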

Nice feature, released months ago, could not be really used in CLI, API nor cloudformation :-/

Same issue here. Still surprised that this isn't fixed yet... :-(

@coultn is there any positive indication to see the fix for this issue sometime soon? obviously this is a show stopper for "Capacity_provider"!

Still the same issue: I cannot delete the capacity provider, and it is causing a lot of issues in Terraform.

Any update on this? Even if we can change the capacity provider to a different ASG, that'll be helpful. Or even allowing duplicate capacity provider name.

I don't know who decided that this capacity provider functionality was ready enough to be made available to the public, but that was clearly a mistake. I hope a fix will come soon, and that lessons will be learned from this experience.

@Th0masL that's overcritical.

Everyone knows the popular CR acronym for important management operations Create and Read.

The biggest WTF of this entire thing is the completely useless error message:

ClientException: The specified capacity provider already exists. To change the configuration of an existing capacity provider, update the capacity provider.

You can't update the capacity provider. This guidance is incorrect. To change the configuration of an existing capacity provider, you must own a time machine.

Adding to the confusion, you can deactivate it, but that seems to be something they should actually remove because that doesn't do anything any reasonable person would want to do - it keeps the capacity provider around, but then just hides it from API calls and the console.

I can confirm that tainting (deleting and recreating) the ASG + deactivating the capacity provider + removing the capacity provider from the default strategy does _not_ make the name available for reuse.

Best practice seems to be to use name_prefix on the ASG in Terraform and then name the capacity provider after the ASG's name. You can use the create_before_destroy lifecycle option in Terraform on the ASG and capacity provider if you need to make a change, but then you'll have to go into the console and deactivate the old capacity provider (or else you'll risk bumping into the limit of 6 capacity providers).

You'd have to ask yourself how far you want to go though. If your services use the default capacity provider, I guess they'd continue to work with the new capacity provider, but I don't know what happens with the old tasks (and I could be wrong!). I also don't know if this works with managed scaling; this comment from an AWS engineer on another issue would imply that the explicit capacity provider is mandatory for managed scaling to work, but I don't actually know. Do you then make your service name dependent on the capacity provider, so you spin up new services whenever the capacity provider configuration changes (because you can't update services, either)? create_before_destroy would help you there, too, but this is awfully messy for changing options on something that inexplicably has no update mechanism.

All of us with TAMs should ping them about this.. it's inexcusable that a feature so clearly in beta is being passed off as GA, and that ECS in general has illogical API architecture decisions that have existed for years, like not being able to change a service's placement constraints, or not being able to change a service's placement strategy. It's not like AWS is some tiny startup with limited resources; releasing buggy, unfinished software like this should never fly.

(edited to change an adjective, because it detracted from my point and wasn't kind)

@mustanggb's comment posted this link:

https://docs.aws.amazon.com/cli/latest/reference/ecs/put-cluster-capacity-providers.html

This only makes capacity providers available/unavailable to a cluster, it does not make it possible to change the parameters of a capacity provider. Want to change the target capacity? Forget about it.

@ziggythehamster not related to the main issue here, but they do support updating placement strategies and constraints (in preview) now: https://aws.amazon.com/about-aws/whats-new/2020/03/amazon-ecs-supports-in-preview-updating-placement-strategy-and-constraints-for-existing-ecs-services/

Besides being unable to deactivate the capacity provider, detaching a capacity provider from a service is not working for me. I have to delete all my existing resources (ASG, service, target group, etc.) to get rid of the capacity provider. I think it's a huge mistake to ship this feature before it's completely ready to use.

Chiming in on this issue... Capacity Providers are basically unusable in their current state. It's been 6 months since they were released, is there an updated ETA on when the ability to delete them might be available?

Deactivating/detaching the capacity provider might not be working because it might already be in use by an existing ECS service. It's a pain because the error message shows up in CloudTrail but not in the console.
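
One way to check whether a given service still references a capacity provider is to look at its capacity provider strategy from the CLI. This is a rough sketch, not an official procedure. The function name and the example cluster and service names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print the capacity providers named in a service's strategy.

service_capacity_providers() {
  local cluster="$1" service="$2"
  # capacityProviderStrategy is part of the describe-services output; the
  # JMESPath query keeps only the provider names.
  aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[].capacityProviderStrategy[].capacityProvider' \
    --output text
}

# Example call (hypothetical names):
#   service_capacity_providers my-cluster my-service
```

If the provider you are trying to detach shows up in the output, that service is a likely reason the request is being rejected.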

That being said, another headache with capacity providers is that once you switch an ECS service from a launch type to a capacity provider strategy, you cannot switch it back to the EC2 launch type. So, if you want to deactivate the capacity provider, you need to delete the ECS service (you can create a temporary ECS service with the EC2 launch type before deleting it to ensure there is no downtime). This is a major pain.

I started a POC of Capacity Providers for one of my projects and ran into this issue. From what I understand of this thread, the only way to modify the capacity provider configuration is to delete the Auto Scaling group and create a new one, which is of course not possible if you are in production.
Is it possible to add a new Auto Scaling group and a new capacity provider, change the capacity provider strategy of each service, perhaps by reducing the weight of the misconfigured capacity provider to 0, and then deactivate it?

You misunderstood - capacity providers are forever. You can create a new ASG and a new capacity provider and a new service and transition the service traffic to the new ASG. Then you can delete the old ASG and service, and deregister the old capacity provider. But that old capacity provider will always exist and occupy that name forever.

Creating a new ASG and service is very inconvenient, and arguably disruptive in production.

very painful .. even after deleting ASG when i query aws ecs describe-capacity-providers still its listing old capacity providers name ... I can't reuse the name :(

Maybe if we all create millions of capacity providers, they'll be forced to fix this to maintain operations?

Maybe ECS is maintained by an unpaid intern?

I get that maybe they can't delete capacity providers because they aren't able to reliably destroy resources under the hood and were lazy about error handling, but making them immutable and requiring they're unique in this account region is insane. It makes you wonder what other bad design choices were made in ECS. Is this a freshman CS student's summer project that we're all running our high traffic critical production workloads on?

As always, we deeply value your feedback and strong desire to see this shipped. As the product manager who created the capacity provider and cluster auto scaling features, and who works closely with the team building it, I absolutely understand your frustration that the delete capacity provider API is not yet available. However, I can share that it is indeed coming soon, and we will provide updates here the moment we have more information to share.

Some comments in this issue are just needlessly rude. We all want this fixed and we already know they are working on it. If anything, take this as a lesson to thoroughly test new features in other environments before applying them in production.

I know this isn't great, but for people who deactivated a capacity provider and just want to get it back: you can, as long as you choose the exact same name and auto scaling group. I didn't know I couldn't make new ones and hit this issue. This at least allowed me to get them back even though they are not named the way I want and I have to live with it for now. :)

Hey @coultn , thank you for the update. We are very excited for this to be released.

Will the ability to tweak a Capacity Provider's Target Capacity be included with this release? Given the many variables that play into the number of hosts desired, it can take a bit of tweaking to get the free capacity to the level that we want.

Please let us know the status of the update as well. When we deactivate a capacity provider, the link between it and the ASG is still there. We cannot create a new Capacity Provider with the same ASG, which is very odd, especially since there is no page to edit the Capacity Provider either.

I just wanted to update the target capacity in a capacity provider. Now I have to replace the entire auto scaling group. This is simply stupid.

Any ETA on a release date? This is blocking needed infrastructure work for us.

Any ETA on a release date yet?

Version 2.695.0
feature: ECS: This release adds support for deleting capacity providers.

is this what we were waiting for?

Looks like it dropped today:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/document_history.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html#asg-capacity-providers-delete-capacity-provider

I'll go check if @terraform-providers has a ticket. I wonder also if the feature of updating constraints/strategies is out of beta. https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ecs/update-service.html indicates it is still beta, but the ECS docs disagree.

Ah, I misread - you can delete, but not update. Still better than before, assuming you can reuse the same name. Then a destroy/create becomes possible as an update.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html#asg-capacity-providers-delete-capacity-provider

Without CloudFormation support, of course.

But next day support in Terraform (on a fork). https://github.com/terraform-providers/terraform-provider-aws/pull/13740

Delete capacity providers functionality is now available through cli/sdk/api/console in all regions!
https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-ecs-capacity-providers-support-delete-functionality/
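
A minimal sketch of what using the new operation from the CLI might look like, assuming a CLI version recent enough to include it and a capacity provider that is no longer referenced by any service or cluster. The function names and the example provider name are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch only: delete a capacity provider, then inspect its status.

delete_capacity_provider() {
  local provider="$1"   # short name or full ARN
  aws ecs delete-capacity-provider --capacity-provider "$provider"
}

capacity_provider_status() {
  local provider="$1"
  # Prints name, status and updateStatus so the progress of the delete
  # can be followed.
  aws ecs describe-capacity-providers \
    --capacity-providers "$provider" \
    --query 'capacityProviders[].[name,status,updateStatus]' \
    --output text
}

# Example calls (hypothetical name):
#   delete_capacity_provider my-capacity-provider
#   capacity_provider_status my-capacity-provider
```

Deleting does not change the provider's settings; changing a setting still means deleting and creating again.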

> So all of that just to create a new copy of the capacity provider.
>
> The other "gotcha" to be aware of when using Terraform for this approach is resource lifecycle management. As shown below, specifying create_before_destroy on the ASG is required in order to ensure that the new ASG->capacity-provider pair are provisioned prior to deleting the old ASG.
>
>   lifecycle {
>     ignore_changes = [desired_capacity]
>     create_before_destroy = true
>   }

Hi @eddgrant, does this mean that there will be plenty of ghost capacity provider left in aws?

Hi @anoopkapoor, I tried deleting with the AWS CLI but I'm getting the following error.

> aws ecs delete-capacity-provider --capacity-provider arn:aws:ecs:ap-southeast-1:...:capacity-provider/myCapacityProvider
> Invalid choice: 'delete-capacity-provider', maybe you meant:

  * create-capacity-provider
  * describe-capacity-providers

aws cli version

aws --version
aws-cli/2.0.4 Python/3.7.4 Darwin/19.5.0 botocore/2.0.0dev8

Hi! Your CLIv2 version is from March. If you update that to the latest version, it will work just fine.

Hi @anoopkapoor, thank you it works!!!!

@anoopkapoor Do you know if there is a corresponding issue registered with Terraform? It seems to me that this issue still persists even with the following versions:

terraform --version
Terraform v0.14.2
+ provider registry.terraform.io/hashicorp/aws v3.20.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

I tried searching through their GitHub issues, but wasn't able to find anything.
