The Ec2Service class in the @aws-cdk/aws-ecs module has a check in its validate() method to verify that the cluster has capacity. While this seems like a reasonable check in many situations, it assumes that the user is using AWS CDK constructs to create the ECS EC2 instances. However, if one were to manually create an ECS instance in a cluster that doesn't otherwise use EC2 capacity, this check would prevent the user from using the Ec2Service construct, and there is no way around it (except for the ugly workaround mentioned below).
An example use-case is an ECS cluster that is 100% Fargate (i.e. no EC2 capacity required). Then a need arises for something very specific that Fargate does not support, which requires an EC2 ECS instance. Further, the EC2 instance is heavily customized in a way that is not supported by cluster.addCapacity() and therefore has to be created outside of addCapacity(). In this situation, the ECS cluster construct does not know about the capacity you've manually added.
My recommendation is to remove this hasCapacity check in Ec2Service, as I don't think it really serves a useful purpose. At best it can catch a situation where someone accidentally forgot to add capacity, but it cannot actually tell you whether you have enough capacity to run your service. What if your service requires 4x CPU but you only have enough capacity for 2x CPU? There is a myriad of cases where you can "have capacity" but ECS still won't be able to run your service (another example would be placement constraints).
When the capacity is added outside of addCapacity(), using the Ec2Service construct fails with:
Error: Validation failed with the following errors:
[path/to/myservice] Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster.
This is a :bug: Bug Report.
The code in question can be found here:
https://github.com/aws/aws-cdk/blob/4f6948c1ca5199592fc962c8305ad1c9fc806349/packages/%40aws-cdk/aws-ecs/lib/ec2/ec2-service.ts#L205
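Roughly, the check looks like this (paraphrased from the linked file, not a verbatim copy):

```ts
// Inside Ec2Service (paraphrased): validation fails whenever the cluster
// construct has not had EC2 capacity registered on it via addCapacity()/
// addAutoScalingGroup(), regardless of how much capacity actually exists.
protected validate(): string[] {
  const ret = super.validate();
  if (!this.cluster.hasEc2Capacity) {
    ret.push('Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster.');
  }
  return ret;
}
```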
@cbeattie-tl just wanted to clarify, are you trying to create an Ec2Service with a cluster that already exists in your account?
If the above is the case, you can actually import your cluster from your account using fromClusterAttributes:
https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-ecs/lib/cluster.ts#L52-L54
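A rough sketch of what that import could look like (the cluster name, VPC ID, and security group ID below are placeholders; this assumes you are inside a Stack):

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as ecs from '@aws-cdk/aws-ecs';

// Sketch: import an existing cluster and tell CDK it already has EC2 capacity,
// so the Ec2Service validation passes. All identifiers below are placeholders.
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { vpcId: 'vpc-12345678' });

const cluster = ecs.Cluster.fromClusterAttributes(this, 'ImportedCluster', {
  clusterName: 'my-existing-cluster',
  vpc,
  securityGroups: [ec2.SecurityGroup.fromSecurityGroupId(this, 'ClusterSg', 'sg-12345678')],
  hasEc2Capacity: true, // bypasses the "needs Ec2 capacity" check
});
```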
I did notice if you use the Cluster.import() functionality you can artificially set the hasCapacity property. Another solution is to extend Ec2Service and override the validate() method.
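The override workaround is roughly the following (just a sketch against the CDK v1-era API, where validate() is a protected method returning a list of error strings; not something I'd call clean):

```ts
import * as ecs from '@aws-cdk/aws-ecs';

// Subclass that drops the capacity error but keeps any other validation output.
class UncheckedEc2Service extends ecs.Ec2Service {
  protected validate(): string[] {
    return super.validate().filter(err => !err.includes('needs Ec2 capacity'));
  }
}
```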
However, in my case the ECS cluster is actually created in the same Stack, so the _natural_ thing to do would be to just use the reference to the cluster as I created it (as opposed to working around this capacity issue with an artificial/unnecessary cluster import statement).
I hope this makes sense.
Hey @cbeattie-tl
Thanks for the response. After some discussions, we have decided that this use case (i.e. supporting manually-provisioned instances) is not one of the intended use cases for CDK, which is meant to create reproducible infrastructure. Please let us know if you have other questions.
@hencrice What you are really saying is that AWS CDK will not support the use of the L2 Ec2Service construct if the ECS cluster is created in CDK but the "capacity" is provided outside of CDK (outside of the same stack, actually). I believe this is unnecessarily limiting.
> not one of the intended use cases for CDK, which is meant to create reproducible infrastructure
You'll get no disagreement from me there. But the whole notion of this use case _not_ being about reproducible infrastructure is moot. The use-case I am presenting here has nothing to do with infrastructure being "not reproducible". In fact, in my scenario the EC2 instance I created was actually created within CDK itself! It's all CDK, it's all CloudFormation. The situation at hand arises when you need to create EC2 VMs that cannot be created as part of an auto-scaling group. In my use-case, I needed the IP addresses of 3 nodes that form a quorum for bootstrapping membership. I needed the actual IP addresses of these nodes, but I could not pre-assign them because I am running more than one "environment" in the same VPC, so the IPs might conflict. I need "random" IPs, but I also need the private IP addresses to bootstrap into other CDK assets. I'd be thrilled if you can show me a way of doing this with an autoscaling group, but without writing a bunch of custom resources the easier approach was to just create 3 EC2 instances "manually" (in my CDK stack). Without addressing this defect (as I see it), we are unable to use the Ec2Service construct, and I had to write a very hacky work-around to overcome this shortcoming.
By way of recommendation, perhaps CDK could support a way of adding "capacity" to an ECS cluster by creating EC2 instances instead of auto-scaling groups. For example, what if ecs.Cluster supported a method like cluster.addEc2Instance() (in contrast to cluster.addAutoScalingGroup())?
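Purely as an illustration of the imagined API; addEc2Instance() does not exist today, and `vpc` and `cluster` are assumed to be defined elsewhere in the stack:

```ts
// Hypothetical usage of the proposed method. addEc2Instance() is NOT part of
// @aws-cdk/aws-ecs; this is only what the suggested API could look like.
const node = new ec2.Instance(this, 'CustomEcsNode', {
  vpc,
  instanceType: new ec2.InstanceType('m5.large'),
  machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
});

// Proposed: register an individually-created instance as cluster capacity,
// analogous to cluster.addAutoScalingGroup() for auto-scaling groups.
cluster.addEc2Instance(node);
```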
Not attempting to belabor the point, but the whole rationale behind this hasCapacity validation is misguided. You might "have capacity" in that you created an autoscaling group, but you may be oversubscribed and your services won't fully start. This check doesn't really protect you from anything other than maybe accidentally forgetting to add some capacity to your cluster. Another situation is that you "have capacity" but a placement constraint prevents your services from running on those nodes. At best this should be a warning.
@cbeattie-tl Thanks for the detailed use case and providing your recommendation. This is awesome! Let me bring it back to the team to reevaluate.
I have another use case for bypassing the capacity check of the Ec2Service construct (and by extension the ECS EC2 patterns). We're using Spot Fleet and Application AutoScaling Policies to handle instance capacity, so we're not using AutoScaling groups. It doesn't appear that addCapacity() supports this method of capacity management, so I'm unable to use our cluster with the Ec2Service construct. I get the same error that @cbeattie-tl mentioned in his issue: Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster.
Same issue here.
It would be super useful if we could add EC2 instances to the cluster ourselves, not only through cluster.addCapacity().
For me, I would like to assign an EC2 instance role, but that is not supported by the addCapacity() function right now.
There are a bunch of EC2 instance properties that are not supported.
I think it would be more flexible to create EC2 instances and bind them to the cluster, roughly along the lines of the sketch below.
And in the console we can create an empty cluster; it would be better to allow that in CDK as well.
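A rough sketch of what "create EC2 instances and bind them to the cluster" could look like today, assuming the ECS-optimized AMI and the standard ECS agent config file; `vpc`, `cluster`, and all construct IDs are illustrative and assumed to exist elsewhere in the stack:

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as ecs from '@aws-cdk/aws-ecs';
import * as iam from '@aws-cdk/aws-iam';

// Instance role with the managed policy the ECS agent needs.
const role = new iam.Role(this, 'EcsNodeRole', {
  assumedBy: new iam.ServicePrincipal('ec2.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role'),
  ],
});

// Point the ECS agent at the cluster via /etc/ecs/ecs.config.
const userData = ec2.UserData.forLinux();
userData.addCommands(`echo ECS_CLUSTER=${cluster.clusterName} >> /etc/ecs/ecs.config`);

// A fully customizable instance, created outside of cluster.addCapacity().
new ec2.Instance(this, 'CustomEcsNode', {
  vpc,
  instanceType: new ec2.InstanceType('t3.medium'),
  machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
  role,
  userData,
});
```

The catch, of course, is that the cluster construct still doesn't know about this capacity, so the Ec2Service check fires anyway, which is the whole point of this issue.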