Instances running tasks in awsvpc networking mode will have greater ENI allotments, allowing for greater task densities.
Will this benefit EKS workers also?
@MarcusNoble EKS uses secondary IPs, so it already allows for much higher pod density on each node.
If we bring this new density to EC2, maybe it can benefit EKS too, but right now it is a much smaller issue there than it is for ECS.
@MarcusNoble Can you please tell us more about your EKS pods-per-node density requirements?
It'd be great if we could make use of some of the smaller instance types (in terms of CPU and memory) but still benefit from being able to run a large number of pods. When we were picking the right instance type we had to provision far more resources than we need because of the IP limitation, balanced against the cost of running more, smaller instances.
Yes please, this would be valuable: increasing container density on ECS/EKS (no matter if IP- or port-based). A one-pager listing max containers per instance flavor would be useful too.
An acceptable level of ENI density would be about 1 ENI per 0.5 vCPU, scaling linearly with instance size rather than only at every other size step as it does today.
I would say 1 ENI / 0.5 vCPU would be on the low end. Honestly, at that rate we probably still wouldn't bother with awsvpc networking mode. We regularly run 10-16 tasks on hosts with as few as 2 vCPUs.
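To put rough numbers on that (my own back-of-the-envelope arithmetic, assuming one ENI stays reserved as the instance's primary interface):

# At the proposed ratio of 1 ENI per 0.5 vCPU, a 2-vCPU host would get
# 2 / 0.5 = 4 ENIs; minus the instance's own primary ENI, that leaves
# roughly 3 awsvpc tasks -- still far short of the 10-16 tasks we run today.
echo $(( 2 * 2 - 1 ))   # 3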
I would point out that on other providers this limit is not in place. So, coming in with purely a k8s background, I expected the hard limit to be the usual 110 pods per node.
This one caught us a bit off guard. We started migrating from GCP and chose machines in AWS as close to the same size as we could. We started the migration and suddenly pods weren't starting.
It was only because we happened to remember reading about IPs per ENI that we were able to figure this out.
I can definitely understand the context switching for the CPU and other factors being an issue with traditional EC2. But with much smaller jobs running, it would be nice to at least be able to acknowledge these risks and do it anyway.
Especially with EKS, where we can, and are responsible for, setting resource requests so k8s can best schedule across our node capacity.
I can explain a good use case for this. We currently have an EKS cluster on AWS and an AKS cluster on Azure.
On the Azure cluster we run many small pods (approx. 80 pods per node): they are so small that they can easily fit on the equivalent of an m5.xlarge. Unfortunately, the m5.xlarge allows only 59 pods per node (of which at least 2 are needed by the system itself).
So we are basically using the Azure cluster for cost optimization.
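For context, the EKS pods-per-node ceiling falls out of the usual VPC CNI arithmetic; a quick sketch, using the published m5.xlarge limits of 4 ENIs and 15 IPv4 addresses per ENI:

# max pods = ENIs * (IPv4 addresses per ENI - 1) + 2
echo $(( 4 * (15 - 1) + 2 ))   # 58 -- in the same ballpark as the ~59 ceiling above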
Any news on when we can expect an update? We are planning to move workloads to ECS using awsvpc but are currently blocked by this issue. We could use the bridge networking mode for now, but it would be good to know whether an update to this issue is imminent or rather something for next year (both are fine, but information on this would be great).
@peterjuras We are currently actively working on this feature. Targeting a release soon, this year.
@emanuelecasadio Please note this issue tracks the progress of ENI density increases for ECS. We are also working on networking improvements for EKS, just not as part of this issue.
@ofiliz Does this mean "calendar year" (i.e., 2019)? We were initially under the impression this feature would be shipping months ago. Until it does ship, awsvpc (and thus App Mesh) is not usable for us.
I second this, I struggle to see AppMesh working for the majority of use cases with ECS given the current ENI limitations and sole support for awsvpc networking mode. It's a shame there is so much focus on EKS support when K8s already has tons of community support and tooling around service-mesh architectures. Meanwhile today, for ECS, all service-mesh deployments have to be more or less home-rolled due to limited support.
I've been patiently waiting, but I'm about to just roll Linkerd out across all of our clusters because the feature set of AppMesh as is right now is still very limited, and this ENI density issue is a non-starter for us. It seems AppMesh was prematurely announced, since it's just now GA 6 months after announcement, and is still effectively unusable for any reasonably sophisticated ECS deployments.
AWS tend to release services as soon as they are useful for some subset of their intended customer base. If you are running reasonably memory-heavy containers then, depending on the instance type you use, you won't hit the ENI limits when using awsvpc networking.
While this is a problem for you (and myself), there are clearly going to be some people for whom this is useful, so it's obviously good to release it to them before solving the much harder problem of ENI density, or reworking awsvpc networking on ECS to use secondary IPs as EKS does, with network policies on top of security groups.
There's certainly a nice level of simplicity in awsvpc networking: each task gets its own ENI, so you can use AWS networking primitives such as security groups natively. EKS's use of secondary IPs for pods sits on top of the already well-established network policies used by overlay networks in Kubernetes, but for a lot of people this is more complexity than necessary.
I personally prefer the simplicity of ECS over Kubernetes for exactly these types of decisions.
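As a concrete illustration of that simplicity (hypothetical cluster, service, subnet and security-group IDs), this is roughly what wiring a security group directly to a service's tasks looks like in awsvpc mode:

# Each task launched by this service gets its own ENI in the given subnet,
# with the given security group applied natively to that ENI.
# Assumes the task definition itself declares networkMode "awsvpc".
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:1 \
  --desired-count 2 \
  --launch-type EC2 \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0abc1234],securityGroups=[sg-0def5678],assignPublicIp=DISABLED}'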
I've said this before in multiple places: having native SGs per ENI is a huge benefit for any org.
Powered by Nitro technology, it should be possible to create a new instance family that removes the ENI-per-vCPU/core limit that currently constrains EC2.
That's pretty outrageous speculation there.
Whatever you do, you're still restricted by the physical limitations of the actual tin, and part of that ENI-per-core thing is just how instances are divided up across the physical kit. Even if the networking is entirely virtualised or offloaded, there's still some cost to it, and AWS needs to be able to portion that out to every user of the tin as fairly as possible.
True, @tomelliff, but it would lift this entire problem to a different scale.
@joshuabaird @mancej Yes, this calendar year, coming soon. We appreciate your feedback. We are aware that this issue impacts App Mesh on ECS and are working hard to increase the task density without requiring our customers to make any disruptive changes or lose any functionality that they expect from VPC networking using awsvpc mode.
Hi everyone: I'm on the product management team for ECS. We're going to be doing an early access period soon for this feature prior to being generally available.
In the event you're interested in participating: can you please email me at bsheck [at] amazon with your AWS account ID(s). I'll ensure your accounts get access and follow up with more specific instructions when the early access period is opened up.
With the Amazon ECS agent v1.28.0 released today, support for high-density awsvpc tasks was announced. What's the new limit? Is it more ENIs per EC2 instance? More IP addresses per ENI?
We have instances running as many as 120 tasks on them, wondering where the limit is now.
Thanks!
@mfortin The agent release today is staged in anticipation of opening up the feature for general availability relatively soon. At that point, we'll publish the documentation with all the various ENI increases on a per-instance basis, and I'll report back here.
@Bensign I sent you an email last month from my corporate address asking to be part of the beta test; we love being guinea pigs ;) If you prefer, I can make this request more official through our TAM.
@mfortin Sending you a note momentarily on this.
When is it planned to go live in production, and when/how can I use it?
@Bensign Any chance of seeing the documentation and/or the feature availability date?
Such information would be great for planning, especially with the vacation period coming up.
It is still in beta, but the GA release is coming soon. I can't share more specifics right now, but we will update this issue once the feature is generally available.
@abby-fuller Is this limited to the specific families listed in the docs, or does it also include sub-families like c5d?
It is currently limited to the specific instance types listed in the docs. We are working on adding additional instance types.
How does this work? Is there any reason why we wouldn't opt into this mode? Are there any limitations?
Is this actually working for anyone? I have the account setting defined and am running the newest ECS AMI (with the 1.28.1 ECS agent, etc.), but I can still only run 3 tasks on an m5.2x. I don't see that the trunk interface is being provisioned. Talking to support now, but I think they may be stumped as well.
An update: I enabled awsvpcTrunking for the account using a non-root account/role. This role was also used to provision the ECS container instance and the ECS service, but ENI trunking was still not working/available. We then logged into the ECS console using the root account and enabled the setting (which sets the default setting for the entire account). After doing this, ENI trunking started working as expected.
@joshuabaird Yup. I had the same issue. You need to enable awsvpcTrunking as the root user. It's not obvious.
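For anyone else debugging this, a couple of standard ECS CLI calls help confirm what the account actually resolved to (add --region/--profile as needed):

# Show the effective awsvpcTrunking setting for the calling principal
aws ecs list-account-settings --name awsvpcTrunking --effective-settings

# Set the account-wide default (run as a principal allowed to change account defaults)
aws ecs put-account-setting-default --name awsvpcTrunking --value enabled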
Does this apply just to ECS or also to EKS? I was directed here by a couple of AWS solution architects before this was closed, and was under the impression it would be usable by EKS as well. The announcement doesn't mention it though.
Hi @geekgonecrazy, this feature is currently only for ECS. Do you want more pods per node in EKS? Or do you want VPC security groups for each EKS pod? If you can tell us more about your requirements, we can suggest solutions or consider adding such a feature in our roadmap.
@ofiliz To quote my initial comment here from 4 months ago:
I would point out that on other providers this limit is not in place. So, coming in with purely a k8s background, I expected the hard limit to be the usual 110 pods per node.
This one caught us a bit off guard. We started migrating from GCP and chose machines in AWS as close to the same size as we could. We started the migration and suddenly pods weren't starting.
It was only because we happened to remember reading about IPs per ENI that we were able to figure this out.
I can definitely understand the context switching for the CPU and other factors being an issue with traditional EC2. But with much smaller jobs running, it would be nice to at least be able to acknowledge these risks and do it anyway.
Especially with EKS, where we can, and are responsible for, setting resource requests so k8s can best schedule across our node capacity.
On every other provider we can use the k8s default of 110 pods per node. With EKS we have to get a machine with more interfaces and way more specs than we need just to reach 110 pods per node.
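For what it's worth, you can see the ENI-derived ceiling each node actually advertises with plain kubectl (nothing EKS-specific assumed here):

# Pod capacity each node reports to the scheduler; on EKS this reflects the
# ENI/IP-based max-pods value rather than the upstream default of 110.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.capacity.pods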
Are there any plans to also bring this to the smallest instance types (e.g. t2/t3.micro)? I would mainly use this feature for DEV environments, where we bin-pack as much as possible; in production environments I don't see as much need.
@ofiliz we have a workload running on a different cloud provider that we would like to move to EKS, but the fact that we cannot allocate 110 pods on a t3.medium or t3.large node is a no-go for us.
@geekgonecrazy @emanuelecasadio Thanks for your feedback. We are working on significantly improving the EKS pods-per-node density, as well as adding other exciting new networking features. We have created a new item in our EKS roadmap: https://github.com/aws/containers-roadmap/issues/398
ENI trunking doesn't work when opting in via the console as a non-root user. You need to opt in as the root user via the console, or run the following command (as either the root or a non-root user):
aws ecs put-account-setting-default --name awsvpcTrunking --value enabled --region <region>
ENI trunking doesn't work for instances launched in a shared VPC subnet: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html
Instances fail to register with the cluster when launched in a shared VPC with the ENI trunking feature enabled.
Bumping @peterjuras's question: will you ever support the t2/t3 family?
Running at least the c5 family in dev/qa/preprod environments costs way too much.
Due to technical constraints with how ENI trunking works, we do not currently have plans to support t2 and t3.