Amazon-ecs-agent: Improve support for multiple containers needing same port on one host

Created on 27 Mar 2016 · 13 Comments · Source: aws/amazon-ecs-agent

Hi,

As per the Contributing guideline, before making any big changes, it's indicated that an issue should be opened to discuss the said change.
I'd like to work on the following feature: support multiple containers on the same EC2 instance exposing the same port to the outside world.
The way I would like to approach this is to have the ECS agent support registering multiple containers on various ports and proxying them to the same EC2 port. Basically, when the task is defined to map port 443 from the container to port 8080 on the EC2 instance, the ECS agent will map 443 to some other host port, then connect to that port and open port 8080 to the world.
This of course has implications for the health checks, so I propose that initially the health check is done via the normal mechanics; however, when it reaches the ECS agent, the agent will perform the request against all containers, and if it finds one that's not working it will try to restart it.
The next issue is that of load balancing, and for that I propose a simple round-robin scheme, with no persistent connection support or affinity (might be added later?).

What do you think? Thank you.

kind/feature request

dlsniper

👍10
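The round-robin scheme proposed in the opening post can be illustrated with a minimal sketch. This is not ECS agent code: the `RoundRobinPool` class and the host port numbers are hypothetical, and the sketch only shows how connections arriving on one public port could be handed to several container ports in strict rotation, with no affinity.

```python
from itertools import cycle


class RoundRobinPool:
    """Hands out backend ports in strict rotation, with no session affinity."""

    def __init__(self, backend_ports):
        if not backend_ports:
            raise ValueError("at least one backend port is required")
        self._ports = list(backend_ports)
        self._rotation = cycle(self._ports)

    def next_backend(self):
        # Each call returns the next port in the rotation, wrapping around.
        return next(self._rotation)


# Hypothetical layout: three copies of the same container, each published on
# its own host port, all reached through the single public port 8080.
pool = RoundRobinPool([32768, 32769, 32770])
picks = [pool.next_backend() for _ in range(4)]
print(picks)
```

A real proxy would also have to drop unhealthy backends from the rotation, which is where the health-check concern raised in the post comes in.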

All 13 comments

Hi @dlsniper,

Thanks for opening the issue to talk about your change! Can you explain a bit more about what you envision here? I'm a bit unclear on exactly what you're asking for. For multiple processes to receive traffic on the same port, some scheme would need to be devised to deliver traffic to the correct process. Are you looking for something like:

All processes listening on the same port are of the same application type (i.e., multiple instances of the same container), so TCP sessions would be load-balanced between them?
Processes listening on the same port are different application types and traffic should be routed according to HTTP Host header (like virtual hosts with a web server)?

Thanks,
Sam

samuelkarp on 28 Mar 2016

Hi @samuelkarp,

Sorry I wasn't very clear in the initial text, hopefully I'll be able to be so now.

I'm currently looking at a situation where I would launch an EC2 fleet with ECS on it to support the same application only. The problem is that because I can't expose the same port from two containers on the same machine, ECS is not that useful (unless I'm missing something).
Also, in the future, I would like to be able to launch say two different containers that need port 443 but they connect to different ELBs and they can coexist on the same EC2 machine (not in front of an ECS console right now, can't remember if this is currently possible).

As there are a lot of complexities around this whole issue, I would propose starting small: balance the connections (TCP or UDP) across multiple instances of the same container on the machine itself.

Support for HTTP Host header (or similar) could be added later on, if deemed necessary. There are plenty of other ways this could evolve (request path based routing for example).

I've yet to look at the code itself, but I know a bit of Go, and I hope I can be guided on how this can be solved when doing the actual PR.

If it's still unclear:

app A uses 443
I have a fleet of EC2 instances with ECS on them
I would like to run multiple containers of A on a single EC2 instance
app B uses 443 as well
later down the road, I would like to create a service which is launched on the same EC2 fleet.

Thank you.

dlsniper on 29 Mar 2016

Hi @samuelkarp, please let me know if you need any further information for this. Thank you.

dlsniper on 31 Mar 2016

I'm not really connected to AWS, but I think you're trying to approach the problem from the wrong side. I don't see any reason to add proxy-like features to the ECS agent.

Two services needing 443 port:

Internet <- 443 -> ELB 1 (I would terminate the SSL here unless you have strict reason not to) <- 10001 -> Service A
Internet <- 443 -> ELB 2 (I would terminate the SSL here unless you have strict reason not to) <- 10002 -> Service B

Advanced routing:

Internet <- 443 -> ELB <- 10003 -> Reverse proxy/Service discovery (Nginx+etcd, nginx+consul, many options) <- many ports/IPs -> Many services

You now have full power of nginx/haproxy routing, optimizations and so on and you're not reinventing the wheel.

Well my 2c. Also there are other options too.

I do, however, think that ECS could be improved with better routing options that sit in the middle between using loads of ELBs (expensive) and running your own service discovery/proxy. However, I'm not sure that it should be implemented in ecs-agent.

MaikuMori on 1 Apr 2016

@MaikuMori I agree that ECS agent could be left alone. But... then we could have a complex solution like the one proposed:

Internet <- PORT -> ELB <- PORT -> Reverse proxy/Service discovery (Nginx+etcd, nginx+consul, many options) <- many ports/IPs -> Service containers (yaaay!!!)

when the solution could be:

Internet <- PORT -> ELB <- PORT -> ECS -> Service containers (yaay!!!)

And that's the thing: I don't want to have to suddenly manage a reverse proxy/service discovery setup plus deal with the additional complexities that arise from it. In the proposed solution I need to take care of managing nginx/haproxy and etcd/consul, make sure they are all alive, understand which services are where, on what ports, and so on. ECS already has that information. Thus having the solution integrated with the ECS agent means that I could just define my service in ECS and then never have to configure anything else anywhere else. ECS agent goes down? Scrap the instance, launch another one, and meanwhile the ECS server can happily reschedule the containers I've just lost.

Kubernetes does exactly what I need, but unfortunately we don't have Kubernetes, we have the ECS agent.

dlsniper on 1 Apr 2016

👍4

(I totally thought I responded to this yesterday before adding the label...I must have clicked the wrong button. My apologies for the late reply!)

Hi @dlsniper,

Thanks for opening this issue and describing your use case. We're hesitant to agree to this for a few reasons:

Your use-case is for many copies of the same application running on the same port, but a feature that only works in this configuration is a bit tricky: either ECS provides no verification that all the containers attempting to bind to the same port are actually the same (which would cause very interesting behavior when you try to reach that port) or ECS could attempt to enforce it by way of restrictions on the task definition (but if you push a new image to the same tag and start a new task, you may have that same weird behavior).
At present, the ECS agent is not in the critical path for availability of your tasks; if the ECS agent has a problem, tasks which are running will continue to run without impact. Bringing a form of network routing/proxying into the agent adds it to the critical path of your application availability.

As @MaikuMori explained, you can actually accomplish your request of different applications bound to different load balancers today by choosing different host ports for the applications.

For these reasons, we're not likely to accept a pull request of this nature (though you're more than welcome to implement this yourself and run your own fork). With that all said, we are working on supporting this use-case more broadly. We'll update this issue when we have a bit more to share.

Thanks,
Sam

samuelkarp on 1 Apr 2016
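The workaround described above (different host ports for applications that both listen on 443 inside the container) can be sketched as follows. The `containerPort` and `hostPort` names are the port-mapping fields of an ECS task definition; the service names and the `host_port_conflicts` helper are hypothetical, and host ports 10001 and 10002 are taken from @MaikuMori's diagram.

```python
from collections import Counter

# Illustrative container definition fragments; only the port mappings are shown.
service_a = {
    "name": "service-a",
    "portMappings": [{"containerPort": 443, "hostPort": 10001}],
}
service_b = {
    "name": "service-b",
    "portMappings": [{"containerPort": 443, "hostPort": 10002}],
}


def host_port_conflicts(containers):
    """Return the host ports claimed by more than one mapping on the same instance."""
    counts = Counter(
        mapping["hostPort"]
        for container in containers
        for mapping in container["portMappings"]
    )
    return sorted(port for port, n in counts.items() if n > 1)


# Different host ports: both containers can share an instance.
print(host_port_conflicts([service_a, service_b]))
# Two copies of the same definition: the fixed host port collides.
print(host_port_conflicts([service_a, service_a]))
```

The second call shows the limitation the thread is about: with a fixed host port, a second copy of the same task cannot be placed on the same instance.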

@samuelkarp thanks for the reply.
I would most certainly look for how you (AWS) will solve this issue as I'd rather not have to maintain my own fork.

I do understand the reasons you mentioned and here's my feedback on them:

Your use-case is for many copies of the same application running on the same port, but a feature that only works in this configuration is a bit tricky:

Actually this should be fairly straightforward to use: don't allow two different service definitions to use the same ports. Or if you allow that, don't schedule them on the same EC2 hosts. I'm not sure about the last part: "(but if you push a new image to the same tag and start a new task, you may have that same weird behavior)". Can you please explain what the problem would be then?

At present, the ECS agent is not in the critical path for availability of your tasks; if the ECS agent has a problem, tasks which are running will continue to run without impact. Bringing a form of network routing/proxying into the agent adds it to the critical path of your application availability.

True, and I'm very well aware of the implications it has. Knowing AWS, I'm sure such a thing would be rigorously tested before being rolled out to production. But unless there's a plan to change the ELB to be able to talk to containers directly via multiple different ports (or some other magic) I don't see how it could keep ECS (or anything else) from doing so. Like I've mentioned earlier, Kubernetes does this already and it doesn't seem there's a problem with that.

And yes, you can do all the other things currently, but "the cloud" should be about convenience. There certainly is no convenience in running ECS right now (at least for my case) if the same container can't be scheduled on the same instance multiple times. As it is now, I'm better off using auto-scaling groups only, which are much slower for rolling deploys.

I'd be very happy to keep this alive and talk on this more to see how the functionality could be achieved.

Thank you.

dlsniper on 2 Apr 2016

👍1

Hi @dlsniper,

Thanks for writing back!

Actually this should be fairly straight forward to use: don't allow two different service definitions to use the same ports. Or if you allow that, don't schedule them on the same EC2 hosts.

That's certainly a possible path for this (and I alluded to this in my reply regarding restrictions on the task definition), but ultimately we don't really feel that this is the right solution (due to introducing additional cognitive complexity into the system and the availability concerns, among others).

I'm not sure about the last part: [...] can you please explain what would be the problem then?

An ECS task definition includes an image field for specifying the container image that should be run. Image names can optionally include a tag reference (or implicitly have one called latest). Tag references are mutable, meaning that you can push a different image to the same name (and tag) in your registry. When you run a task with ECS, the ECS agent unconditionally pulls the image in order to make sure it runs the most recently-pushed version of the image at that specified tag. If you were to start a task with an image of myimage:mytag that gets bound to a port under the scheme you suggest, then push a different image to the same name myimage:mytag and start another copy of the task (either by invoking our StartTask or RunTask API or by the service scheduler as part of normal scaling/replacement), the second copy of the task would end up running a different image (which could potentially be a different application altogether). In order for this to really work, the placement logic inside ECS would need to be able to determine whether the contents of the image/tag have changed between the time that the last task on the instance was started and the next task is attempting to be placed. This could potentially happen if every image deployed through ECS was public, but private registries are a very common use-case (and even for public images, introducing a check like this could be fairly impactful in terms of latency and availability of ECS). Please let me know if I've explained my concerns here clearly.

Knowing AWS, I'm sure such a thing would be rigorously tested before being rolled out to production.

I really, truly appreciate you saying this (testing is super important!). However, it's precisely because we want to maintain rigor around reliability and availability that we do not want to introduce a critical-path dependency for your applications on the ECS agent. This applies to the proxying you're suggesting as well as to other use-cases that would imply a critical-path dependency.

Please let us know if you have any further questions. The discussion here is incredibly useful for us for making sure we understand your use-case fully.

Sam

samuelkarp on 2 Apr 2016
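The mutable-tag concern above can be shown with a toy model. `ToyRegistry` is a hypothetical in-memory stand-in, not a real registry client; it only demonstrates that the same `myimage:mytag` reference can resolve to different content at two different task starts.

```python
import hashlib


class ToyRegistry:
    """In-memory stand-in for an image registry with mutable tags."""

    def __init__(self):
        self._tags = {}

    def push(self, ref, content):
        # Pushing to an existing name:tag silently repoints the tag.
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self._tags[ref] = digest
        return digest

    def resolve(self, ref):
        # What a pull at task start time would receive for this reference.
        return self._tags[ref]


registry = ToyRegistry()

registry.push("myimage:mytag", b"application A")
task_1_image = registry.resolve("myimage:mytag")  # first task starts

second_push = registry.push("myimage:mytag", b"a different application")
task_2_image = registry.resolve("myimage:mytag")  # second copy of the task starts

# Same task definition, same image reference, different content behind the port.
print(task_1_image == task_2_image)
```

Under the shared-port scheme, both tasks would sit behind the same port even though they no longer run the same image.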

This might be an interesting way to solve the issue as well (when it becomes available and stable): Docker support for ipvlan https://github.com/docker/docker/blob/master/experimental/vlan-networks.md since it means port mapping won't be needed anymore. Not sure how this will work out with EC2, but it should be doable, no?

dlsniper on 4 Apr 2016

👍2

With that all said, we are working on supporting this use-case more broadly.

@samuelkarp could you provide an approximate time when some solution will be released? I'm facing the same problem as @dlsniper, and I don't want to invest a lot of time learning how to deploy and manage consul + fabio when there is a chance Amazon will release the same functionality "one day" later.

s7anley on 6 Apr 2016

👍2

Another use case this is interesting for:

You have 20 small sites, each consisting of an nginx image, a php-fpm image, and a data container which holds the site content
You would like to run these 60 (20*3) containers on the same EC2 hosts, each one exposes 80 and 443
You want to avoid having to run one ELB pointing to one global nginx which links in all other 20 sites
(or) You want to avoid running 20 ELBs pointing to individual ports for each of the sites

Unless a scenario like this already works and I missed something :)

CumpsD on 20 Jul 2016

👍5

I need a fleet of Layer 7 switches (haproxy) to proxy large volume of traffic to multiple backends (based on path), behind ELB.

I want to maximize resource usage.

Each container obviously would need to be bound to the same host port. In my understanding, currently ECS would only be able to have one container per host due to this limitation—thus, I cannot maximize resource usage unless I try to find best instance size, blah...

I guess my use case would be solved if ELB were to get capability for Layer 7 switching based on requested path...

Maybe I should just use CloudFront? 🙀

execjosh on 6 Aug 2016
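The path-based Layer 7 switching wished for above can be sketched as a longest-prefix match. This is an illustration only: `pick_backend`, the route table, and the backend names are hypothetical, and the matching rule is an assumption that does not reproduce how haproxy or any AWS load balancer evaluates its rules.

```python
def pick_backend(path, routes, default=None):
    """Return the backend whose path prefix is the longest match for `path`."""
    best = None
    for prefix, backend in routes.items():
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, backend)
    return best[1] if best else default


# Hypothetical route table: path prefix -> backend name.
routes = {
    "/api/": "api-backend",
    "/api/v2/": "api-v2-backend",
    "/static/": "static-backend",
}

print(pick_backend("/api/v2/users", routes))
print(pick_backend("/api/users", routes))
print(pick_backend("/other", routes))
```

With routing decided per request path at the load balancer, the backends no longer all need to bind the same host port.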

Hey All,

Application Load Balancer integration with ECS should help in resolving use-cases mentioned in this thread. You can read more about it in these blog posts:

Closing this issue for now. Please reach out to us if you have any feedback.

Thanks,
Anirudh

aaithal on 11 Aug 2016

🎉7 👍1
