Containers-roadmap: Support task execution timeout (maximum lifetime for a container) in ECS Fargate

Created on 12 Nov 2019 · 10 Comments · Source: aws/containers-roadmap

Summary


ECS does not currently support a task execution timeout, that is, automatically stopping a task once it has run longer than a set period of time, the way AWS Batch does with job timeouts. The task definition has no parameter to enforce a task/container execution timeout that would automatically stop the container after the set time.

Use-case example from a customer:
I have an NLP model training job I want to run in a Fargate container triggered by a Lambda function. At some point, a bug might be introduced in the training code that would cause it to run indefinitely. I don't want those tasks to pile up by accident and have 50 tasks running for a couple of weeks before we notice. That could have a cost implication. Is there a native way to kill a container if it hasn't exited on its own before a certain time?

Can this be considered as a feature request?

ECS

All 10 comments

Thanks @nitheesha-amzn for submitting this for me!
As we discussed in the ticket, a more native approach would be to have AWS Batch support the Fargate launch type. This seems to be kind of a force-fit edge case for ECS.

Moving this over to the containers roadmap as an ECS feature request.

I can see another use case here, as mentioned in https://github.com/aws/containers-roadmap/issues/232

  • This applies to both the Fargate and EC2 execution methods for ECS, not just Fargate.
  • When scheduling tasks using cron-style syntax with CloudWatch Events/EventBridge, you would want to ensure tasks don't run indefinitely. If they did, and you have them set to spawn regularly, you would eventually exhaust cluster resources/service limits, effectively DoS'ing your AWS account.

A problem I'm seeing is that a task expected to be relatively short-lived (a few hours at most, but typically minutes) gets 'stuck' due to some bug and is still running after days.

It would be great to have a backstop that kills any jobs after X hours. With hundreds of tasks in the console, it is hard to find the problem ones.
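
Until something native exists, one way to get a backstop like this is a small scheduled reaper that looks for tasks older than X hours. The following is only a minimal sketch of the selection step, assuming the input is the `tasks` list from an ECS `describe_tasks` response (each entry carrying `taskArn`, `lastStatus`, and a timezone-aware `startedAt`). The function name, the sample ARNs, and the 6-hour limit are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def overdue_task_arns(tasks, max_age, now=None):
    """Return ARNs of RUNNING tasks that started more than max_age ago.

    `tasks` is assumed to look like the "tasks" list of a describe_tasks
    response; tasks without a startedAt (e.g. still PENDING) are skipped.
    """
    now = now or datetime.now(timezone.utc)
    return [
        task["taskArn"]
        for task in tasks
        if task.get("lastStatus") == "RUNNING"
        and task.get("startedAt") is not None
        and now - task["startedAt"] > max_age
    ]


# Hypothetical sample data standing in for a describe_tasks response.
now = datetime(2019, 11, 12, 12, 0, tzinfo=timezone.utc)
sample_tasks = [
    {"taskArn": "example-task-stuck", "lastStatus": "RUNNING",
     "startedAt": datetime(2019, 11, 9, 12, 0, tzinfo=timezone.utc)},
    {"taskArn": "example-task-fresh", "lastStatus": "RUNNING",
     "startedAt": datetime(2019, 11, 12, 11, 30, tzinfo=timezone.utc)},
    {"taskArn": "example-task-pending", "lastStatus": "PENDING"},
]

overdue = overdue_task_arns(sample_tasks, timedelta(hours=6), now=now)
print(overdue)
```

Each returned ARN could then be passed to the ECS `stop_task` call. That call is left out here so the sketch runs without AWS credentials.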

Would like to stop a bastion host after a period of time.

@adnxn any updates re where this sits on the roadmap? :)

+1

any updates re where this sits on the roadmap? :)

/ping @coultn

Bump!

What do you think about adding an "essential" container to the task that just runs `sleep XX`? When the sleep ends, the essential container exits and ECS will then stop the task.
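
A rough sketch of what that suggestion could look like, building the `containerDefinitions` list for a task definition in Python. It relies on the ECS behavior that when a container marked `essential` stops, the remaining containers in the task are stopped as well. The helper name, the `task-timeout` container name, the `busybox` image, and the worker image are assumptions for illustration only.

```python
import json


def with_timeout_sidecar(container_definitions, max_seconds, image="busybox"):
    """Return a new container list with an essential `sleep` sidecar appended.

    When the sleep finishes, the sidecar exits; because it is essential,
    ECS stops the rest of the task.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    sidecar = {
        "name": "task-timeout",   # hypothetical name
        "image": image,           # any image that ships a `sleep` binary
        "essential": True,
        "command": ["sleep", str(max_seconds)],
    }
    return list(container_definitions) + [sidecar]


# Hypothetical main container; only the fields relevant here are shown.
worker = {"name": "worker", "image": "example/worker:latest", "essential": True}

definitions = with_timeout_sidecar([worker], max_seconds=4 * 60 * 60)
print(json.dumps(definitions, indent=2))
```

The resulting list would go into the `containerDefinitions` field when registering the task definition. One trade-off of this approach is that the sidecar exits normally, so a task stopped by the timeout may not be obviously distinguishable from a failure without checking which container stopped first.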
