Amazon-ecs-agent: Tune SIGKILL timeout

Created on 25 Jun 2015 · 20 Comments · Source: aws/amazon-ecs-agent

I'm assuming ECS just uses the regular 10s default before a SIGKILL, but this is far too low for a number of our programs that work with queues. Any thoughts on making this tunable?

kind/enhancement

All 20 comments

It's actually 30s right now. Would you want to make this tunable on a per stop-task call or as an attribute of the container in the task definition, or both?

In reality, a robust application will have to handle shorter termination sometimes because there's no way you can always guarantee a graceful stop.
As a workaround (and yes, I know it's hacky), you can have a way to signal your application to clean up and exit outside of the stop-task call.

I'll mark this as an enhancement, but I'd like a bit more discussion around whether this really does make sense to have.

Yeah, I don't disagree with that, but we currently have a number of programs feeding from queues that keep processing their in-flight jobs on shutdown, so that nothing gets terminated mid-job and no responses get cut off. For our use cases, having it in the task def would be great.

+1 / I have similar use cases, and sometimes (especially under high load) 30s is not long enough, so it kills my apps at a random point.

:+1:
I have long-running workers running in Docker containers which can take 2+ hours to complete a job. 30 seconds isn't nearly long enough. Even shorter-running jobs which normally take 1-2 seconds to complete could be running on a really slow EC2 instance, or S3 could be really slow, or any number of other things could cause them to exceed a 30-second timeout, get killed, and lose data.

+1
For my application 10 seconds shutdown will almost never suffice, even the 30 seconds for multicontainer tasks will not be enough. On average I guess I need 2-5 minutes for a clean shutdown. I agree a robust architecture should handle any outage, but is there a reason to not make this configurable?

I forked this repo with a 4 hour timeout: https://github.com/dblackdblack/amazon-ecs-agent
Pretty simple diff: https://github.com/dblackdblack/amazon-ecs-agent/commit/e01a7790f536c96dbed84ed4a1ad900183edf410 which should be pretty easy to keep up to date with newer agent versions. Works great for me so far. I'm not a Go coder so I can't really do much more than this basic hack job nor write up a PR with my desired feature implemented.

+1 on this.
Like @mereel01 said, is there any reason why this timeout shouldn't be configurable?

Basically there's been zero activity for this agent in the past month: https://github.com/aws/amazon-ecs-agent/graphs/contributors?from=2015-08-14&to=2015-09-28&type=c
The only commit is my 1-line README change.

In evaluating whether to use ECS or Kubernetes, this is pretty damning for such a young (and non-free) product, especially when Docker itself is under such rapid development.

@dblackdblack +1, I assume there could be internal development outside of GH going on (internal to AWS), but hard to say with certainty.

@dblackdblack That graph only shows commits to master. If you look at the dev branch, you'll see that we're actively working on the Agent.

Oh cool. sorry about that.

@samuelkarp apologies also... :)

@samuelkarp any reason this couldn't be configurable via an env var? It's not as nice as API support, but it beats nothing, right?

@gjohnson Configuring a default timeout by environment variable sounds fairly reasonable. If you want to look at how we read environment variables for configuration, this is the best place to start.

@samuelkarp is this open to be an external contribution?
I'd like to help, but not sure how to get started with writing tests for this.

@rafkhan Sure! Please see our contributing guidelines. The timeout is currently specified here and tests are here.

contribution inbound soon :)

@rafkhan before I look to go down the route of changing this myself, did you ever end up doing this without just forking?

+1 for this feature

Would it be possible to allow an env variable like this to be set on a per-container-definition basis, as opposed to a global for the ECS agent itself?

I would suggest that the container definition option take precedence over the global one whenever it is set.
