Atlantis: Support clean shutdown

Created on 15 Oct 2018 · 5 Comments · Source: runatlantis/atlantis

It would be great to implement a clean shutdown mode that handles a signal and waits for all child processes to complete before actually shutting down.
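To make the request concrete, here is a minimal Go sketch of the idea: catch SIGTERM/SIGINT, stop accepting new work, and wait for running jobs before exiting. `JobTracker` and its methods are hypothetical names for illustration, not Atlantis code, and a `time.Sleep` stands in for a terraform child process. The demo sends SIGTERM to itself (Unix only) so that it terminates on its own; a real server would just block until the signal arrives.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// JobTracker counts in-flight jobs so shutdown can wait for them.
// Hypothetical type for illustration; not part of Atlantis.
type JobTracker struct {
	mu       sync.Mutex
	stopping bool
	wg       sync.WaitGroup
}

// Start runs job in a goroutine unless shutdown has begun.
// It reports whether the job was accepted.
func (t *JobTracker) Start(job func()) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.stopping {
		return false
	}
	t.wg.Add(1) // Add happens under the lock, so it cannot race with Wait.
	go func() {
		defer t.wg.Done()
		job()
	}()
	return true
}

// Shutdown rejects new jobs, then blocks until running ones finish.
func (t *JobTracker) Shutdown() {
	t.mu.Lock()
	t.stopping = true
	t.mu.Unlock()
	t.wg.Wait()
}

func main() {
	// ctx is cancelled when SIGTERM or SIGINT is delivered.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	tracker := &JobTracker{}
	// Stand-in for a long-running terraform child process.
	tracker.Start(func() { time.Sleep(200 * time.Millisecond) })

	// Demo only: signal ourselves so the example terminates on its own.
	if p, err := os.FindProcess(os.Getpid()); err == nil {
		_ = p.Signal(syscall.SIGTERM)
	}

	<-ctx.Done()
	fmt.Println("signal received, waiting for running jobs")
	tracker.Shutdown()
	fmt.Println("all jobs finished, exiting")
}
```

Note that the orchestrator's grace period still applies: if the platform kills the process before the jobs finish, waiting alone is not enough.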

This is closely related to upgrades in situations where the Atlantis server code is cleanly separated from its data, for example when running the Atlantis server in a Docker container with the atlantis user's home directory mounted as a Docker volume.

_Tracking child processes can be done nicely with cgroups, but I would discourage this implementation because it does not have a close analog in the Docker/k8s world. AFAIK, anyway._

feature


teosoft123

👍18

Most helpful comment

Hi,
I'm working on it (implementing a drain).
I need it since we are deploying Atlantis with Atlantis in a K8s cluster, so we need rolling upgrades plus a clean pod termination.
As you suggested, I'm implementing a drain endpoint, with a POST to start the drain and a GET to check its completion.
I will probably implement an operation like "atlantis shutdown" which will call this endpoint locally and wait for completion of the drain, so that it can be used in the preStop hook on K8s. I will probably have a working prototype before the end of the week.
We will battle-test it ASAP on our cluster.
This means that I will also propose a chart update for rolling upgrades plus a preStop hook in the lifecycle.
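For readers following along, a drain endpoint of the kind described above could look roughly like this in Go. This is only a sketch under assumptions: the `Drainer` type, its method names, the `/drain` path and the JSON field names are all made up for illustration and are not the actual implementation. POST starts the drain, GET reports whether it has completed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// Drainer tracks running operations and whether a drain was requested.
// Hypothetical type; names and JSON fields are assumptions.
type Drainer struct {
	mu       sync.Mutex
	draining bool
	running  int
}

// StartOp registers an operation; it returns false once draining.
func (d *Drainer) StartOp() bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.draining {
		return false
	}
	d.running++
	return true
}

// OpDone marks one previously started operation as finished.
func (d *Drainer) OpDone() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.running--
}

// StartDrain makes all later StartOp calls fail.
func (d *Drainer) StartDrain() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.draining = true
}

// Status reports whether a drain is active and how many ops still run.
func (d *Drainer) Status() (draining bool, running int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.draining, d.running
}

// drainStatus is the JSON body returned for both POST and GET.
type drainStatus struct {
	Draining  bool `json:"draining"`
	Running   int  `json:"running"`
	Completed bool `json:"completed"`
}

// ServeHTTP starts the drain on POST and reports progress on GET.
func (d *Drainer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodPost:
		d.StartDrain()
	case http.MethodGet:
		// Read-only: fall through to the status report.
	default:
		w.Header().Set("Allow", "GET, POST")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	draining, running := d.Status()
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(drainStatus{draining, running, draining && running == 0})
}

func main() {
	d := &Drainer{}
	d.StartOp() // pretend one terraform run is in flight

	// Exercise the handler in-process instead of opening a port.
	show := func(method string) {
		rec := httptest.NewRecorder()
		d.ServeHTTP(rec, httptest.NewRequest(method, "/drain", nil))
		fmt.Print(method, " /drain -> ", rec.Body.String())
	}
	show(http.MethodPost) // drain started, one op still running
	d.OpDone()
	show(http.MethodGet) // drain complete
}
```

A shutdown command used from a preStop hook would then POST once and poll with GET until `completed` is true.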

benoit74 on 23 Mar 2020

👍3

All 5 comments

We run Atlantis in AWS Fargate, and for upgrading Atlantis or pushing out configuration changes we first block ingress to the Fargate task on its AWS ELB. Then, after some time, we assume all Atlantis Terraform processes have completed and we recreate the Fargate task with the new configuration or container image tag using local Terraform.

But we don't reliably know when all the Terraform processes have completed. I think this could be improved by adding an API endpoint that returns the count of Terraform processes currently running.

Then our upgrade process would be:

1. Restrict ingress to the IP address where I am running Terraform
2. Poll that API endpoint for the current count of TF processes and wait for the count to drop to 0
3. Safely `terraform apply` the Atlantis upgrade
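The polling step could be sketched in Go like this. The endpoint does not exist, so `count` is a placeholder function for whatever call would fetch the process count; `WaitForIdle`, `ErrStillBusy` and the poll limit are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrStillBusy is returned when processes are still running after maxPolls.
var ErrStillBusy = errors.New("terraform processes still running")

// WaitForIdle polls count until it reports zero running processes.
// count stands in for a request to the proposed (hypothetical) endpoint.
func WaitForIdle(count func() (int, error), interval time.Duration, maxPolls int) error {
	for i := 0; i < maxPolls; i++ {
		n, err := count()
		if err != nil {
			return fmt.Errorf("polling process count: %w", err)
		}
		if n == 0 {
			return nil
		}
		time.Sleep(interval)
	}
	return ErrStillBusy
}

func main() {
	// Fake counter: pretends three processes finish one per poll.
	remaining := 3
	fakeCount := func() (int, error) {
		n := remaining
		if remaining > 0 {
			remaining--
		}
		return n, nil
	}

	if err := WaitForIdle(fakeCount, 10*time.Millisecond, 10); err != nil {
		fmt.Println("not safe to upgrade:", err)
		return
	}
	fmt.Println("no terraform processes running, safe to apply the upgrade")
}
```

The poll limit matters here: without it, one very long terraform run would block the upgrade indefinitely.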

atheiman on 26 Aug 2019

The work needed to know how many TF processes are running is the same as the work needed to properly pass a context through to everything and then keep the Atlantis process running until all the TF processes have stopped, so I'm not sure we need an API endpoint.

lkysow on 26 Aug 2019

Yeah, that would be fine as long as applying the Fargate task changes with Terraform can wait for the clean shutdown to happen. We use https://github.com/terraform-aws-modules/terraform-aws-atlantis

I could see this being a problem if a clean shutdown is waiting an hour for a long terraform process to finish

atheiman on 26 Aug 2019

Hmm, it looks like there's a 2-minute max (https://forums.aws.amazon.com/thread.jspa?messageID=907417), so that wouldn't necessarily work. Maybe an API endpoint like /drain or something would be necessary.

lkysow on 26 Aug 2019