Compose: Deploying updates to production with zero downtime

Created on 16 Dec 2014 · 21 comments · Source: docker/compose

Hi, thanks for a very useful tool. How can I deploy updates to my Rails app in production with zero downtime?

Currently I run the following in production, but it causes about 10 seconds of downtime.

sudo fig pull web
sudo fig up -d web

My production fig.yml:

db:
  image: postgres:9.3
  volumes_from:
    - db-data
  ports:
    - 5432
web:
  image: myaccount/my_private_repo
  command: bundle exec unicorn -p 3000 -c ./config/unicorn.rb
  volumes_from:
    - gems-2.1
  ports:
    - "80:3000"
  links:
    - db

Thanks


All 21 comments

Zero downtime cannot be achieved while rebooting a container. You need to either load-balance to another container while updating this one, or update the code in your container and send a kill -HUP to unicorn.

Fig doesn't provide a solution for zero-downtime restarts. We shouldn't rule it out, but it's a complicated task and needs a thorough discussion of what functionality would best serve the majority of production cases.

Thank you, @coulix and @aanand. I assumed that Fig was suitable for production out of the box without extra work (which was silly of me, I admit). Do you think a paragraph or two about it could be useful in the readme? Maybe with links to tutorials and blog posts where people describe their production setups. Just to make it clear that Fig cannot do zero-downtime restarts, at least at the moment.

Since Fig is explicitly a development tool at the moment, I don't feel it's urgent that we address zero-downtime restarts in the docs. As Compose shapes up, it's definitely going to _become_ a concern (see the roadmap), and at that point we'll start to talk about production use more officially.

@evgenyneu Great post ! +1

A couple of separate questions:

  • If Compose is specifically for development, what are the best recommendations for production?

This is possibly a stupid question. Yes, I know there are a lot of options out there. I'm pretty new to Docker and still learning the ropes a bit. It'd be nice to be able to do development work and production deployments with as similar a configuration file as possible, or a common configuration file with overriding files for each environment.

  • I have a load-balancer set up for the project; can I _recycle_ web/workers one at a time?

This (at least for me) would solve very simple zero-downtime deploys using fig in production on a single machine. I noticed there is no way to fig kill web_1, and fig restart web does not _rebuild_ the container even after fig build web rebuilds the image. An example of what I'd expect to happen is below.

# example: I have 3 web containers running
fig recycle web
# - fig build web - rebuild our image
# - tear down/kill web_1, start new web_1 fresh from new image
# - tear down/kill web_2, start new web_2 fresh from new image
# - tear down/kill web_3, start new web_3 fresh from new image

You could remove the automatic fig build web and have the user run it manually beforehand if you wanted, i.e. fig build web && fig recycle web. Also, maybe add a -p, --pause option to set a pause in between, to give services time to restart if you are using Rails or Java or similar.
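
For what it's worth, the proposed recycle flow can be sketched today with plain docker and fig commands. This is only a sketch: the myapp_web_N container names, the three-container count, and the DRY_RUN/run helper (which just prints the plan instead of executing it) are illustrative assumptions, not fig features.

```shell
#!/bin/sh
# Sketch of the proposed "fig recycle web" flow. DRY_RUN=1 (the default
# here) only prints each command; set DRY_RUN=0 to really execute them.
DRY_RUN=${DRY_RUN:-1}
PLAN=""
run() {
  if [ "$DRY_RUN" = 1 ]; then PLAN="$PLAN$* ; "; echo "+ $*"; else "$@"; fi
}

SERVICE=web
PAUSE=10                       # like the suggested -p/--pause option

run fig build "$SERVICE"       # rebuild the image first
for n in 1 2 3; do             # assumes three containers, myapp_web_1..3
  run docker kill "myapp_${SERVICE}_${n}"
  run docker rm "myapp_${SERVICE}_${n}"
  run fig up -d "$SERVICE"     # recreate from the fresh image
  run sleep "$PAUSE"           # give the service time to boot
done
```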

+1, there doesn't seem to be a "best-practice" way of doing that; it looks as if there aren't a lot of people using Docker with Rails yet, at least not for production purposes. The plethora of information out there is hard to digest. It'd be nice to have zero-downtime deployments out of the box with Fig or Compose.

I guess a suitable way for simple setups is to keep the container running and restart, say, puma inside it (provided it's not being run in the foreground)? Capistrano could be used to orchestrate that, after updating a git repo on the host connected to the container via a volume. Any thoughts?

@aanand any update on this by any chance? @fullofcaffeine's solution seems to make sense, if using a process manager within a container. It would be great if there were a few recommended strategies.

We are moving our CI and Alpha environments to docker-compose.

Since we like to release any commit from any team there, we need to do dozens of updates in a single work day, most of them affecting only one service.

Restarting one service causes a partial, quick downtime, while restarting the whole thing takes about 5 to 10 minutes.

Since dependency information and update checking (via image pull) are all managed by docker-compose, it is only logical that some kind of selective restart of services with updated images be handled by Compose itself.

Is there any way to implement it? (or a better issue to track?)

To perform a real zero-downtime deployment you need a load balancer, and a tool to add/remove backends from the load balancer as nodes are stopped and started. If the load balancer is restarted as part of the deploy you won't have zero downtime, so it has to be outside of the compose file context and part of the infrastructure.

Since compose doesn't manage any of that infrastructure for you it's not really possible to do a real zero-downtime deployment, without some other tooling.

I think for some cases (like dev and staging environments) what you're looking for is very-little-downtime deployments, which is something we can aim for with compose.

In 1.4.x we made "smart recreate" the default. This means that a container is only recreated if it changes, or if one of its dependencies changes. In 1.5.0 we added experimental support for the new Docker networks, which removes the need to recreate containers when only their dependencies have changed.

In 1.6.0 we should be making the new networking the default, and we can look at doing parallel restarts of all containers, which should make for relatively short downtime.

With the current release I would expect only a few seconds of downtime to recreate containers. Can you tell me more about why it takes 5 to 10 minutes?

Some related issues: #1663, #1264, #1035

@dnephin I think compose is fast enough in starting everything from scratch.

The wasted time for us is in restarting ALL the applications inside the containers after just ONE was updated. Each app takes about 10 to 30 seconds (with high CPU) to initialize.

What I would like to have is a selective restart/rebuild of only those containers that have changes (image, dependencies). This results in a short partial downtime rather than a full, longer one.
Something like:
docker-compose pull # check for image updates for all services
docker-compose up -d --smart-restart # restart all containers that have newer images or changed dependencies
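
Until something like --smart-restart exists, a rough approximation is possible with flags compose already has: pull everything, then recreate a single changed service with --no-deps so its linked services are left alone. A minimal sketch (the web service name and the DRY_RUN/run helper, which only prints the plan, are assumptions):

```shell
#!/bin/sh
# Approximate a selective restart with existing docker-compose flags.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
PLAN=""
run() {
  if [ "$DRY_RUN" = 1 ]; then PLAN="$PLAN$* ; "; echo "+ $*"; else "$@"; fi
}

run docker-compose pull                    # fetch newer images for all services
run docker-compose up -d --no-deps web     # recreate only "web", not its links
```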

What I would like to have is a selective restart/rebuild of only those containers that have changes

That logic already exists at the service level. If all the containers for a service have the latest image and config, the service won't be restarted (unless one of its dependencies changes). It will just say "... is up-to-date". As I mentioned, if you use --x-networking, it removes the need to recreate when the links change as well.

If you're looking for support at the container level, that was recently requested in #2451

I found that running docker-compose pull first saves the reload time and brings it down to a minimal number of seconds (depending on how long your container takes to boot).

I just worked out another flow to achieve zero downtime (mainly for web apps).

Proxy

We use jwilder/nginx-proxy to handle routing to app servers; this assists us in dynamically routing requests to services.
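
For readers unfamiliar with jwilder/nginx-proxy: it watches the Docker event stream over the daemon socket and regenerates its nginx upstreams whenever a container with a VIRTUAL_HOST environment variable starts or stops. A minimal sketch of the setup (app.example.com and myapp/web are placeholder names; the DRY_RUN/run helper only prints the commands):

```shell
#!/bin/sh
# Minimal nginx-proxy setup sketch. DRY_RUN=1 (the default) only prints.
DRY_RUN=${DRY_RUN:-1}
PLAN=""
run() {
  if [ "$DRY_RUN" = 1 ]; then PLAN="$PLAN$* ; "; echo "+ $*"; else "$@"; fi
}

# The proxy needs the Docker socket to watch containers come and go.
run docker run -d -p 80:80 \
  -v /var/run/docker.sock:/tmp/docker.sock:ro jwilder/nginx-proxy

# Any container started with VIRTUAL_HOST is picked up as a backend.
run docker run -d -e VIRTUAL_HOST=app.example.com myapp/web
```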

First Deploy

For the first deploy, run docker-compose --project-name=app-0001 up -d.

Rolling Update

We edit the docker-compose.yml with the new image ID and run docker-compose --project-name=app-0002 up -d. We now have version 0.2 of the app up and running. The load balancer will already have begun routing requests to it, and since we are using an nginx load balancer we get zero downtime.

If you need to reach a desired scale, you can now run the scale command to bring up resources before you shut down the older version.

Now we can run docker-compose --project-name=app-0001 stop to shut down the previous deploy. (Optionally we can run rm to remove the data, but it might be a good idea to only remove it on the next deploy, i.e. deploy app-0003 up, app-0002 stop, app-0001 rm.)
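
The first-deploy / update / cleanup steps above could be wrapped in a small script like the following sketch. The three-generation project numbering and the DRY_RUN/run helper (which prints instead of executing) are assumptions; adjust to your own naming scheme:

```shell
#!/bin/sh
# Blue/green hand-over sketch for the flow described above.
DRY_RUN=${DRY_RUN:-1}
PLAN=""
run() {
  if [ "$DRY_RUN" = 1 ]; then PLAN="$PLAN$* ; "; echo "+ $*"; else "$@"; fi
}

PREV=app-0000   # deploy before last: safe to rm now
OLD=app-0001    # currently serving traffic
NEW=app-0002    # the release being rolled out

run docker-compose --project-name="$NEW" up -d    # proxy starts routing to it
run docker-compose --project-name="$NEW" scale web=10
run docker-compose --project-name="$OLD" stop     # drain the previous deploy
run docker-compose --project-name="$PREV" rm -f   # reclaim the one before that
```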

Truly rolling update

If you have a reason to limit the number of resources running at a given time, you could simply stagger the scale, i.e.:

app-0001 scale web=10
...
app-0001 scale web=8   app-0002 scale web=2
app-0001 scale web=6   app-0002 scale web=4
app-0001 scale web=4   app-0002 scale web=6
app-0001 scale web=2   app-0002 scale web=8
app-0001 scale web=0   app-0002 scale web=10
...
app-0001 stop

Rollback

A rollback is also quite simple: run up on 0001 and stop on 0002.

80/20 deploy

This would be as simple as running scale web=2 on app-0001, and scale web=8 on app-0002.

Automate Everything

This reminds me of a deploy strategy similar to Capistrano's; I might even look at using a similar tool to wrap docker-compose and save rewriting the deploy logic.

That looks like an awesome way of handling it, @alexw23. I did not know about the --project-name param, but that looks like it cures this problem. It's also simple enough to write a wrapper around for your own deployments.

@alexw23 Awesome on the "--project-name"!

@alexw23 Thanks, very informative!

_cough_ rancher

Closing this as duplicate of #1786

Thanks for sharing the process, @alexw23. I have a couple of questions I hope you don't mind clarifying (apologies if I'm missing something obvious, I'm not too familiar with docker/compose yet):

  • Would this approach mean you need a docker-compose file just for the web components (i.e. not including nginx-proxy or any data stores)? Otherwise you end up creating more copies of those other components when they are not required (for nginx-proxy in particular this might create a problem, since it publishes port 80, doesn't it?). How do you keep things organized/linked in this case?
  • Do you know if/how this works if the containers take time to warm up? Would nginx-proxy start sending requests even if the container isn't ready to serve them? (e.g. Rails takes some time to load, but as far as the container is concerned, it's up and running.)

If you are able to share some examples or scripts it would be great, and thanks again for sharing your solution!

Hi guys,

In case someone still needs a rolling-upgrade example, I came up with the following solution for my (Rocket.Chat) application:

for i in $(docker ps -f "name=<container_name>_*" -q); do
  docker-compose scale <service_name>=5
  docker stop "$i"
  sleep 10
  docker rm -f "$i"
done
