Compose: Losing all data in data volume when container is restarted with Docker Compose.

Created on 19 Jun 2015  ·  25 Comments  ·  Source: docker/compose

Description of problem:

We are losing all data in a data volume when using docker compose to build and restart containers.

docker version:

Client version: 1.6.0
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 8aae715/1.6.0
OS/Arch (client): linux/amd64
Server version: 1.6.0
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 8aae715/1.6.0
OS/Arch (server): linux/amd64

docker info:

Containers: 67
Images: 2559
Storage Driver: devicemapper
 Pool Name: docker-8:4-3222612005-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 36.9 GB
 Data Space Total: 107.4 GB
 Data Space Available: 70.47 GB
 Metadata Space Used: 99.69 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.048 GB
 Udev Sync Supported: true
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Kernel Version: 3.10.0-229.4.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 12
Total Memory: 15.47 GiB
Name: <removed>
ID: KTFX:ZDDL:IL5M:4JJX:DN32:N2Z2:XQIT:2OBL:GAPT:ZF42:6BSR:LREN

uname -a:

Linux <removed> 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.):

VMware virtual machine.

How reproducible:

This problem is intermittently reproducible.

Steps to Reproduce:

-bash-4.2$ d-c stop jenkins && d-c build jenkins && d-c up -d jenkins
Stopping unityci_jenkins_1...
Building jenkins...
...
Removing intermediate container b0a834697b78
Successfully built 0150d8fc1c9d
Recreating unityci_registrydata_1...
Recreating unityci_registry_1...
Recreating unityci_dockerserver_1...
Recreating unityci_jenkins_1...
Cannot destroy container 0ee5103b22ceac0add98adc031c6be1c2e112483f304ed89f7f0b7ec539bf46e: Driver devicemapper failed to remove root filesystem 0ee5103b22ceac0add98adc031c6be1c2e112483f304ed89f7f0b7ec539bf46e: Device is Busy

When I run the up command again, a new container starts.

-bash-4.2$ d-c up -d jenkins
Recreating unityci_registrydata_1...
Recreating unityci_registry_1...
Recreating unityci_dockerserver_1...
Creating unityci_jenkins_1...

But my old container is gone, so all the data in my data volume is gone.

Actual Results:

-bash-4.2$ docker ps -a | grep 0ee5103b22c
0ee5103b22ce        my_jenkins:latest                                                    "/usr/local/bin/jenk   8 days ago          Dead                                                                    
-bash-4.2$ docker ps | grep jenkins
0fefc007a1e0        my_jenkins:latest                                           "/usr/local/bin/jenk   4 minutes ago       Up 4 minutes        ...

Additional Info:

We are using the Jenkins docker image, which creates a data volume using the VOLUME instruction (VOLUME /var/jenkins_home). We are not mounting this data volume from the host; it is internal to the container only.

https://github.com/jenkinsci/docker/blob/1f0d2b7d5b69aed1b0af7916ca46d35b249c1c86/Dockerfile
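
For reference, the relevant part of that Dockerfile boils down to something like the following (a simplified sketch, not the exact upstream file):

ENV JENKINS_HOME /var/jenkins_home
# Declares an anonymous data volume: the data lives under
# /var/lib/docker/volumes/<volume-id>/_data on the host rather than in the
# container's writable layer, so rebuilding the image does not touch it.
VOLUME /var/jenkins_home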

Most helpful comment

@mangalaman93 Your problem is with /var/run. I'm kicking myself I didn't recognise that quicker, before messing about with Vagrant. On CentOS, /var/run is a symlink to /run which is a tmpfs filesystem, which is forgotten on reboot. Use a different location on the host file-system for any files you expect to persist.

All 25 comments

The docker-compose version is 1.2

Correction. While the data volume is unavailable, it is not deleted. Running docker inspect on the old, dead container reveals the location of the volume, and the data can be retrieved from it.
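
Something along these lines (a sketch; <volume-id> is a placeholder, and the container ID is the dead one from the original report):

$ docker inspect -f '{{.Volumes}}' 0ee5103b22ce
map[/var/jenkins_home:/var/lib/docker/volumes/<volume-id>/_data]
$ sudo cp -a /var/lib/docker/volumes/<volume-id>/_data /srv/jenkins_home_backup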

Thanks for reporting @fmahnke, I think this is a known problem in the way docker compose re-creates containers in 1.2; docker-compose 1.3 has made improvements in this area, making the recreation step less error-prone. Perhaps you can test if 1.3 resolves this issue for you?

Yeah, if there's a transient error when recreating the container then Compose 1.3 should now be resilient to that - running docker-compose up again should fix it.

I have the same symptom, but only on one service. I've tried giving it another container with the same volumes and using volumes_from, but to no avail; whenever I run docker-compose up -d containername, I lose the data.

docker-compose version: 1.3.1
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013

@stephanbuys Could you paste a minimal failing docker-compose.yml and the sequence of commands to reproduce?

I am having the same issue with docker-compose 1.3.1 and docker 1.7.0.

Using this docker-compose.yml

ghost:
  image: ghost
  ports: 
    - 80:2368
  volumes_from:
    - data

data:
  image: busybox
  volumes:
    - /var/lib/ghost/

I run the following sequence of commands:

root@delta:~/service# docker-compose up -d
Creating service_data_1...
Creating service_ghost_1...

root@delta:~/service# docker inspect service_ghost_1 | grep vol
        "/var/lib/ghost": "/var/lib/docker/volumes/650abc371093c0ebe4c779e960081adbff6e57585e6446ef98c0d4c4fd18bc7f/_data"

root@delta:~/the-nosey-rose# docker inspect service_data_1 | grep vol
        "/var/lib/ghost": "/var/lib/docker/volumes/650abc371093c0ebe4c779e960081adbff6e57585e6446ef98c0d4c4fd18bc7f/_data"

Now, I go to the web UI and interact with it so the database and other config files are created in the volume. Then, I run:

root@delta:~/service# docker run --rm --volumes-from service_data_1 debian bash -c "ls /var/lib/ghost"
apps
config.js
data
images
themes

And it is clear that the files are in the volume. Next, I bring the containers up again:

root@delta:~/service# docker-compose up -d
Recreating service_data_1...
Recreating service_ghost_1...

root@delta:~/service# docker inspect service_ghost_1 | grep vol
        "/var/lib/ghost": "/var/lib/docker/volumes/650abc371093c0ebe4c779e960081adbff6e57585e6446ef98c0d4c4fd18bc7f/_data"
            "/var/lib/docker/volumes/650abc371093c0ebe4c779e960081adbff6e57585e6446ef98c0d4c4fd18bc7f/_data:/var/lib/ghost:rw"

root@delta:~/service# docker inspect service_data_1 | grep vol
        "/var/lib/ghost": "/var/lib/docker/volumes/478a617148f193f7904c65394db16ccecf1ff4258ead3a7f0240567c229e09f9/_data"

root@delta:~/service# docker run --rm --volumes-from service_data_1 debian bash -c "ls /var/lib/ghost"

And that final command gives me nothing. It appears that the volumes from the old data container are being remounted in the ghost container, but that the data container is being recreated with entirely new volumes, instead of using the old ones.

I have no idea whether this is the expected behavior, but since I can't find documentation either way, it looks like a bug to me.

Same here with "docker-compose version: 1.3.3"

Right, I've reproduced it. The bug occurs when you specify a trailing slash in the volume path, e.g. /var/lib/ghost/.

As luck would have it, I fixed this by accident yesterday in #1787 when fixing #1785, another bug related to trailing slashes in volume paths. I've added a regression test in #1794.
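
Until a release containing that fix is available, a workaround for the compose file above is simply to drop the trailing slash from the volume path:

data:
  image: busybox
  volumes:
    - /var/lib/ghost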

I'm getting it with the following container definition, which has no trailing slash.

mailspool:
    image: busybox
    volumes:
        - /var/spool/postfix

... hmm, or maybe that's to do with weird symlink arrangements inside the busybox image?

@mc0e Steps to reproduce? Here's what happens when I try:

$ docker-compose --version
docker-compose version: 1.3.3
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1j 15 Oct 2014

$ cat docker-compose.yml
mailspool:
    image: busybox
    volumes:
        - /var/spool/postfix

$ docker-compose up -d
Creating 15762_mailspool_1...

$ docker inspect -f '{{.Name}}: {{.Volumes}}' $(docker-compose ps -q)
/15762_mailspool_1: map[/var/spool/postfix:/mnt/sda1/var/lib/docker/volumes/7a27b7c0ec388c5616cdab092336ea5a922fbf0426939e0db4b8a76b31d856ae/_data]

$ docker-compose up -d
Recreating 15762_mailspool_1...

$ docker inspect -f '{{.Name}}: {{.Volumes}}' $(docker-compose ps -q)
/15762_mailspool_1: map[/var/spool/postfix:/mnt/sda1/var/lib/docker/volumes/7a27b7c0ec388c5616cdab092336ea5a922fbf0426939e0db4b8a76b31d856ae/_data]

i.e. the volume host path is the same before/after the second docker-compose up -d.

It's just happened again: significant data loss. It seems to have been associated, twice now, with Docker version upgrades.

Will you reopen this, or shall I start a new ticket and reference this?

It's unlikely to be related to this issue, since that bug was fixed and we have a regression test for it.

Please open a new issue with the complete steps to reproduce the problem. Also please note that #2919 is a problem with 1.6.0 which is fixed in master and will be in a bug fix release soon.

When I reboot my machine, the host directory that is mounted into a container gets recreated and I lose all of my data. The container is started using docker-compose with the restart policy set to always. I faced this issue on a physical machine as described below, and I am able to reproduce it on a virtual machine as well, using a Jenkins container and a CentOS container running tail on a file. I do not observe the issue if I stop and start the container using either docker-compose down/up or docker stop/start <container>. I am running docker-compose 1.9. Further details are as follows:

$ docker info
Containers: 18
 Running: 2
 Paused: 0
 Stopped: 16
Images: 208
Server Version: 1.12.5
Storage Driver: devicemapper
 Pool Name: docker-253:2-268646682-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 16.12 GB
 Data Space Total: 107.4 GB
 Data Space Available: 49.49 GB
 Metadata Space Used: 15.98 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.132 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.135-RHEL7 (2016-09-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host overlay bridge null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-514.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 251.7 GiB
Name: *****
ID: ******
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
$ docker-compose version
docker-compose version 1.9.0, build 2585387
docker-py version: 1.10.6
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1t  3 May 2016

That's excellent news @mangalaman93. With a reproducible test case in a VM this issue might finally get fixed. However, I don't think you've yet provided enough info to recreate the problem. Would you be prepared to put some time into creating a test case that someone else can run on their own system to reproduce the bug? E.g. a shareable VM image, or (better) a vagrant config that sets up a virtual machine, plus a command or command sequence that can be run on that VM to demonstrate the bug.

My docker-compose.yaml file is as follows:

version: '2'

services:
  test:
    image: openjdk:8-jre-alpine
    command: tail -F /var/test/file.txt
    restart: always
    container_name: test
    hostname: test
    volumes:
      - /var/run/centos:/var/test

Run the container using docker-compose up -d and then modify the file on the host at /var/run/centos/file.txt. For example:

echo "This is new line" >> /var/run/centos/file.txt

Now just run sudo reboot. You will observe that after the reboot the container is still running, but file.txt doesn't exist anymore, just as when the container was first started.
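
Putting that together, the full reproduce sequence on the host (using the compose file above) is roughly:

docker-compose up -d
echo "This is new line" >> /var/run/centos/file.txt
ls -l /var/run/centos   # file.txt is present
sudo reboot
# ...and once the machine is back up:
ls -l /var/run/centos   # the directory has been recreated empty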

I am not familiar with Vagrant, and I think this should be enough to reproduce the issue. Please confirm.

OK, I've got a Vagrantfile in a gist at https://gist.github.com/mc0e/e7da8d567e76e7a46d3380c4eb0fde45

# Create a Vagrant machine with a Docker and docker-compose environment as per @mangalaman93
mkdir /tmp/dc-data-loss
cd /tmp/dc-data-loss
curl  -O https://gist.githubusercontent.com/mc0e/e7da8d567e76e7a46d3380c4eb0fde45/raw/0327c018684140fed711c660f9cfa52f22c6c6c8/Vagrantfile
vagrant up

# Test file persistence
vagrant ssh -c 'ls -l /var/run/centos'
vagrant ssh -c 'sudo reboot'
vagrant ssh -c 'ls -l /var/run/centos'

I'm not entirely convinced that this is the same as the case I saw, where my data volumes did survive reboots but did not survive several docker upgrades from apt. I'm not even sure that docker-compose is at fault here, as opposed to docker itself.

That said, neither this case, nor mine, nor (apparently) the OP's case matches the description of the bug that Anand reported as fixed.

Can we please have this case re-opened? It's been a show-stopper for docker-compose for over a year now.

I've created another Vagrantfile demo at https://gist.github.com/mc0e/57855d830edfb990ab4ec792dddf8b7c which demonstrates the behaviour @mangalaman93 describes with Docker only, without even installing docker-compose.

Actually, it seems I can create this problem without using docker either. Either it's a VirtualBox thing, or perhaps it's specific to how /var/run is managed on reboot.

That's not to say that there's necessarily anything wrong with the report from @mangalaman93, just that I haven't yet demonstrated it. I note that @mangalaman93 reported seeing this on a bare-metal machine. I'll first investigate the /var/run side of things, then if necessary try a different Vagrant provider.

@mangalaman93 what sort of VM were you running?

@mangalaman93 Your problem is with /var/run. I'm kicking myself I didn't recognise that quicker, before messing about with Vagrant. On CentOS, /var/run is a symlink to /run which is a tmpfs filesystem, which is forgotten on reboot. Use a different location on the host file-system for any files you expect to persist.
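
A quick way to confirm this on a CentOS 7 host, plus a sketch of a corrected volume mapping (/srv/centos is just an example; any directory on a persistent filesystem will do):

$ readlink /var/run
/run
$ findmnt -n -o FSTYPE /run
tmpfs

# in docker-compose.yaml, mount a persistent host path instead of /var/run/centos:
    volumes:
      - /srv/centos:/var/test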

Oh, darn! I didn't know that. Thank you for resolving the issue, @mc0e!

Restarting the container manually can also resolve that issue, but again, that's only temporary.

I have the same problem with docker-compose 1.25.4 and docker 19.03.5
