Multiple services can refer to a volume under the volumes section in the docker-compose file. When the volume does not exist the volume will be created by docker-compose, and data will be copied in from the container(s).
To make this process deterministic, how is it determined which container will be used to copy the data to the volume?
Do I need to use the depends_on key?
Can you add this information to the documentation?
I imagine you could do it using depends_on rules, but I'd caution against it; it's a convoluted, non-portable way to do things.
The recommended way to populate a volume is to do so ahead of time using a docker-compose run (or docker run) with a command that will provision the data you need.
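A minimal sketch of that approach, assuming two services that share one named volume. The service names (a, b), image names, and the /config path are hypothetical, and the one-off command assumes the image for a ships a `true` binary:

```yaml
# docker-compose.yml (sketch; service names, images and paths are hypothetical)
version: "3.2"

services:
  a:
    image: example/config-data   # hypothetical image that holds data in /config
    volumes:
      - config:/config
  b:
    image: example/app           # hypothetical image that reads /config
    volumes:
      - config:/config

volumes:
  config:

# Provision the volume once, before the first `docker-compose up`,
# so that only a's container exists when the volume is created:
#   docker-compose run --rm a true
#   docker-compose up -d
```

Because only the container for a is created by the run step, the new volume should be filled from a's /config. On later runs the volume already holds data, so which service starts first should no longer matter.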
As far as the docs are concerned, additions should be proposed on the separate docs repo!
The "recommended way" will not work. I have a composition with a "data container" A (holds configuration data) and another container B that is configured with that data. The composition is fired up in a cloud, so I do not know beforehand where they will be scheduled. In the past I used "volumes_from" to configure container B from A. Now I use a volume, but I noticed the initialization of the volume is non-deterministic. Depending on which container starts first, the volume is initialized from either A or B.
Maybe I should use the nocopy volume-option for container B, to force creation with data from A ?
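For reference, a sketch of what the nocopy idea could look like with the long volume syntax (Compose file format 3.2 and later). The service names, images, and the /config path are hypothetical, and whether this gives the desired result when B starts first is an assumption that should be verified on the target platform:

```yaml
# docker-compose.yml (sketch; names, images and paths are hypothetical)
version: "3.2"

services:
  a:
    image: example/config-data   # hypothetical "data container" A
    volumes:
      - config:/config           # default behaviour: A's data may be copied in
  b:
    image: example/app           # hypothetical consumer B
    volumes:
      - type: volume
        source: config
        target: /config
        volume:
          nocopy: true           # never copy B's /config into a new volume

volumes:
  config:
```

With this setup B never seeds the volume. If B is scheduled first, it sees an empty /config until A has populated it, so B may still need to wait for or retry on missing configuration.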
Another belated comment / question I have on volumes: what is the performance difference between these two approaches when using a large data container? (e.g. 2GB of data).
With volumes_from I assume that the data is directly mounted into the other container. With volumes, the data needs to be copied to the host first.
I am interested as well in the answer to the voluming order question for docker-compose.
I have two services: "webserver", and "MySQL", both instantiated from images that are generated from a different pipeline (and so are outside of my control), and a named volume that is needed to persist the data the MySQL container generates.
The crux of the problem is permissions-related - the MySQL user needs read/write permissions to the volume (obviously) but if the webserver user (which IIRC is root) "wins" the volume attachment game, it will instantiate the server with root, attach the volume as root, and leave mysql unable to write.
I was hoping to use depends_on to ensure the mysql container always "wins" the volume attachment game, but it appears to not work as intended.
In fact, what does seem to work with docker-compose is running the docker-compose run command, watching MySQL fail, and then immediately rerunning it. Then somehow it always works.
I have absolutely no idea why this is happening, and I can't quite read the --verbose mode output effectively enough yet.
I'm planning on going back to orchestrating this pipeline with more verbose but functional raw docker commands if I can't fix the problem with the docker compose orchestration and this named volume soon.
Why is your webserver service attaching a volume meant to store raw data for your database?
That said if permissions are the issue you could always run your webserver service as a different, non-root user.
@shin- unfortunately, it's nontrivial for me to revamp how the webserver container sets up its environment and user, since the artifact is generated elsewhere and reused in a number of places.
The reason the webserver container needs access to the raw data folder is because it's part of a pipeline doing data processing and the webserver (a rails app) has rake tasks attached to it that sanitize and backup data directly from that data folder.
I also need this feature in my workflow. At least we want some way to create the volumes only after the service which defines them explicitly. One use case is using Docker volume plugins inside the docker-compose file through containers, e.g. the local-persist plugin via a container.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it had not recent activity during the stale period.