Mysql: How to Populate database at container creation

Created on 12 Jan 2015  路  19Comments  路  Source: docker-library/mysql

I have the container running, all the MYSQL_ parameters worked, the database was created and the user was defined and granted access to the database.. running in daemon mode

the database is empty as expected
I want to execute the sql file that contains all the DB schema commands (create table, ....)

I can create a script and run docker exec mysql,
IF I did -v to map some host volume into the container when I started it

is this the proper way?

in my script, how do I wait til the daemon container is up and running?
try docker exec repeatedly til it works? (shouldn't work til the container CMD is running)

I have also created my own image which does all the same things this does and adds support for database population, but it makes no sense to manage my own image.

Most helpful comment

I've been thinking about this again and need to clarify my particular use case: I want to make a populated database available mindful of the fact that is is likely to be a _read only_ database. The use case is distributing a dataset that can be queried from an application in another container (an R or python client, for example).

Rather than publish the dataset as a sql export or set of csv files, and then expect the user to build the database so they can then start to query it, I want to make the data avaialble in as quick to use way as possible.

My understanding is that it is not easy to distribute data volumes. I also want to minimise the linking that a user needs to do.

My thinking was that if I load a mysql database container with the (readonly) database stored locally within it, then all I would have to do is create eg a client container and then link it to the (populated) database container?

Essentially, my interest is in distributing (read only) datafiles within a database application that makes them queryable.

Here's a particular use case: http://blog.ouseful.info/2015/01/17/connecting-rstudio-and-mysql-docker-containers-the-ergastdb/

I think I could figure out how to write a fig script to do the last part - fire up RStudio and connect it to the MySQL container, but how would I get the MySQL database up and running and - if it isn't already populated with data - populate it from within the same fig script? (It would be easy if the database container came with the data baked in?)

All 19 comments

I have been using these helper python scripts to create and populate containerised databases: http://www.luiselizondo.net/a-tutorial-on-how-to-use-mysql-with-docker/

If your SQL file loads in data from a local file or directory, you need to add it to the path /tmp/import/

LOAD DATA LOCAL INFILE '/tmp/import/MYDIR/MYDATA.txt'

Note that there appears to be a memory leak when deleting containers - the data is not disposed of. The Rabbit Hole of Using Docker in Automated Tests - A garbage collection bug in Docker suggests that the data may be being loaded into a separate container that is not garbage collected?

suggests that the data may be being loaded into a separate container that is not garbage collected?

No, it's not a memory-leak, it's by design.

The MySQL container is using a "volume" to store persistent data. To see where the volumes of a container are located, use docker inspect [container], in the output (JSON), you'll find the information.

If you docker rm [container], only the container itself is removed, but the _volumes of the container_ will not be removed. This allows you to access (backup?) data stored in those volumes (in this case, the database-data), even after you have removed the container.

If you want to remove a container _including_ its volumes, use the -v flag; docker rm -v [container]

If you have removed a lot of containers in the past without removing their volumes, you can see (and remove) them, using this utility: https://github.com/cpuguy83/docker-volumes. The functionality of docker-volumes may become integrated in docker itself in the future (see https://github.com/docker/docker/pull/8484), but no decision has been made on that yet.

psychemedia

loading the sql file into the container then requires deleting it.

I hadn't see the docker VOLUME command until after I had asked this question. I agree that in a production environment you would want to separate the data from the sql engine..
so, maybe you create the database (first time) on the host then point MYSQL to it thru the VOLUME and docker -v commands. so you don't need to have the database created or populated by the container itself.

@thaJeztah Thanks - that makes a lot of sense. I hadn't picked up on that at first. And thanks for the link to the docker-volumes utility.

You're welcome! Thought it would be best to respond quickly before all kind of wild rumors start spreading.

Kind request (for the same reason); if you are the author of the blog post you linked to, could you please update it to reflect this information? thanks a lot in advance!

@thaJeztah I'm not the author of that post will I will post a comment to it.

That would be great, thanks!

To further clarify: is there a way of starting the mysql container so that the the data is mounted _within_ the mysql database server, such that it can be populated with data and then the container exported such that it contains the data?

@psychemedia not directly, but this may give you some pointers: https://github.com/docker-library/mysql/pull/42

Keep in mind that running a database with the data _inside_ the container (not a volume) is discouraged;

  • the container's filesystem is a layered/copy-on-write filesystem. This is _very_ bad for performance with files that are frequently changed. Also, permissions can be troublesome (e.g. When running on AUFS)
  • if your container is deleted, the data is deleted as well.

I've been thinking about this again and need to clarify my particular use case: I want to make a populated database available mindful of the fact that is is likely to be a _read only_ database. The use case is distributing a dataset that can be queried from an application in another container (an R or python client, for example).

Rather than publish the dataset as a sql export or set of csv files, and then expect the user to build the database so they can then start to query it, I want to make the data avaialble in as quick to use way as possible.

My understanding is that it is not easy to distribute data volumes. I also want to minimise the linking that a user needs to do.

My thinking was that if I load a mysql database container with the (readonly) database stored locally within it, then all I would have to do is create eg a client container and then link it to the (populated) database container?

Essentially, my interest is in distributing (read only) datafiles within a database application that makes them queryable.

Here's a particular use case: http://blog.ouseful.info/2015/01/17/connecting-rstudio-and-mysql-docker-containers-the-ergastdb/

I think I could figure out how to write a fig script to do the last part - fire up RStudio and connect it to the MySQL container, but how would I get the MySQL database up and running and - if it isn't already populated with data - populate it from within the same fig script? (It would be easy if the database container came with the data baked in?)

@psychemedia, that seems fair to be able to do, though I don't know of a way to enforce the readonlyness of the database (besides it being a container that is ephemeral by nature). Currently if you add a custom config that changes the datadir, the startup script would still assume the regular dir and initialize the database there and mysqld would fail to start, we are working on a solution to this right now. Once we are finished you should be able to COPY a custom conf that changes the datadir to something other than in /var/lib/mysql and be good to commit all your data to a docker image.

And I have similar use cases, where the data is only valuable during a particular timed scenario.
data gets put into the database inside the container, used by the linked containers, and when the scenario is finished (hours/days...) the containers are all discarded.
There may be multiple of thes scenarios occurring concurrently, and the data in a container gives us easy isolation.

@sdetweil, as long as you docker rm -v containerId the database containers, the volumes will go away too.

@psychemedia did you have a 'recipe' for your Dockerfile to satisfy your use case? I am doing the same thing with redis and can't see to keep the rdb file in the container at build time.

@hekaldama I think I ended up with this - https://blog.ouseful.info/2015/07/26/exporting-and-distributing-docker-images-and-data-container-contents/ - though I've no idea if it makes sense!

@psychemedia thanks! I found out that I was trying to stick data in a path that my FROM was defining as a VOLUME which resulted in 0B all the time. I changed my WORKDIR to my own defined space and can make my data container now.

Sorry for this post, but i have a problem:

i want to create a mysql docker container and then create into it a schema. How could i do?

@Pierpaolo1992 the GitHub issue tracker is not the right location for that question; the GitHub issue tracker is not intended as a general support forum, but for reporting bugs and feature requests. For other type of questions, consider using one of;

Run the following command to cleanup existing volumes which should resolve the issue:
docker volume rm $(docker volume ls -qf dangling=true)

Was this page helpful?
0 / 5 - 0 ratings