Provide docker images
docker run --rm -t -v $PWD:/repo dvc status
@casperdcl , I'm not sure how much benefit it would bring, given that it is not that hard to create a docker image for DVC: `docker run --name dvc -ti python pip install dvc && docker container commit --change 'CMD ["dvc"]' dvc dvc`
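The run-and-commit one-liner above can equivalently be written as a Dockerfile (a sketch; the base image and command mirror what the one-liner uses):

```dockerfile
# Sketch: same result as the `docker run` + `docker container commit` pair above
FROM python
RUN pip install dvc
CMD ["dvc"]
```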
Also, testing this would be the same as testing DVC on a linux machine (or at least, it should be)
I doubt that users want to use it as a base image, since it doesn't provide anything other than DVC.
I'd prefer to not maintain this one, to be honest
I would also argue that when doing some data science stuff in docker, it's probably easier to install DVC in your own image, rather than try to adjust a DVC image to your requirements (like installing TF/pytorch/...).
@pared that was my point about
- also would make it easier for end-users to incorporate into their own docker images
as in people could copy-paste from our docker file into theirs...
I assumed it's more complex than just a `pip install` to get all the features (at least some `apt-get` installs as well?)
@casperdcl ok, I didn't quite get it.
as in people could copy-paste from our docker file into theirs...
That would surely help to build custom image.
@casperdcl Hm, the `apt-get`s would depend on the base image that you are using. If it is absolutely bare then yeah, you will need to install python and whatnot. But for regular things like ubuntu you don't have to install anything special, except, I guess, git. We have a docker image that we are using for testing here: https://github.com/iterative/dvc-test/blob/master/docker/ubuntu/16.04/Dockerfile . Not sure if we still need to install libffi explicitly :thinking:
@efiop the `setup.py` file seems to imply more deps are required - I was referring to a full `pip install dvc[all,ssh_gssapi,tests]`
@efiop can confirm `libffi` is not required on ubuntu:18.04
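Concretely, a minimal Ubuntu-based sketch of the image @efiop describes (only git added beyond Python itself, with libffi omitted per the confirmation above; package names are assumptions, not the actual dvc-test Dockerfile):

```dockerfile
FROM ubuntu:18.04
# git is the only extra system dependency mentioned for a regular base image;
# python3/pip are needed since ubuntu images don't ship them
RUN apt-get update && \
    apt-get install -y --no-install-recommends git python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 install dvc
CMD ["dvc"]
```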
@casperdcl Ah, got it. Yeah, `gssapi` requires some dev tools to compile stuff, as does `tests`, so the dockerfile for that would be more complicated. We could provide it in the docs, maybe? I'm not quite sure about building and testing it. When used for regular tests, it would always run linux inside, instead of running on the host system, which is a problem, as we already have a lack of native testing on windows and osx, and that might prevent us from discovering bugs during development. Though a nice thing about it is that you don't have to set up a test env on your machine, that is true :slightly_smiling_face:
Yes, I did start making a few flavours of docker images for testing (alpine, ubuntu LTS, 2.7, 3.6, etc.) ages ago, which are probably sitting in a git stash on one of any number of machines. Maybe.
Now I just use a conda env for dvc testing.
btw totally fine with this issue being closed - don't actually have any strong opinions about it.
@casperdcl No reason to close it. Docker images (or at least Dockerfiles) would be nice to have, for sure. 🙂
I just came here looking exactly for that.
Every time I see a comment like:
"It's highly recommended using virtual environment or pipx (on Python 3.6+) to encapsulate your local environment."
I instantly go looking for a docker file for me to easily test the software.
That is because I use different software (mostly R), and don't use python outside docker (python environment tools change a lot, and nobody seems to agree which one is best - conda, virtualenv, pyenv, pipenv, etc. - which, to complicate things further, offer different functionality).
@nettoyoussef Thanks for the feedback! Do you need pre-built images, or a Dockerfile in our docs would do?
@casperdcl thank you for this idea!
I agree with @pared and @mroutis that it is easy to create your own docker image and it might create additional supporting overhead for us.
However, a prebuilt Docker image does give value to users, and @nettoyoussef gave an example. It can attract users' attention and improve usability despite being a simple implementation.
But then documentation plays the major role. Can we make a good documentation page, or even a small blog post, that explains the motivation behind using a docker image instead of the installed tool, and when it is needed? Why don't we start with the doc/blog post and then implement the docker image?
@nettoyoussef Thanks for the feedback! Do you need pre-built images, or a Dockerfile in our docs would do?
Thank you for being so helpful.
Personally, a Dockerfile would suffice. The community, however, would perhaps benefit more from a pre-built image.
Instead of building one from Ubuntu, you could make your life easier and, e.g., start from a miniconda image. I think this can be easy to automate, and maybe you can even delegate this to other teams - a partnership with rocker for example.
Since, from what I read, DVC is not attached to any particular library/language, it also makes sense to separate concerns, i.e., you can use the same DVC image with any project, instead of installing it into any particular environment. It also makes it easier to adopt in existing projects - since you don't have to rebuild the images just to try it.
Here's an initial draft to add to DVC's docs a section about using Docker: https://github.com/iterative/dvc.org/pull/811
Feel free to edit it accordingly :slightly_smiling_face:
After several iterations trying to provide useful information for Docker users, there's no agreement on what that info should be or how we should present it (e.g. should we provide an image? are docs enough?)
Let's keep the discussion open for now :)
Thanks a lot, @casperdcl , @shcheklein , @jorgeorpinel , @efiop for reviewing the previous efforts.
If you could dump your opinion on this one it would help a lot to reach a conclusion.
Well, like you said in https://github.com/iterative/dvc.org/pull/811#discussion_r350316256 , the only essential parts are `FROM python` and `RUN pip install dvc`. So I don't see the point of providing such a simple Dockerfile that anyone familiar with Docker can easily create. Maybe just a small section in the installation guide to provide Docker tips, such as using `python:3.7` and `dvc[all]`.
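Putting those two tips together, the hypothetical "small section" could boil down to something like this (a sketch, not an official image):

```dockerfile
# Pin the Python version and install all optional remote dependencies
FROM python:3.7
RUN pip install "dvc[all]"
CMD ["dvc"]
```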
There are two DVC-docker images for the CI/CD project which are going to be maintained. The Dockerfile code is here (the gpu PR is not merged yet): https://github.com/iterative/dvc-cml
Does it make sense to extend these images to cover the needs of this issue? What needs to be added or changed in the images?
maybe I'm missing something, but it looks like they're using `index.js` (node) rather than the `Dockerfile` (docker). Seems like the GH Action should really use the docker image (e.g. https://github.com/casperdcl/covid-19-box).
On a related note I like where this is heading https://github.com/iterative/dvc-cml/wiki/Tensorflow-Mnist-for-Github-Actions
@casperdcl it is using docker files. `index.js` is there just to support GH users who don't want to, or cannot, use docker. You can find it in the workflow files.
Right. Seems a bit odd to provide a nodejs action for public use via the standard `uses:` syntax, but in our own workflow use our own docker version. Unrelated to this issue (#2774) though.
@casperdcl what would be your suggestions for that project? How to organize it in the right way?
- In `action.yml`, use a `Dockerfile` entrypoint
- The Dockerfile's entrypoint itself can use node and/or any other software that users demand support for
- Use `uses: ./` to run/test the (docker) action
- Move as much of the other non-essential root clutter to subdirs

Advantages:
Surely should discuss this in an issue on that repo though?
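For reference, the `action.yml` change being suggested in the list above would look roughly like this (a sketch using standard GitHub Actions metadata syntax; the name is illustrative, not the actual file):

```yaml
# action.yml (sketch): run the action via the repo's Dockerfile,
# so that `uses: ./` exercises the docker image itself
name: dvc-cml
description: Run DVC in CI/CD workflows
runs:
  using: docker
  image: Dockerfile
```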
@casperdcl
Move as much of the other non-essential root clutter to subdirs
you mean prettier configs, etc, etc?
The Dockerfile's entrypoint itself can use node and/or any other software that users demand support for
could you elaborate?
uses: ./ to run/test the (docker) action
same here, could you elaborate?
you mean prettier configs, etc, etc?
er, just a general principle of removing as much as possible. Some tools expect files to be in the root so we're mostly stuck there, ofc.
The Dockerfile's entrypoint itself can use node and/or any other software that users demand support for
It's cumbersome for us to maintain multiple, well, entrypoints to our actual code. If we want to support both docker and directly running in node, it's best to have docker be a thin wrapper around node (i.e. in the Dockerfile, use `ENTRYPOINT npm`, `CMD run` or similar).
This way we can use the docker wrapper for the action. Thus running the action will test our docker wrapper as well as the underlying entrypoint. The additional advantage is that all deps are guaranteed installed in the docker container.
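A sketch of that thin-wrapper Dockerfile (base image, paths, and the npm script name are all illustrative assumptions):

```dockerfile
# Sketch: docker as a thin wrapper around the node entrypoint,
# so all deps are guaranteed installed in the container
FROM node:12
COPY . /action
WORKDIR /action
RUN npm install
ENTRYPOINT ["npm"]
CMD ["start"]
```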
I feel that I'm still missing something :)
Docker entrypoint for the image we provide already does this, right? It already runs Node. And image itself has JS bundle pre-installed.
There's no very strong reason to support a direct docker-less action, but that's a separate topic.
yes I was making several minor points, I think we're all missing small things but nothing major :)
Hi! We are pushing to use the code through Docker; the main reasons are:
The MAIN reason why the JS action is maintained is that there is no way macOS or Windows can run specific native tools in docker. So if a user were using e.g. CoreML with Xcode, the only way to make this work for them is through the purely-JS Github action, and only on Github.