Singularity: Singularity Docker bootstrap has no understanding of WORKDIR

Created on 2 Dec 2016 · 40 Comments · Source: hpcng/singularity

If we bootstrap a Docker image, we would want the Singularity container to respect any WORKDIR that is defined in the Dockerfile. This is important because the run command or script executables sometimes refer to things relative to that path. If we were going from a Dockerfile that would be pretty easy to find, but since we are going from layers we likely need to dig into the MANIFEST and other places (environment variables that Docker does something with, and Singularity should too?) so we get the same functionality. For example, when a user shells into a Singularity container that was bootstrapped from a Docker image whose last WORKDIR was /code, the shell should start in /code. As a workaround the user can edit the runscript to first cd there, but arguably it should just work as expected.

Enhancement

All 40 comments

I think I can fix this behavior. It should be possible to inject the code necessary to do this into the entrypoint files (/.exec /.shell /.run). If that works properly, it can just be done in C

@bauerm97 that sounds cool! Right now we are pulling the manifest with the Python portion - is there a plan to move that to C? Did you have something in mind where the Python bit writes it (somewhere) and then you handle it afterwards in C? Let me know your idea!

This is working now. Reopen if I am mistaken please.

This isn't fixed. To explain: when you use WORKDIR in Docker, the build daemon moves between directories as each WORKDIR is set during the build from the Dockerfile (you use this instead of cd /location). Where we run into bugs is when a user finishes in a WORKDIR that holds, say, their superawesome.sh, and the run command then looks like ["/bin/bash","superawesome.sh"]. This works fine with Docker, because the container keeps track of the working directory it ended up in. It doesn't work for Singularity because, by default, we start at the root of the image.

The way I've been working around it is to advise to always put full paths to everything. However, there are definitely Docker images that don't do this. In the case of the user not having control, we can then advise to define a custom %runscript to override Docker.

Anyway, it's not an essential feature, but it's an issue that we should keep open to see if other users have trouble. If there is enough trouble, then we can address!

If there is enough trouble, then we can address!

@vsoch Has there been enough trouble? Should we address or close?

Given that singularity containers can specify --pwd and aren't isolated from the host like Docker, I think the concept doesn't map as I was originally thinking, so yes let's close. I don't see any compelling use cases for which you couldn't get the functionality you would want without it.

wubba lubba dub dub!

Given that singularity containers can specify --pwd and aren't isolated from the host like Docker, I think the concept doesn't map as I was originally thinking, so yes let's close. I don't see any compelling use cases for which you couldn't get the functionality you would want without it.

I do see some reasons that could justify the presence of a WORKDIR feature as in docker:

  1. Even though Singularity mounts the host filesystem by default, in several cases users are running isolated containers with arbitrary paths.

  2. The --pwd option assumes the user knows the internal structure of the container, which is not necessarily true when the image was created by someone else.

The way I've been working around it is to advise to always put full paths to everything

  1. In many cases programs only use relative paths, since they can be installed in different locations by different users. Consider a container running a third-party program located in /opt/thirdpartyprogram.
    The starting script is /opt/thirdpartyprogram/entrypoint.sh and it is supposed to be executed in /opt/thirdpartyprogram because it requires some files from $PWD.
    Executing /opt/thirdpartyprogram/entrypoint.sh is not equivalent to cd /opt/thirdpartyprogram; ./entrypoint.sh.
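
The difference is easy to reproduce outside any container. The sketch below is only an illustration: the temporary directory, entrypoint.sh, and config.txt are made-up stand-ins for a third-party program that reads a file from $PWD.

```shell
#!/bin/sh
# Hypothetical stand-in for a third-party program that reads a file from $PWD.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/thirdpartyprogram"
echo "setting=1" > "$demo_root/thirdpartyprogram/config.txt"
cat > "$demo_root/thirdpartyprogram/entrypoint.sh" <<'EOF'
#!/bin/sh
# Relative path: resolved against the caller's working directory.
cat ./config.txt
EOF

# Called by absolute path from another directory: ./config.txt is not found.
if (cd / && sh "$demo_root/thirdpartyprogram/entrypoint.sh") >/dev/null 2>&1; then
    abs_result=ok
else
    abs_result=failed
fi

# Called after cd into the program directory: the relative path resolves.
if (cd "$demo_root/thirdpartyprogram" && sh ./entrypoint.sh) >/dev/null 2>&1; then
    cd_result=ok
else
    cd_result=failed
fi

echo "absolute path from /: $abs_result"
echo "cd first:             $cd_result"
rm -rf "$demo_root"
```

The same script succeeds or fails depending only on the directory it is started from, which is the situation a converted container ends up in when WORKDIR is dropped.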

I think these are legitimate reasons not to force the user to work by default in the current directory, or in / if the host file system is not bound. As with every feature, it would be good to offer it and let the end user decide whether to use it.

@pierlauro I can definitely see these use cases. But how would you suggest to untangle the two cases of respecting the user's actual $PWD (a reasonable use case for many instances) vs. honoring some preset WORKDIR from the Dockerfile? It sounds like the situation you are suggesting that is most relevant is when some kind of --containall is used so that no mounts (including $PWD) are done - is this what you had in mind? Another idea is an option that a user could specify (in any runtime use case) to honor the previous working directory of the container, but this would be a custom Singularity thing (not represented in any OCI spec) so I'm not sure it fits.

I would give the possibility to specify in Singularity a workdir parameter at build time equivalent to the docker one.

Then add a -workdir flag to exec/run/shell: if used, the entrypoint's directory is the build's workdir.

No matter if the container is isolated or not, the flag should be provided in both contexts. I guess my proposal is less strict than your idea of --containall.

Hmm I believe singularity already has a -workdir argument?

    -W|--workdir        Working directory to be used for /tmp, /var/tmp and
                        /home/vanessa (if -c/--contain was also used)

So you are saying this should be possible to provide (without arguments) and perhaps default to some saved WORKDIR from the converted image? Would it then make sense to allow a (non Docker imported) container to also provide this specification, from a Singularity recipe? And the default (given no definition in the container) would be an error?

Hmm I believe singularity already has a -workdir argument?

Sorry, I didn't notice there was already an argument with the exact same name.

So you are saying this should be possible to provide (without arguments) and perhaps default to some saved WORKDIR from the converted image?

Exactly.

Would it then make sense to allow a (non Docker imported) container to also provide this specification, from a Singularity recipe?

Yes, my proposal was to also introduce it as a recipe parameter.

And the default (given no definition in the container) would be an error?

If no other option is specified, the default working directory should be the one you're in.

For Docker, you can obtain this information just by running the container (given the isolation, and without specifying a different working directory):

$ docker run --entrypoint pwd vanessa/salad
/go/src/github.com/vsoch/salad

From the manifest, it looks like either of these are valid (but should look up the difference)

$ docker inspect --format="{{.Config.WorkingDir}}" vanessa/salad
/go/src/github.com/vsoch/salad/
$ docker inspect --format="{{.ContainerConfig.WorkingDir}}" vanessa/salad
/go/src/github.com/vsoch/salad

Yes, but if the user specifies --workdir without an argument and expects to use some custom working directory defined by the container, shouldn't they at least get a warning when that directory is not defined? I think you are right that we probably don't want to error out (the user should be able to apply the flag to a container rather blindly and choose to honor any preset working directory), but they should at least be told whether one was found.

How do you see it fitting in a recipe? If we mirror the Dockerfile practice, you would have it appear somewhere in the %post. Maybe something like the final definition of SINGULARITY_WORKDIR or something like that?

And here is a really good example of why we would want something like this - I just built this singularity container from a docker container, and running it would normally run a script in the pwd (shown above) but look what happens when I don't know this:

$ singularity run $TMPDIR/chimichanga.simg fork
/.singularity.d/runscript: line 2: ./salad: not found

Ruh roh! If I didn't build the container, I'd be done here. I would need to do this:

$ singularity run --pwd /go/src/github.com/vsoch/salad $TMPDIR/chimichanga.simg fork

 You're done!  

                       /\
                      //\\
                     //  \\
                 ^   \\  //   ^
                / \   )  (   / \ 
                ) (   )  (   ) (
                \  \_/ /\ \_/  /
                 \__  _)(_  __/
                    \ \  / /
                     ) \/ (
                     | /\ |
                     | )( |
                     | )( |
                     | \/ |
                     )____(
                    /      \
                    \______/ 

So instead of the above, given that this image was generated with docker2singularity (or generally from Docker), we should be able to do the following (the flag can be named something different):

$ singularity run --workdir-def $TMPDIR/chimichanga.simg fork

And if I were building from a recipe file:

%environment
    SINGULARITY_WORKDIR=/go/src/github.com/vsoch/salad
    export SINGULARITY_WORKDIR

If there wasn't a working directory specified, continue with warning:

$ singularity run --workdir-def $TMPDIR/chimichanga.simg fork
WARNING: This container does not define a working directory. Using ${PWD}

or something along those lines.
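
The proposed lookup can be summarized in a few lines of shell. This is only a sketch of the proposal, not anything Singularity implements: resolve_workdir is a hypothetical helper, SINGULARITY_WORKDIR is the variable name suggested above, and the second warning (for a defined but missing directory) is an assumption added for illustration.

```shell
#!/bin/sh
# Hypothetical helper sketching the proposed --workdir-def behavior.
# Prints the directory to start in: the container-defined working directory
# when it is set and exists, otherwise the caller's $PWD (with a warning).
resolve_workdir() {
    if [ -z "${SINGULARITY_WORKDIR:-}" ]; then
        echo "WARNING: This container does not define a working directory. Using ${PWD}" >&2
        printf '%s\n' "$PWD"
    elif [ ! -d "$SINGULARITY_WORKDIR" ]; then
        # Assumed extra case: the variable is set but the directory is missing.
        echo "WARNING: Working directory ${SINGULARITY_WORKDIR} not found. Using ${PWD}" >&2
        printf '%s\n' "$PWD"
    else
        printf '%s\n' "$SINGULARITY_WORKDIR"
    fi
}

# Intended use inside an entrypoint:
#   cd "$(resolve_workdir)" || exit 1
```

Falling back to $PWD rather than failing keeps the flag safe to apply blindly, which is the behavior discussed above.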

For docker [...] From the manifest, it looks like either of these are valid (but should look up the difference)

$ docker inspect --format="{{.Config.WorkingDir}}" vanessa/salad
/go/src/github.com/vsoch/salad/
$ docker inspect --format="{{.ContainerConfig.WorkingDir}}" vanessa/salad
/go/src/github.com/vsoch/salad

As long as docker-inspect is executed on an image name (e.g. vanessa/salad) they are equivalent.

The Config.WorkingDir can change if the docker-inspect is executed on running or dead instances but it's not our case.

Maybe something like the final definition of SINGULARITY_WORKDIR [...]?

This sounds good to me and the warning totally makes sense.

Scheweet! Let's see what the maintainers think and go from there. Just to be clear, the manifest inspection would happen with a call to the registry endpoint (and the above is just to show where to find the WorkingDir within the config manifest.)

Ruh roh! If I didn't build the container, I'd be done here. I would need to do this:

Actually in docker2singularity this problem is solved by https://github.com/singularityware/docker2singularity/pull/34. And yes, you got the essence of my point.

It's not totally solved with that PR - even if you are able to derive the working directory there, the solution there forces use of it (and we need this to be more flexible). However, your point taken that we technically could add hacks to docker2singularity, and this would be a last resort option if it were simply not doable here. My preference is to not make those containers generated with docker2singularity somehow special, and for a function that is generally needed (regardless of having been a docker container) I think the change belongs here.

I want to re-open this and suggest that WORKDIR (likely called something different) be added as an optional flag for Singularity to honor any working directory defined by the container creator.

Any update on this?

@pierlauro we had good discussion, and one of the maintainers needs to step in and respond. I still +1 that it's a reasonable suggestion.

+1 as well

OK. So I just got finished reading through this thread and I think I understand the gist of it.

Based on the discussion it seems like this feature is requested as a matter of convenience. In other words, pulling and running a Docker container that makes use of WORKDIR in Singularity does not break the container. It just means that the user may need to figure out the location of files to use the container after they pull it.

On the one hand, I think it would be nice if this just worked as expected. But on the other hand, I object to adding code to try to mimic the behavior of Docker and in particular to adding a (potentially confusing) CLI option. Singularity and Docker are different. That is by design. Singularity is not meant to be a drop-in replacement for Docker. I don't really think we should be writing extra code and adding options to replicate the way that docker behaves just for convenience.

That's my $0.02.

@GodloveD much of the recipe conventions, pull functionality, and inspect was based on Docker, it isn't a bad thing to provide support for features that are needed. If I remember correctly I had to really fight for pull because it wasn't wanted, but the users really needed it.

The WORKDIR is a big deal because a substantial number of containers assume that it is honored, and do something like targeting a script (e.g., run.py) in the WORKDIR. The user may not have the knowledge or ability to figure out where in the container (and it can be really well hidden) to direct the command, which adds errors and bugs that don't need to exist. In the same way that we honor environment variables, it's reasonable to treat this the same. It's a piece of metadata that could in fact be stored as an extra environment variable and not used by default; users could then be given a flag to use it, or no flag at all, just the knowledge that they can find the variable in the environment. Storing one more environment variable to ward off these issues is a trivial fix and an easy win. Singularity owes a lot of its success to being friendly with Docker, and this is no different. I would propose:

  • checking for last WORKDIR defined in the container
  • export to DOCKER_WORKDIR or similar

Minimally, then the user could know where the container expects the $PWD to be when it shoots out an ugly, broken error message, and do this:

singularity exec container.simg env | grep WORKDIR
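
If such a variable were exported, a small wrapper could look it up and hand it to --pwd. The sketch below assumes the DOCKER_WORKDIR name proposed above, which Singularity does not set; workdir_from_env is an illustrative helper that only parses env output.

```shell
#!/bin/sh
# Hypothetical helper: read `env` output on stdin and print the value of
# DOCKER_WORKDIR (a proposed variable name, not one Singularity sets).
# Prints nothing when the variable is absent.
workdir_from_env() {
    sed -n 's/^DOCKER_WORKDIR=//p' | head -n 1
}

# Intended use (needs singularity and a container that exports the variable):
#   dir=$(singularity exec container.simg env | workdir_from_env)
#   [ -n "$dir" ] && singularity exec --pwd "$dir" container.simg ./run.py
```

Anchoring the pattern at the start of the line avoids matching other variables that merely contain WORKDIR in their name, which the plain grep above would also print.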

I'll let others weigh in on why this is important. Those are my $0.05.

I'm going through old issues and trying to clean them up. This has gotten very stale. I guess if there is still interest/need for this, feel free to request it to be re-opened (or better yet, send a PR!).

For users who find this issue and want support for the working directory from Docker, see https://github.com/singularityware/docker2singularity/blob/master/docker2singularity.sh#L250.

As far as I know, it is now supported natively by Singularity, since we embed the OCI config into SIF.

Can you give an example?

@vsoch My bad, Cedric confirmed it is respected in OCI engine only.

No worries! It's really hard to find consensus between Singularity and Docker on this issue, because one of the features of Singularity (seamless interaction with the host, respecting the present working directory) goes strongly against the stringent isolation provided by Docker. It's a direct conflict to expect a container to both honor a $PWD or --pwd and also move the user to some preset location. A lot of Docker containers do expect a particular runtime directory (and thus break when converted to Singularity), so I can see how the conversion isn't perfect. On the other hand, a well-made Docker container would not depend on relative paths, so theoretically it shouldn't matter. I added it to docker2singularity because I realized it wouldn't really make sense to add here, so hopefully that can satisfy most parties.

I am encountering this issue where "WORKDIR" and relative paths are defined in a series of docker files and singularity 3.2.1 doesn't respect the relative paths and the original WORKDIR.
Any help to resolve this issue would be appreciated.
I played with --pwd and -W options from singularity exec but they don't reproduce the same results. At the same time, hardcoding the paths to a new singularity recipe mitigates the power of converting a docker file to singularity...

I see https://github.com/singularityhub/docker2singularity/blob/master/docker2singularity.sh#L250
in the comments but I am not sure how to apply it to our existing singularity build 3.2.1 on the cluster?

Is this only for Singularity >= 3.5, or can it be applied to 3.2.1?
Meanwhile trying to build 3.5.2 (latest), I encountered golang 1.13.5 vendor issues:
go: inconsistent vendoring in /programs/local/go/1.13.5/go/src/github.com/sylabs/singularity:
go.mod requires github.com/sylabs/singularity but vendor/modules.txt does not include it.

Appreciate any help/suggestions

Per all the discussion above, I don't think that Singularity will support WORKDIR any time soon, so that's not an option. In these cases, the best bet is to use --pwd, which you've mentioned trying (and it doesn't work). Could you share the container from Docker Hub / exec commands to reproduce? It could be that we need to do some special bind to get the functionality you want (for example, if you are binding a --pwd that is already bound somewhere else, it will skip).

As for docker2singularity, you are correct that it would require the Docker daemon (and thus would not work on the cluster!). But here is an idea - have you tested podman? https://podman.io/ It's supposed to be a drop-in replacement for Docker, but doesn't have the daemon issue. If you are able to do a security review and decide whether it would work for your cluster, then we could implement a podman2singularity that actually could work on your cluster! Let me know your thoughts, happy to help however I can :)

Thanks a lot for your very quick response. Very helpful.
I have just begun this with a very simplified demo of our key projects called "ChRIS"
https://github.com/FNNDSC/CHRIS_docs
https://www.youtube.com/watch?v=7WIGC1VjLqY&t=6s

The docker file for my preliminary tests is located
https://github.com/FNNDSC/pl-freesurfer_pp_moc

With your hint, I was able to finally make it work:

Docker:
docker run --rm -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing \
fnndsc/pl-freesurfer_pp_moc freesurfer_pp_moc.py \
/incoming /outgoing

Singularity:
(after building the singularity image)
singularity exec -B in:/incoming,out:/outgoing --pwd /usr/src/freesurfer_pp_moc pl-freesurfer_pp_moc.sif python freesurfer_pp_moc.py /incoming /outgoing

The above produces equivalent results. So that's good. What remains is that unfortunately, one needs knowledge of the WORKDIR by looking at the dockerfile to make the --pwd arrangement. Looking at ways to make this more convenient based on the series of dockerfiles and patterns,
I can see that at least on this dockerfile, $APPROOT is defined as the WORKDIR. Would that be possible to "pass" $APPROOT as the --pwd in the commands above?
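
One possible approach, sketched under assumptions: writing --pwd $APPROOT on the host would not help, because the host shell expands $APPROOT before singularity ever starts. If the image sets APPROOT with ENV, so that the variable exists in the container's runtime environment, a shell inside the container can expand it instead. The in_approot name and the local dry run below are illustrative only.

```shell
#!/bin/sh
# Snippet meant to run *inside* the container: change to the directory named
# by $APPROOT, then exec the remaining arguments as the command.
in_approot='cd "$APPROOT" && exec "$@"'

# Intended use (assumes the image exports APPROOT at runtime):
#   singularity exec -B in:/incoming,out:/outgoing pl-freesurfer_pp_moc.sif \
#       sh -c "$in_approot" sh python freesurfer_pp_moc.py /incoming /outgoing

# Local dry run without a container, using a temporary directory as APPROOT.
demo_dir=$(mktemp -d)
started_in=$(APPROOT="$demo_dir" sh -c "$in_approot" sh pwd)
echo "command started in: $started_in"
rmdir "$demo_dir"
```

The single quotes around the snippet are what keep the host shell from expanding $APPROOT and "$@" too early.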

As you may see, the pipelines are currently being executed on a very few openshift instances on https://massopen.cloud/ (Docker with root enabled I assume, based on the nature of the files). My hope is to make a smooth branch for the entire pipeline and enable "scaling" of the pipeline on clusters with the power of singularity!
From my recent conversations with Redhat, openshift doesn't recommend root-enabled docker but considers relaxing securities to get the containers through. At the same time, openshift doesn't support singularity.
All in all, I have real interest to make this work with singularity and scale on our internal HPC cluster on step 1.
Thanks for your help and hope to cooperate more in future.

It's great that you got it working! Another helpful bit that might be useful to you in the future is to use --containall, which would prevent the usual binds from happening (making the execution environment more isolated akin to Docker). If you needed to isolate the environment too (maybe less likely) there is --cleanenv.

You are correct that you would need to know what to specify as the WORKDIR, and really the best suggestion I have is to provide it in the help text / entrypoint of the container. If you can't edit that, then really good documentation for your users would work. I'm not sure if it helps, but here is a way to get the PWD inside the container (with docker):

$ docker run -it --entrypoint /bin/bash snakemake/snakemake -c "env | grep PWD"

Hmm, if you are using OpenShift (RedHat) you should really look at Podman (also RedHat) :)

Thanks for the additional suggestion.
We will go with documentation for now and see how that works for the users
https://github.com/arashnh11/pl-freesurfer_pp_moc

We will also review Podman when we get a chance.

This is a legitimate build issue. I used the following Dockerfile:
https://github.com/NVIDIA/DeepLearningExamples/blob/31ca062d9399e28109ba901a1842b9eb7afa5989/PyTorch/Detection/SSD/Dockerfile

ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:20.06-py3
FROM ${FROM_IMAGE_NAME}

# Set working directory
WORKDIR /workspace

ENV PYTHONPATH "${PYTHONPATH}:/workspace"

COPY requirements.txt .
RUN pip install --no-cache-dir git+https://github.com/NVIDIA/dllogger.git#egg=dllogger
RUN pip install -r requirements.txt
RUN python3 -m pip install pycocotools==2.0.0

# Copy SSD code
COPY ./setup.py .
COPY ./csrc ./csrc
RUN pip install .

COPY . .

Then converted it to a singularity recipe using spython via:

spython recipe Dockerfile Singularity_ssd_generated.def

Resulting recipe:

Bootstrap: docker
From: nvcr.io/nvidia/pytorch:20.06-py3
Stage: spython-base

%files
requirements.txt .
./setup.py .
./csrc ./csrc
. .
%post
FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:20.06-py3

# Set working directory
cd /workspace

PYTHONPATH="${PYTHONPATH}:/workspace"

pip install --no-cache-dir git+https://github.com/NVIDIA/dllogger.git#egg=dllogger
pip install -r requirements.txt
python3 -m pip install pycocotools==2.0.0

# Copy SSD code
pip install .

%environment
export PYTHONPATH="${PYTHONPATH}:/workspace"
%runscript
cd /workspace
exec /bin/bash "$@"
%startscript
cd /workspace

This resulted in a mess. It was not copying files to /workspace and the environment was all messed up. I fixed it up like this:

Bootstrap: docker
From: nvcr.io/nvidia/pytorch:20.06-py3
Stage: spython-base

%files
    requirements.txt /workspace/
    ./setup.py /workspace/
    ./csrc /workspace/csrc
    . /workspace/

%post
    FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:20.06-py3

    # Set working directory
    cd /workspace

    PYTHONPATH="${PYTHONPATH}:/workspace"

    # SINGULARITY_ENVIRONMENT: /.singularity.d/env/91-environment.sh
    SINGENVDIR=$(dirname ${SINGULARITY_ENVIRONMENT})
    source ${SINGENVDIR}/10-docker2singularity.sh

    pip install --no-cache-dir git+https://github.com/NVIDIA/dllogger.git#egg=dllogger
    pip install -r requirements.txt
    python3 -m pip install pycocotools==2.0.0

    # Copy SSD code
    pip install .

%environment
    export PYTHONPATH="${PYTHONPATH}:/workspace"

%runscript
    cd /workspace
    exec /bin/bash "$@"

%startscript
    cd /workspace
    exec /bin/bash "$@"

The main fix is explicitly copying files to /workspace and sourcing the environment /.singularity.d/env/10-docker2singularity.sh (some magic environment file there) for the container.

    SINGENVDIR=$(dirname ${SINGULARITY_ENVIRONMENT})
    source ${SINGENVDIR}/10-docker2singularity.sh

Then it worked.

IMHO, the build recipes should be smarter. There should be some API of setting a directory in the %files section and the container environment should be optionally inherited prior to running the post section.

IMHO, the build recipes should be smarter. There should be some API of setting a directory in the %files section

Since the destination for files can already be set, the simplest thing here would likely be for spython recipe to recognize the WORKDIR in the Dockerfile and prefix any relative destination for file copies in %files with that. Singularity's build definitions are purposefully different than docker, so any translation tool will need to make the adjustments necessary for parity. Singularity is faithfully obeying the recipe that spython has produced, according to Singularity's own syntax, so it's not really a bug in Singularity.
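
As a rough illustration of that translation step (this is not spython's actual code, and prefix_files_dest is a hypothetical name), relative %files destinations could be rewritten against the WORKDIR like this. Paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical sketch: read "source dest" pairs on stdin and print them with
# relative destinations placed under the given WORKDIR.
prefix_files_dest() {
    workdir=${1%/}            # drop one trailing slash, if any
    while read -r src dest; do
        case $dest in
            /*) ;;                              # absolute: leave as is
            .)  dest="$workdir/" ;;             # "." means the WORKDIR itself
            *)  dest="$workdir/${dest#./}" ;;   # relative: prefix with WORKDIR
        esac
        printf '%s %s\n' "$src" "$dest"
    done
}

# The %files entries from the generated recipe above, with WORKDIR /workspace.
prefix_files_dest /workspace <<'EOF'
requirements.txt .
./setup.py .
./csrc ./csrc
. .
EOF
```

For these four entries the function prints the same destinations as the hand-edited recipe: /workspace/, /workspace/, /workspace/csrc, and /workspace/.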

If you'd like to request otherwise, please open a new feature request issue with full detail of the functionality that you'd like to see exposed, and examples of how it would be used.

and the container environment should be optionally inherited prior to running the post section.

The environment of a base container (the bootstrap/from image) will now be set in %post for recent versions of Singularity.

There has been a lot of prior debate about sourcing the definition file's additional runtime %environment during %post. Both sides of the debate have good arguments. Ultimately we have decided not to do that - we retain the current behavior for consistency, and this is unlikely to change. Adding an option to source the %environment in %post would cause confusion as builds would not work correctly on different versions of Singularity without that option, so it is unlikely to be considered while there is a simple workaround.

Singularity Python does its best to map between the two, but it's not a promise of perfection, only a means to get started. For WORKDIR we already try to capture it as much as we can without Singularity supporting WORKDIR, both to account for the runscript being in a working directory and for changes during build: https://github.com/singularityhub/singularity-cli/blob/master/spython/main/parse/parsers/docker.py#L418.

This resulted in a mess. It was not copying files to /workspace and the environment was all messed up. I fixed it up like this:

Then you are welcome to issue a pull request to change it, or just write the file yourself.

Alright, well at least now it's documented in this issue for posterity.

The environment of a base container (the bootstrap/from image) will now be set in %post for recent versions of Singularity.

If that's true that's great. Thanks!
