/kind bug
Description
When running a container with an explicitly set user, such as
https://github.com/schemaspy/schemaspy/blob/a28c9fc932cc6f85c7780050a678b3a3d7f595e9/Dockerfile#L44 the volume mounted by podman is not writable.
Steps to reproduce the issue:
Get some PostgreSQL host (192.168.4.1) and port (5432)
Create dir to mount as a volume
$ mkdir html
$ podman run -it -v "$PWD"/html:/output:Z schemaspy/schemaspy:snapshot -u postgres -t pgsql11 -host 192.168.4.1 -port 5432 -db anitya
...
INFO - Starting Main v6.1.0-SNAPSHOT on 9090a61652af with PID 1 (/schemaspy-6.1.0-SNAPSHOT.jar started by java in /)
INFO - The following profiles are active: default
INFO - Started Main in 2.594 seconds (JVM running for 3.598)
INFO - Starting schema analysis
ERROR - IOException
Unable to create directory /output/tables
INFO - StackTraces have been omitted, use `-debug` when executing SchemaSpy to see them
Describe the results you received:
The container is unable to write to /output, most likely because it is running as the java user.
Describe the results you expected:
Volumes are writable regardless of the user settings inside of the container.
Output of podman version:
$ podman version
Version: 1.5.1
RemoteAPI Version: 1
Go Version: go1.12.7
OS/Arch: linux/amd64
I assume the -u postgres in here means that your app in the container isn't running as root?
As such, it's running as a non-0 user in the container, which is mapped to a user on the host through /etc/subuid (root in a rootless container is the user that started the container, all higher UIDs and GIDs are mapped to a block on the host given by /etc/subuid and /etc/subgid). The volume looks like it's somewhere that's owned by the user starting the container - but you're running the app in the container as a different user, which means you run into permissions errors.
You may want to just remove -u postgres and run as root if you need to access volumes owned by your user. Running as root in a rootless container is already very secure (the container has no added privileges that your user does not), so the only security benefit to swapping to another user in the container is preventing the container from accessing files owned by your user - which, in this case, you need (to talk to the volume).
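A quick way to see the mapping described here (a sketch; the exact ranges depend on your /etc/subuid and /etc/subgid entries):
$ grep "$USER" /etc/subuid /etc/subgid
$ podman unshare cat /proc/self/uid_map
In uid_map, the first column is the UID inside the namespace, the second is the host UID it maps to, and the third is the length of the mapped range.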
@mheon -u is a parameter for schemaspy, the container itself defines USER java, and podman is run unprivileged without sudo. I'd expect podman to handle mounted volumes transparently without leaking low-level details about UID and filesystem mappings. Otherwise all scripts will need to contain -u root, which doesn't look very secure.
We really can't handle this ourselves - these are separate users from the perspective of the kernel, and normal filesystem permissions apply.
Is it possible to configure the filesystem layer to ignore host permissions? If a container is already isolated through its filesystem path, why impose additional UID restrictions?
Maybe it is possible to implement two-layer writes? The first layer enforces permissions, so that the container can't escape the defined path, but the final writes to disk ignore the permissions. If a container needs a separate user with a volume mapping, maybe podman could switch to the double-layer concept automatically.
You can change the ownership on the volume with podman unshare chown UID PATH
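Applied to the reproduction above, that would look something like this (the 1000:1000 is only an assumption for the java user in the schemaspy image; check the image's /etc/passwd for the real values):
$ podman unshare chown -R 1000:1000 "$PWD"/html
$ podman run -it -v "$PWD"/html:/output:Z schemaspy/schemaspy:snapshot -u postgres -t pgsql11 -host 192.168.4.1 -port 5432 -db anitya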
@rhatdan PATH is a path in my directory, not /var/..., right? How do I know the UID? What will happen to the filesystem permissions after I quit this modified user namespace?
I also checked man podman-unshare and the description sounds too low-level. Maybe it could be reworded for people who are not familiar with user namespaces yet.
podman-unshare - Run a command inside of a modified user namespace.
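To make that man-page description concrete, a quick way to see what the modified namespace looks like (the 1000 below is just a placeholder; the output depends on your own UID):
$ id -u
1000
$ podman unshare id -u
0
Inside the namespace you appear as UID 0, which is why chown on files you own works there.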
I had a thought overnight.
(root in a rootless container is the user that started the container, all higher UIDs and GIDs are mapped to a block on the host given by /etc/subuid and /etc/subgid)
When I run a container with a custom USER as non-privileged, there is no root inside anymore - is that right? If so, then why not map that custom USER, instead of root, to my UID and GID?
By default, podman run as a non-root user runs as root within the container. This means the processes in the container have full namespaced capabilities. This also means that if a container process escaped the container, it would have full access to files in your homedir (based on UID; SELinux would still block it, but I have heard that some people disable SELinux). If you run the processes within the container as a different non-root UID, then those processes will run as that UID, and if they escape they would only have world access to content in your homedir.
@abitrolly I am writing a blog based on these issues. Send me your email and I can expose an early copy to you. [email protected]
Just needs to be reviewed and then I can get it published.
@rhatdan people disable SELinux because not all build scripts add the :Z suffix to volume mounts, without which volumes on SELinux do not work, and podman doesn't add this suffix automatically.
My email is [email protected]. I sent email titled "Early copy" from this address.
Given that even with podman's unprivileged model an escaped container can steal my private SSH keys, I don't think that podman is more secure than docker anymore. Private keys are more valuable than root-level access to the OS (whose main risk is, again, stealing private keys from more boxes). Now I think that double-level filesystem access control is a must-have feature for any non-privileged process containers.
People running containers not separated by SELinux are taking a big risk, since it is the main tool protecting their filesystem from containers.
An escape from Docker allows access to all keys; an escape from rootless podman only to the user's UID.
Running rootless containers in a different User Namespace would give you more protections.
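For instance, assuming a podman build that supports --userns=auto and a sufficiently large subordinate UID range, each container can get its own non-overlapping mapping:
$ podman run --rm --userns=auto docker.io/library/alpine cat /proc/self/uid_map
With such a mapping, an escaped container does not even share UIDs with your other containers or with the files in your homedir.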
@abitrolly But the bottom line is, I tell people to always run their containers as non-root, even in a rootless container.
One thing we could consider would be to add a :U option to volumes, which would chown the directory to match the primary user of the container.
Might be something to consider.
Not sure if this is a "true" solution or more of a workaround, but wouldn't --userns handle at least some of the situations where you want to mount with non-root user permissions?
For example:
podman run --rm --userns=keep-id -v /home/hostUserName/tmp:/home/containerUserName/tmp:Z -it image_name /bin/bash
This mounts tmp inside the container at /home/containerUserName/tmp with the same UID:GID inside the container as it possesses on the host.
Perhaps --userns=ns:my_namespace could be used to mount a volume with the UID:GID corresponding to the user named my_namespace?
Note: you cannot use --user myUserName and --userns=... in the same podman run .... command, as I understand it.
This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.
One thing we could consider would be to add a :U to volumes which would chown the directory to match the primary user of the container.
@rhatdan, what's your take on this issue? Shall we pursue your proposal above?
I don't think so, I am hesitant to make this more complicated. I think it is up to the user to set up the permissions correctly on the volume.
I still don't understand what unshare does. How is that different from su <user>? How does unshare know the UID to run inside?
What is the proposed solution? Is the following correct?
1. podman unshare chown -R UID /host/path
2. podman run -v /host/path:/guest/path - /guest/path is now writable
3. chown -R UID to get permissions back
Is that right?
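If so, spelled out with a hypothetical in-container UID of 1001 and a hypothetical image name, the round trip would look like:
$ podman unshare chown -R 1001:1001 /host/path
$ podman run -v /host/path:/guest/path:Z myimage
$ podman unshare chown -R 0:0 /host/path
The last step works because UID 0 inside the namespace is your own UID on the host, so it hands ownership back to you.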
Wow, this stuff is way too complicated. I have the same issue as @abitrolly (running podman as non-root, having a user inside the container that is not "root", and I cannot write to the mounted directory). I've read every comment here and I still don't have an idea how to make this work.
So, it seems like I can make it "work" by "chown"ing (on the host) the shared directory to the user-id that the non-root container user (in my case called "jenkins", because I'm using the jenkins:jenkins image from Docker hub) is mapped to on the host system. In my case, this jenkins-user from inside the container has the UID 559751 on the host system. (Btw, what is the easiest way to find this out?). So, doing sudo chown 559751 builds on the host makes the directory writable to the user inside the container. But this has two big issues:
I need to share this image with my coworkers (who don't have root privileges), so both of these points are unacceptable.
I don't think so, I am hesitant to make this more complicated. I think it is up to the user to set up the permissions correctly on the volume.
Okay, but how?
Root should not be necessary - podman unshare sticks you into the same user namespace that the rootless container uses, which gives you access to every UID/GID that the container does. Within a podman unshare shell you should be able to chown folders/files owned by your user to the UID/GID used by Jenkins. You will need to know what IDs are in use inside the container, because podman unshare is a shell on the host (though you can mount the container with podman mount and inspect its /etc/passwd to get those). This can also potentially allow you to identify the user we're mapped to on the host (su to the right UID in the podman unshare shell and touch a file in your /home - the UID there should be the one in use).
For the second issue... That is definitely a concern, and one I don't think we have an easy solution to as yet. There is talk of adding UID/GID mappings to LDAP for use across multiple systems, but they will still be unique to the user running the container for security reasons, so not portable between users.
Running rootless containers as non-root and mounting in volumes is proving to be quite complicated. I think a review of how things are right now and a discussion of how we can improve (maybe a blog?) is definitely warranted here.
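A minimal sketch of the ID-discovery step described above, assuming the image is the stock docker.io/jenkins/jenkins, that it ships grep, and that the account is called jenkins (the 1000:1000 shown is only a placeholder; use whatever the image actually reports):
$ podman run --rm --entrypoint grep docker.io/jenkins/jenkins jenkins /etc/passwd
jenkins:x:1000:1000::/var/jenkins_home:/bin/bash
$ podman unshare chown -R 1000:1000 ./builds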
@mheon Thank you very much for your detailed comment!
Root should not be necessary
I definitely needed "sudo" to execute sudo chown 559751 builds on the host. This may be because the user accounts are centrally managed and there may be something wrong with my /etc/subuid..? I remember that it was necessary for me to create this file manually some months ago. But this may be because my workstation is old and my Fedora installation has been upgraded many times over the last six years or so.
Within a podman unshare shell you should be able to chown folders/files owned by your user to the UID/GID used by Jenkins.
Well, this seems _portable_ in the sense that I should be able to write a simple shell script to automate this process for my coworkers every time they want to use my image.
unshare seems to be a strange name for this subcommand, but I probably just do not understand the deeper meaning of this. When browsing the podman subcommands in an attempt to fix my issue I would've disregarded this subcommand immediately just because of its name. ("How does _unsharing_ something help me with those permission issues?")
I will tinker around with this some more tomorrow.
I agree that unshare is a terrible name; we named it after an existing utility that enters user namespaces (doing something very similar to what we do, but not doing many of the things we do to make sure that it matches what other podman commands are doing).
@ChristianCiach have you been able to come up with a tutorial for your colleagues?
@abitrolly Sorry for the late reply. Yes, creating a simple wrapper that calls "podman unshare" before calling "podman run" works as expected. This is good enough for my use case.
Now how do you chown the directory back to the host user though?
$ mkdir tmp
$ podman unshare chown 1001:1001 tmp
$ ls -la tmp
total 0
drwxrwxr-x. 2 101000 101000 40 Jul 30 17:20 ./
drwxrwxrwt. 54 root root 2000 Jul 30 17:20 ../
$ chown $(id -u):$(id -g) tmp
chown: changing ownership of 'tmp': Operation not permitted
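The way back is the same namespace trick in reverse: your own host UID appears as 0 inside podman unshare, so chowning to 0:0 there restores ownership to your user on the host:
$ podman unshare chown -R 0:0 tmp
$ ls -la tmp
# now owned by your host user again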
It might not work in all use cases, but another workaround is to run the command in the container with the host's UID and GID by using --userns=keep-id --user=$(id -ur):$(id -gr), e.g.:
$ mkdir project
$ podman run -it --rm -v $PWD/project:/project:z --userns=keep-id --user=$(id -ur):$(id -gr) --entrypoint=/bin/bash quay.io/quarkus/ubi-quarkus-mandrel:20.1.0.1.Alpha2-java11 -c 'id; touch /project/lala'
uid=1000(1000) gid=1000 groups=1000
while without it it fails:
$ mkdir project
$ podman run -it --rm -v $PWD/project:/project:z --entrypoint=/bin/bash quay.io/quarkus/ubi-quarkus-mandrel:20.1.0.1.Alpha2-java11 -c 'id; touch /project/lala'
uid=1001(quarkus) gid=1001(quarkus) groups=1001(quarkus)
touch: cannot touch '/project/lala': Permission denied
I wonder if giving the container the host user's UID and GID makes the container unprivileged?
Containers by default are unprivileged. (Depending on your definition of unprivileged)
Running with --userns=keep-id just changes the way the user namespace is set up; it does not change the security controls on the container.
The only difference is that instead of your UID being root inside of the container, your UID stays your UID inside of the container, and the first UID listed for you in /etc/subuid becomes UID 0 inside of the container.
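One way to see the difference is to compare the UID maps with and without the option (a sketch; the exact numbers depend on your UID and /etc/subuid range):
$ podman run --rm docker.io/library/alpine cat /proc/self/uid_map
$ podman run --rm --userns=keep-id docker.io/library/alpine cat /proc/self/uid_map
In the first map, container UID 0 corresponds to your host UID; in the second, your host UID is mapped to itself inside the container.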
I'm battling this same thing. I am using a Bitnami image of PostgreSQL from Docker Hub. It has a baked-in user ID of 1001. On my Arch Linux system my UID is 1000. I would like to make a directory in my home directory for postgres to persist its data to, and be able to poke around in that directory without having to chown it all the time when I want to.
@rhatdan what is your suggestion for people using vendor-provided images that already have a UID baked in?
Well for now you can do
$ podman unshare chown 1001:1001 PATHTODIR
We could add something to the volume command to do this, but I am not sure how ugly the syntax would be.
Well for now you can do
$ podman unshare chown 1001:1001 PATHTODIR
We could add something to the volume command to do this, but I am not sure how ugly the syntax would be.
I would love to have this feature, even if it looks ugly :)
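For reference, an option along these lines did land later as the U volume flag; assuming a podman build that supports it, usage would look like:
$ podman run -it -v "$PWD"/html:/output:U,Z schemaspy/schemaspy:snapshot -u postgres -t pgsql11 -host 192.168.4.1 -port 5432 -db anitya
which tells podman to chown the source directory to match the container's primary user.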
Thanks for writing this up, it really helped me to understand what is going on here.
Unfortunately the container I am using and want to deploy and regularly update has 43 different users, with their associated group relationships. So if I understand the situation correctly, I would need to parse out all 43 entries from /etc/passwd using podman mount, then create a wrapper script that calls podman unshare with each of those. Then when the container gets updated to add a new user, I'm broken and need to go update the script.
I know it would be complex from an implementation perspective, but it would be great if podman could inspect /etc/passwd from within the image itself and pull out the appropriate non-root users, all via the --volume option, without the need for further user options.
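A rough sketch of the manual version, assuming the image ships cat, awk is available on the host, and using a hypothetical image name:
$ podman run --rm --entrypoint cat my.registry/app /etc/passwd | awk -F: '$3 != 0 {print $3":"$4}'
That prints the non-root UID:GID pairs baked into the image, which a wrapper script could then feed to podman unshare chown for the matching volume directories.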