I'm trying to run rootless podman in an OpenShift pod (so we can do scalable analysis of container images; current implementation uses atomic --storage ostree and skopeo).
The thing is that with OpenShift's anyuid feature it's not easy to predict the uid. nss_wrapper is being used as a solution to this problem.
Minimal reproducer:
#!/bin/bash
set -ex
CONT=cont
IMAGE=rootless-podman
podman stop $CONT || :
podman rm -f $CONT || :
podman pull registry.fedoraproject.org/fedora:29
podman run --net host -d --name $CONT registry.fedoraproject.org/fedora:29 sleep 99999
podman exec $CONT dnf install -y podman
podman exec $CONT useradd podm
podman exec $CONT bash -c 'printf "podm:1000:9999" >/etc/subuid && cp /etc/subuid /etc/subgid'
podman stop $CONT
podman commit $CONT $IMAGE
podman rm $CONT
# this is how openshift creates the default pod
# https://github.com/containers/libpod/issues/1092#issuecomment-437962358
# --net host for sake of not having networking issues; sometimes dnf can't sync a repo
podman run --net host --cap-drop=setuid --cap-drop=setgid --cap-drop=kill --cap-drop=mknod --name=$CONT -u podm $IMAGE podman --log-level=debug --storage-driver=vfs pull fedora:29
So I did this:
$ head -n 2 ./passwd
root:x:0:0:root:/root:/bin/bash
user:x:1000070000:1000070000:user:/src:/bin/bash
export LD_PRELOAD=libnss_wrapper.so
export NSS_WRAPPER_PASSWD="${PWD}/passwd"
export NSS_WRAPPER_GROUP=/etc/group
and then created
$ cat /etc/sub{uid,gid}
user:231072:65536
user:231072:65536
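For completeness, a passwd file like the one above can be generated at container start for whatever uid OpenShift assigns. This is my own sketch of that setup (the `USER_NAME` and `PASSWD_FILE` defaults are assumptions, not taken from the original scripts):

```shell
# Sketch: build the nss_wrapper passwd file from the runtime uid/gid.
USER_NAME=${USER_NAME:-podm}
PASSWD_FILE=${PASSWD_FILE:-/tmp/passwd}

# start from the image's passwd, minus any stale entry for our user
grep -v "^${USER_NAME}:" /etc/passwd > "$PASSWD_FILE" || true
# append an entry that matches the uid/gid we actually run as
echo "${USER_NAME}:x:$(id -u):$(id -g):${USER_NAME}:/home/${USER_NAME}:/bin/bash" >> "$PASSWD_FILE"

export LD_PRELOAD=libnss_wrapper.so
export NSS_WRAPPER_PASSWD="$PASSWD_FILE"
export NSS_WRAPPER_GROUP=/etc/group
```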
I'm still getting the same error message from podman:
bash-4.4$ id -u
1000070000
bash-4.4$ /src/podman --root $PWD/r/ pull docker.io/library/busybox
...
ERRO[0005] Error while applying layer: ApplyLayer exit status 1 stdout: stderr: lchown /home: invalid argument
Failed
bash-4.4$ pwd
/var/tmp
bash-4.4$ ls -lha
total 16K
drwxrwxrwt. 1 root root 27 Jul 13 14:59 .
drwxr-xr-x. 1 root root 16 Apr 26 13:03 ..
-rw-r--r--. 1 user root 829 Jul 13 14:54 passwd
drwxr-xr-x. 9 user root 4.0K Jul 13 14:51 r
the busybox image has files owned by different users (not only root), so containers/storage tries to chown them to uids/gids != 0.
I believe that podman still has only one uid available inside the user namespace, and after https://github.com/projectatomic/libpod/pull/1097 it will just fail to start.
Okay, I'll try again with the patch and with a different image.
Patch has been merged.
Sadly it still doesn't work. It seems that podman doesn't respect the nss_wrapper setup:
bash-4.4$ id
uid=1000070000(podm) gid=0(root) groups=0(root),1000070000
bash-4.4$ getent passwd podm
podm:x:1000070000:0:podm:/src:/bin/bash
bash-4.4$ podman
ERRO[0000] No subuid ranges found for user ""
bash-4.4$ cat /etc/subuid
podm:231072:65536
bash-4.4$ cat /etc/subgid
podm:231072:65536
I prepared a repo with my complete setup: https://github.com/TomasTomecek/rootless-podman-in-openshift
is the env variable $USER set correctly?
Does user.LookupId(uidString) do the right thing in the golang libraries?
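In shell terms, the lookup being suggested is roughly the following (my sketch, not podman's actual code): use $USER when it is set, otherwise resolve the current uid through NSS, which is the path nss_wrapper intercepts.

```shell
# Hypothetical fallback for when $USER is unset: resolve the current
# uid via getent, which goes through NSS (and thus nss_wrapper).
resolve_user() {
  if [ -n "${USER:-}" ]; then
    echo "$USER"
  else
    getent passwd "$(id -u)" | cut -d: -f1
  fi
}
```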
Well spotted, Giuseppe!
I am past that error now, but sadly I'm back to the invalid argument:
$ podman --log-level=debug pull busybox
DEBU[0004] ... will first try using the original manifest unmodified
Copying blob sha256:8c5a7da1afbc602695fcb2cd6445743cec5ff32053ea589ea9bd8773b7068185
DEBU[0004] Downloading /v2/library/busybox/blobs/sha256:8c5a7da1afbc602695fcb2cd6445743cec5ff32053ea589ea9bd8773b7068185
DEBU[0004] GET https://registry-1.docker.io/v2/library/busybox/blobs/sha256:8c5a7da1afbc602695fcb2cd6445743cec5ff32053ea589ea9bd8773b7068185
DEBU[0005] Detected compression format gzip
0 B / 716.06 KB [-------------------------------------------------------------]DEBU[0005] Using original blob without modification
716.06 KB / 716.06 KB [====================================================] 0s
Copying config sha256:e1ddd7948a1c31709a23cc5b7dfe96e55fc364f90e1cebcde0773a1b5a30dcda
DEBU[0005] No compression detected
0 B / 1.46 KB [---------------------------------------------------------------]DEBU[0005] Using original blob without modification
1.46 KB / 1.46 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
DEBU[0005] Start untar layer
ERRO[0005] Error while applying layer: ApplyLayer exit status 1 stdout: stderr: lchown /home: invalid argument
Failed
Is it possible that the process of the openshift pod doesn't have the necessary capabilities to perform the operation?
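One way to check that from inside the pod, without assuming capsh is installed, is to read the bounding-set mask from /proc/self/status (a generic check of mine, not something from the original report):

```shell
# CapBnd is a bitmask of the capability bounding set; for rootless
# podman it must include CAP_SETGID (bit 6) and CAP_SETUID (bit 7).
grep CapBnd /proc/self/status
# if capsh is available, the mask can be decoded:
# capsh --decode=$(awk '/CapBnd/ { print $2 }' /proc/self/status)
```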
packages
podman-0.8.1-1.git6b4ab2a.fc28.x86_64
conmon-1.10.3-1.gite558bd5.fc28.x86_64
strace
[pid 6120] lstat("/", {st_mode=S_IFDIR|0755, st_size=76, ...}) = 0
[pid 6120] lstat("/home", 0xc420405f18) = -1 ENOENT (No such file or directory)
[pid 6120] lstat("/home", 0xc420408038) = -1 ENOENT (No such file or directory)
[pid 6120] mkdirat(AT_FDCWD, "/home", 0755) = 0
[pid 6120] lchown("/home", 65534, 65534) = -1 EINVAL (Invalid argument)
[pid 6120] unlinkat(AT_FDCWD, "/temp-storage-extract719224347", 0) = -1 EISDIR (Is a directory)
[pid 6120] unlinkat(AT_FDCWD, "/temp-storage-extract719224347", AT_REMOVEDIR) = 0
[pid 6120] write(2, "lchown /home: invalid argument", 30) = 30
[pid 6120] exit_group(1 <unfinished ...>
[podm@podman ~]$ ls -lha
total 16K
drwxrwxrwx. 3 1000 podm 90 Aug 4 14:32 .
drwxrwxrwx. 3 root root 18 Aug 4 14:30 ..
-rwxrwxrwx. 1 1000 podm 18 Jun 18 08:28 .bash_logout
-rwxrwxrwx. 1 1000 podm 193 Jun 18 08:28 .bash_profile
-rwxrwxrwx. 1 1000 podm 231 Jun 18 08:28 .bashrc
drwx------. 3 podm root 19 Aug 4 14:32 .local
-rwxrwxrwx. 1 root root 89 Aug 4 14:31 passwd
[podm@podman ~]$ pwd
/home/podm
[podm@podman ~]$ getent passwd podm
podm:x:1000070000:0:Rootless Podman:/home/podm:/bin/bash
[podm@podman ~]$ cat /etc/subuid
podm:231072:65536
[podm@podman ~]$ cat /etc/subgid
podm:231072:65536
@TomasTomecek We need to handle the error where USER is not set; we need to do a lookup. Let's not lose that part of the issue.
I've opened a PR here: https://github.com/projectatomic/libpod/pull/1217
I tried this with latest releases:
podman-0.8.5-1.gitdc5a711.fc28.x86_64
origin 3.10
kernel-4.18.5-300.fc29.x86_64
and am still stuck on the lchown /home: invalid argument
Getting the same error when trying to use podman-in-podman (both rootless).
host:
image/container:
we may be hitting this issue: https://github.com/genuinetools/img/issues/170
Giuseppe, I can see that both patches of yours for shadow were merged. When can we expect a build of podman built with them to try this out?
they are not going to be distributed with podman; they are distributed as part of shadow-utils. @t8m might know better.
For a test though, you could probably build a new image with shadow-utils built from upstream
If the solution for the issue is to ship the newuidmap and newgidmap utilities with CAP_SETUID fscap instead of them being simply setuid root, I can prepare the shadow-utils update for Fedora 29 soon.
@t8m awesome, that would be super helpful; I'll update my laptop to F29 in the meantime
If the solution for the issue is to ship the newuidmap and newgidmap utilities with CAP_SETUID fscap instead of them being simply setuid root, I can prepare the shadow-utils update for Fedora 29 soon.
no that is not needed. There is also a fix that will enable the feature when installed as a suid binary without changing to file caps
@giuseppe But I prefer the fscaps solution to the problem as it does not require changing the code and gives the binaries only the capabilities they need.
Hm, so how can we test the proposed solution?
@TomasTomecek can you please open a Fedora bug against shadow-utils?
for sake of linking things together, here's the bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1647769 we already spoke together over IRC
Edit: I'm going to update my reproducer so that it is minimal and doesn't require a full openshift: https://github.com/TomasTomecek/rootless-podman-in-openshift
Okay, I won't be able to come up with a minimal reproducer: openshift is very strict in this space and a minimal reproducer would not correspond to the openshift environment. Anyway, I reran my testing and here are my findings:
[podm@podman-in-okd-20181108t165009822581 /]$ podman --storage-driver=vfs pull fedora:29
Trying to pull docker.io/fedora:29...Getting image source signatures
Copying blob sha256:d0483bd5a55488f5ba6383a5cc8553d5101864f03acd07eabc5df7563c3692cf
83.23 MB / 83.23 MB [=====================================================] 34s
Copying config sha256:8c568f1043264e34f0a8774587266565c7e5e54e9ea6b97ab459086d18ac5175
2.29 KB / 2.29 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
ERRO[0040] Error while applying layer: ApplyLayer exit status 1 stdout: stderr: lchown /run/systemd/netif: invalid argument
Still the same error, just a different directory. @t8m: Tome, once there is a build of shadow-utils with the patches from Giuseppe, please let me know and I'll try this again.
@TomasTomecek Are you sure that the fixed shadow-utils are in place? What is the kernel of the host?
@t8m this is F29, so 4.18. I'm 100% sure that the updated shadow-utils are not there (since you haven't touched that bug, I presumed there is still no build in koji; I can pick up the latest build from koji and try again).
Ah, then please update shadow-utils from updates-testing: shadow-utils-4.6-4.fc29
Sadly, still the same error.
[podm@podman-in-okd-20181112t130610199320 /]$ podman --storage-driver=vfs pull fedora:29
Trying to pull docker.io/fedora:29...Getting image source signatures
Copying blob sha256:d0483bd5a55488f5ba6383a5cc8553d5101864f03acd07eabc5df7563c3692cf
83.23 MB / 83.23 MB [======================================================] 3s
Copying config sha256:8c568f1043264e34f0a8774587266565c7e5e54e9ea6b97ab459086d18ac5175
2.29 KB / 2.29 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
ERRO[0007] Error while applying layer: ApplyLayer exit status 1 stdout: stderr: lchown /run/systemd/netif: invalid argument
Failed
[podm@podman-in-okd-20181112t130610199320 /]$ ls -lha /run/systemd/netif
total 0
drwxr-xr-x. 4 root root 33 Sep 6 03:00 .
drwxr-xr-x. 9 root root 113 Sep 6 03:00 ..
drwxr-xr-x. 2 root root 6 Sep 6 03:00 leases
drwxr-xr-x. 2 root root 6 Sep 6 03:00 links
[podm@podman-in-okd-20181112t130610199320 /]$ ls -lha /run/systemd/
total 0
drwxr-xr-x. 9 root root 113 Sep 6 03:00 .
drwxr-xr-x. 1 root root 48 Nov 8 15:32 ..
drwxr-xr-x. 2 root root 6 Sep 6 03:00 ask-password
drwxr-xr-x. 2 root root 6 Sep 6 03:00 machines
drwxr-xr-x. 4 root root 33 Sep 6 03:00 netif
drwxr-xr-x. 2 root root 6 Sep 6 03:00 seats
drwxr-xr-x. 2 root root 6 Sep 6 03:00 sessions
drwxr-xr-x. 2 root root 6 Sep 6 03:00 shutdown
drwxr-xr-x. 2 root root 6 Sep 6 03:00 users
I am using the newest podman and shadow-utils:
shadow-utils-4.6-4.fc29.x86_64
podman-0.11.1-1.gita4adfe5.fc29.x86_64
[podm@podman-in-okd-20181112t141546420819 /]$ podman info
host:
BuildahVersion: 1.5-dev
Conmon:
package: podman-0.11.1-1.gita4adfe5.fc29.x86_64
path: /usr/libexec/podman/conmon
version: 'conmon version 1.12.0-dev, commit: 8967a1d691ed44896b81ad48c863033f23c65eb0-dirty'
Distribution:
distribution: fedora
version: "29"
MemFree: 201998336
MemTotal: 16695848960
OCIRuntime:
package: runc-1.0.0-57.dev.git9e5aa74.fc29.x86_64
path: /usr/bin/runc
version: |-
runc version 1.0.0-rc5+dev
commit: ff195010cbfd3c62a98a3fd2f7a1e1594afdda1a
spec: 1.0.1-dev
SwapFree: 17172000768
SwapTotal: 17179865088
arch: amd64
cpus: 8
hostname: podman-in-okd-20181112t141546420819
kernel: 4.18.17-300.fc29.x86_64
os: linux
uptime: 40m 54.03s
insecure registries:
registries: []
registries:
registries:
- docker.io
- registry.fedoraproject.org
- quay.io
- registry.access.redhat.com
- registry.centos.org
store:
ContainerStore:
number: 0
GraphDriverName: vfs
GraphOptions: []
GraphRoot: /home/podm/.local/share/containers/storage
GraphStatus: {}
ImageStore:
number: 0
RunRoot: /tmp/user/1000140000
what do your /etc/subuid and /etc/subgid files look like?
Hmmm, it seems the ranges are not right (not sure if this has changed in openshift recently):
[podm@podman-in-okd-20181112t160941084723 /]$ cat /etc/subuid /etc/subgid
podm:231072:65536
podm:231072:65536
[podm@podman-in-okd-20181112t160941084723 /]$ id
uid=1000140000(podm) gid=0(root) groups=0(root),1000140000
Update: never mind, still the same, I updated the limits to:
[podm@podman-in-okd-20181112t162046970895 /]$ cat /etc/subuid
podm:1000000000:999999
[podm@podman-in-okd-20181112t162046970895 /]$ id
uid=1000140000(podm) gid=0(root) groups=0(root),1000140000
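For quick checks like this, a small helper that parses the user:start:count format of these files can save some squinting (the helper name is mine):

```shell
# subid_range USER FILE: print the subordinate id range configured
# for USER in a /etc/subuid- or /etc/subgid-format file.
subid_range() {
  awk -F: -v u="$1" '$1 == u { printf "%s: start=%s count=%s\n", u, $2, $3 }' "$2"
}
# usage: subid_range podm /etc/subuid
```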
My understanding of namespaces + capabilities might be wrong, but can /usr/bin/newgidmap work if cap_setgid,cap_setuid are not available in an unprivileged pod?
sh$ oc rsh pod/jenkins-slave
sh-4.4$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
uid=1000120000(???)
gid=0(root)
groups=1000120000(???)
Then it would mean anybody is able to "add" a new capability there. Or is it limited just to cap_setgid,cap_setuid?
cap_setgid and cap_setuid must be available in the pod, otherwise newuidmap/newgidmap won't work. The difference with the latest version of shadow-utils is that only these caps are needed, whereas before cap_sys_admin was also required for root.
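To verify which packaging variant is installed, checking for either the suid bit or file capabilities covers both cases (a generic helper of mine, not part of shadow-utils):

```shell
# Report whether newuidmap/newgidmap are setuid, carry file caps,
# or neither (in which case rootless podman cannot set up mappings).
check_xidmap() {
  for bin in "$@"; do
    if [ -u "$bin" ]; then
      echo "$bin: setuid root"
    elif getcap "$bin" 2>/dev/null | grep -q cap_set; then
      echo "$bin: file capabilities"
    else
      echo "$bin: neither - rootless mapping will fail"
    fi
  done
}
check_xidmap /usr/bin/newuidmap /usr/bin/newgidmap
```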
I just checked the changes in the shadow-utils spec file :-) Previously there was a suid bit instead of capabilities, therefore I did not notice cap_sys_admin. And I could not see it in the changelog either:
* Tue Nov 06 2018 Tomáš Mráz <[email protected]> - 2:4.6-4
- use cap_setxid file capabilities for newxidmap instead of making them setuid
- limit the SYS_U/GID_MIN value to 1 as the algorithm does not work with 0
and the 0 is always used by root anyway
- manual page improvements
So the other question: are cap_setgid,cap_setuid available in an unprivileged openshift/k8s pod by default?
Or was my testing just not accurate?
I checked the restricted security context constraint and it seems that SETUID,SETGID are dropped by default.
sh# oc describe scc/restricted
Name: restricted
Priority: <none>
Access:
Users: <none>
Groups: system:authenticated
Settings:
Allow Privileged: false
Default Add Capabilities: <none>
Required Drop Capabilities: KILL,MKNOD,SETUID,SETGID
Allowed Capabilities: <none>
Allowed Seccomp Profiles: <none>
Allowed Volume Types: configMap,downwardAPI,emptyDir,persistentVolumeClaim,projected,secret
Allowed Flexvolumes: <all>
Allow Host Network: false
Allow Host Ports: false
Allow Host PID: false
Allow Host IPC: false
Read Only Root Filesystem: false
Run As User Strategy: MustRunAsRange
UID: <none>
UID Range Min: <none>
UID Range Max: <none>
SELinux Context Strategy: MustRunAs
User: <none>
Role: <none>
Type: <none>
Level: <none>
FSGroup Strategy: MustRunAs
Ranges: <none>
Supplemental Groups Strategy: RunAsAny
Ranges: <none>
Or should we check a different scc?
@TomasTomecek the missing caps (SETUID,SETGID) in the bounding set are the main issue.
It is visible in the following two snippets: the 1st is a pod with scc/restricted, the 2nd is scc/restricted + caps(SETUID,SETGID)
sh# oc exec -ti podman-in-okd-20181112t170806325043 -- bash
[podm@podman-in-okd-20181112t170806325043 /]$ capsh --print | grep -o "cap_set.id"
[podm@podman-in-okd-20181112t170806325043 /]$ podman --storage-driver=vfs pull fedora:29
Trying to pull docker.io/fedora:29...Getting image source signatures
Copying blob sha256:d0483bd5a55488f5ba6383a5cc8553d5101864f03acd07eabc5df7563c3692cf
83.23 MB / 83.23 MB [======================================================] 4s
Copying config sha256:8c568f1043264e34f0a8774587266565c7e5e54e9ea6b97ab459086d18ac5175
2.29 KB / 2.29 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
ERRO[0005] Error while applying layer: ApplyLayer exit status 1 stdout: stderr: lchown /run/systemd/netif: invalid argument
Failed
Trying to pull registry.fedoraproject.org/fedora:29...Getting image source signatures
Copying blob sha256:7692efc5f81cadc73ca1afde08b1a5ea126749fd7520537ceea1a9871329efde
96.61 MB / 96.61 MB [======================================================] 4s
Copying config sha256:a5cc8ccd230a587867762e1b38efc941ce1743d2151fe36b00c8841ff9e09b62
1.27 KB / 1.27 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
ERRO[0012] Error while applying layer: ApplyLayer exit status 1 stdout: stderr: lchown /var/spool/mail: invalid argument
Failed
Trying to pull quay.io/fedora:29...Failed
Trying to pull registry.access.redhat.com/fedora:29...Failed
Trying to pull registry.centos.org/fedora:29...Failed
error pulling image "fedora:29": unable to pull fedora:29: 5 errors occurred:
* Error committing the finished image: error adding layer with blob "sha256:d0483bd5a55488f5ba6383a5cc8553d5101864f03acd07eabc5df7563c3692cf": ApplyLayer exit status 1 stdout: stderr: lchown /run/systemd/netif: invalid argument
* Error committing the finished image: error adding layer with blob "sha256:7692efc5f81cadc73ca1afde08b1a5ea126749fd7520537ceea1a9871329efde": ApplyLayer exit status 1 stdout: stderr: lchown /var/spool/mail: invalid argument
* Error determining manifest MIME type for docker://quay.io/fedora:29: Error reading manifest 29 in quay.io/fedora: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>\n"
* Error determining manifest MIME type for docker://registry.access.redhat.com/fedora:29: Error reading manifest 29 in registry.access.redhat.com/fedora: unknown: Not Found
* Error determining manifest MIME type for docker://registry.centos.org/fedora:29: Error reading manifest 29 in registry.centos.org/fedora: manifest unknown: manifest unknown
[podm@podman-in-okd-20181112t170806325043 /]$ exit
command terminated with exit code 125
and I managed to at least pull an image from the registry in the 2nd one
[root@xci31 ~]# oc exec -ti podman-in-okd-20181112t171049571203 -- bash
[podm@podman-in-okd-20181112t171049571203 /]$ capsh --print | grep -o "cap_set.id"
cap_setgid
cap_setuid
cap_setgid
cap_setuid
[podm@podman-in-okd-20181112t171049571203 /]$ podman --storage-driver=vfs pull fedora:29
Trying to pull docker.io/fedora:29...Getting image source signatures
Copying blob sha256:d0483bd5a55488f5ba6383a5cc8553d5101864f03acd07eabc5df7563c3692cf
83.23 MB / 83.23 MB [===================================================] 3m35s
Copying config sha256:8c568f1043264e34f0a8774587266565c7e5e54e9ea6b97ab459086d18ac5175
2.29 KB / 2.29 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
8c568f1043264e34f0a8774587266565c7e5e54e9ea6b97ab459086d18ac5175
But you cannot run a container there. It might be sufficient for the colin use case, IMHO.
[podm@podman-in-okd-20181112t171049571203 /]$ podman --storage-driver=vfs run fedora:29 cat /etc/os-release
container create failed: nsenter: failed to update /proc/self/oom_score_adj: Permission denied
container_linux.go:337: starting container process caused "process_linux.go:302: running exec setns process for init caused \"exit status 10\""
: internal libpod error
[podm@podman-in-okd-20181112t171049571203 /]$ echo $?
127
[podm@podman-in-okd-20181112t171049571203 /]$ podman --storage-driver=vfs ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dae8dfcbc6b4 docker.io/library/fedora:29 cat /etc/os-relea... 10 seconds ago Created dreamy_khorana
So the only limitation seems to be the missing capabilities (SETUID,SETGID) in scc/restricted :-(
So shadow-utils-4.6-4 should be fine and the issue is now elsewhere.
I have attached a minimal reproducer to the original post.
@TomasTomecek I see in the reproducer that you are dropping cap_setuid and cap_setgid. What we have achieved with the latest shadow-utils is the possibility to run without cap_sys_admin, but rootless containers still require cap_setuid and cap_setgid, otherwise there is no way to use multiple UIDs/GIDs
@giuseppe, as @lslebodn figured out, openshift itself is dropping those capabilities. I have opened an origin issue over here: https://github.com/openshift/origin/issues/21514
this is a tricky issue; allowing cap_setuid and cap_setgid by default lowers security considerably.
Potentially an unprivileged user could become any UID with the help of a suid application that is part of the image.
@giuseppe I think they also set no new privileges by default, which would torpedo that
I would figure you would need an SCC that adds these capabilities and allows the user to run podman.
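Such an SCC could look roughly like the following; the name and which fields to carry over from restricted are my guesses rather than a tested policy:

```shell
# Hypothetical SCC: restricted minus the SETUID/SETGID drops.
# Review against your cluster's policy before using anything like it.
cat > /tmp/scc-rootless-podman.yaml <<'EOF'
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: rootless-podman
allowPrivilegedContainer: false
allowedCapabilities:
- SETUID
- SETGID
requiredDropCapabilities:
- KILL
- MKNOD
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
EOF
# then, as cluster admin:
# oc create -f /tmp/scc-rootless-podman.yaml
# oc adm policy add-scc-to-user rootless-podman -z my-serviceaccount
```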
@TomasTomecek @giuseppe What should we do with this issue?
I think we can close it. There is nothing we can do without CAP_SETUID/CAP_SETGID, except have a single user mapped in the user namespace
I'm going to work in this area a bit more in the coming weeks and might open more specific issues.