Che: Most public images used in devfile registry do not work as the base for Che 7 workspaces on OpenShift

Created on 31 May 2019  路  15Comments  路  Source: eclipse/che

Description

OpenShift by default starts containers using a random UID for security purposes [1]. Since the UID to be used is not known at container creation time, there is no entry for the running user in /etc/passwd.

This can cause a multitude of problems:

  • The terminal is buggy: tab completion does not work, pressing up arrow does not work

    • Default terminal is always /bin/sh; starting bash results in the prompt

      I have no name!@workspaceid $

    • User does not have a home directory, and terminal does not open to /projects

  • Any program that depends on user id or home directory can have issues

    • Maven requires specifying a path to store the .m2 repo (see https://github.com/eclipse/che/issues/13451 and https://github.com/eclipse/che/pull/13453)

    • Gradle has a similar requirement

    • General errors can be hard to predict, e.g. there are some python modules that will fail similarly, and the number of tools that could fail is not something we can manage from a support perspective.

  • Some configuration that relies on having a home directory is impossible, and behaviour in these cases is generally unpredictable.

The suggestions in [1] are

  • Any directory that needs to be accessed must have read/write permissions for the root group. This is not a problem for directories mounted as volumes
  • The /etc/passwd file should be writable by the root group. Then, an entrypoint script can execute something like
    bash if ! whoami &> /dev/null; then if [ -w /etc/passwd ]; then echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd fi fi
    to set the current user correctly.

The second point above is the issue -- images not created to run on OpenShift do not have a writeable /etc/passwd; further, since we normally overwrite container commands with sleep infinity or tail -f /dev/null, even images with an entrypoint that does what we need won't execute the script by default. Note that this is what's done when starting the default "Che 7" stack, which explains why it works.

At this time, I don't see many good potential solutions; when running on OpenShift, the Che server could attempt to rewrite the recipe's container commands with the script above followed by whichever non-terminating command we like, but this would still require using compatible images in the devfile registry. It could also result in confusing errors if we're automatically rewriting entrypoints, and it's difficult to specify a script like above in the yaml list format used for container commands.

I'm currently working with trying to implement the above, but if anyone has better ideas, it'd be great to hear them.

[1]- https://docs.okd.io/3.11/creating_images/guidelines.html#openshift-specific-guidelines

Reproduction Steps

Start a Che 7 workspace other than the default dev image (e.g. java-maven), open a terminal, and type whoami

OS and version:
Che 7 on any OpenShift

Diagnostics:
In most images, we have e.g.:

$ id
uid=1075080000 gid=0(root) groups=0(root),1075080000
kinbug severitblocker statucode-review statuin-progress teahosted-che

All 15 comments

cc: @l0rd I'd like your thoughts since you're familiar with these issues.

@amisevsk great analysis! And that's super important for the UX hence I am happy that you have dug the problem. Here are my comments:

We should use openshift-compatible images only (group root can write /etc/passwd)
In the last month or so we have started trying to move to openshift compatible images only, even upstream (c.f. my comment here). This has ended looking at stacks in https://github.com/sclorg but it doesn't look that these will satisfy all our use cases. The other promising image catalog is the UBI images one. Anyway the images we use upstream should all be OpenShift compatible: that's the primary requirement. Other requirements are that they should be community supported and actively maintained. There may be cases where we don't find an image that satisfy all those requirements. In this case we can still open a PR to make the community supported image compatible with OpenShift or, if we really have no alternative, build it an maintain it ourselves in eclipse/che-dockerfiles. But ideally I would like eclipse/che-dockerfiles to become empty :smiley:

Overriding the entrypoint to set patch /etc/passwd
I think it's great if you we can add that in the wsmaster logic. But that should be something optional. In other words we should be able to specify in the devfile if we are going to ovewrite /etc/passwd or not. Like having a new arbitraryUserSupport attributes for dockerimage and kubernetes components:

  - type: dockerimage
    image: centos:centos7
    command: ["fail"]
    args: ["-f", "/dev/null"]
    arbitraryUserSupport: true
    memoryLimit: 50Mi

Maybe I'm naive here, but to me, this sounds like a problem in OpenShift. The problem is not that they run the container as a random user, but that this user does not conform to the expectations unix programs make of a user.

In an ideal world, OpenShift would fix this. Do we have an issue open with them and is there a response?

@tsmaeder the thing is that these problems are fixed on S2I images like openjdk-8-rhel8 [1]. But those are located in the private registry.redhat.io

image

[1] https://access.redhat.com/containers/?tab=overview&get-method=unauthenticated#/registry.access.redhat.com/openjdk/openjdk-8-rhel8

Initial PR with patched Dockerfile has been merged to devfile registry https://github.com/eclipse/che-devfile-registry/pull/24

PR patching all currently used images (except for one based off alpine): https://github.com/eclipse/che-devfile-registry/pull/38

@amisevsk is there some issue with patching 'alpine' based community images or it is just not yet done?

@ibuziuk There is, but sadly the discussion is split over 4-5 PRs/issues.

The alpine image does not ship with bash installed, so the current patch breaks the image (sets default login shell for user to /bin/bash. The three options for fixing this are

  • Treat the angular devfile as a special case in patching (use /bin/sh instead of bash)
  • Switch the angular devfile to use the debian-based variant instead of alpine. Works but the image ends up significantly larger

    • This may not actually be an issue since most debian layers should be cached from other images

  • Create a new base image that installs bash on top of the default alpine-based image and use that as our base image in patching.

I think the second option (move to debian) is cleanest, but I haven't tested or checked if there are further issues. It shouldn't cause problems since we already use debian-based devfiles elsewhere.

@l0rd are you, in general, ok to switch the alpine images to debian in the devfile registry for Che 7 GA and do the arbitary user patch the same way as it is currently done for all the other images?

@ibuziuk, speaking of split discussion: https://github.com/eclipse/che-devfile-registry/pull/38#issuecomment-513729389

I'll open a PR to use a more convenient base image.

Yes we should not use images that don't have bash

@amisevsk @l0rd I believe all the images required for GA have been patched already. Can we close this issue now ?

As a blocker for 7.0.0 I believe this issue can be closed, but in opening it I initially intended it to be solved by a more general solution (public images still do not work, and users will still run into problems trying to create their own devfiles -- we just have a workaround that fixes our use-case).

@amisevsk I reworded this issue a bit to make it Che 7 specific. WDYT about closing this one and creating a separate one for the general use-case of public images as a language runtime on OpenShift?
I believe we need to cross-link this new general issue with https://github.com/openshift/origin/issues/23369

a separate issue for general use of language runtime images without patching on OpenShift v4 has been created - https://github.com/eclipse/che/issues/14042

Was this page helpful?
0 / 5 - 0 ratings