I installed a basic OCP 4 cluster on AWS. The default aws-ebs storage is used. I tried to install Che from the OperatorHub marketplace and the install failed because Postgres entered a CrashLoopBackOff state.
The Postgres container's logs show the following error:
johns-mbp-3:.odo johncollier$ oc logs postgres-cc6b567f-fc9hj
mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied
aws-ebs is used as the default storage (as it should be for new installs on AWS).
johns-mbp-3:.odo johncollier$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth
Server https://<url>:6443
kubernetes v1.14.0+573d946
As I understand it, this duplicates https://github.com/eclipse/che/issues/14331
Could you have a look pls @amisevsk
It does look like an instance of #14331 (at least, in terms of error), but I have no idea what's causing the issue. Maybe @davidfestal can help as he works on the Che operator.
I don't have many more ideas. If we need to add more options to some k8s resources (deployments or PVs, possibly along the lines proposed in issue #14331), we could implement the changes in the operator Go code, build and push a separate operator docker image, and override it in the installed CSV + operator deployment to test a possible solution.
@davidfestal I can deploy the operator manually on OCP4 if you're able to get me a debug image to use.
@johnmcollier It wouldn't be possible tomorrow for me, but possibly on Monday.
@davidfestal Yeah, no worries and no rush!
Hi Guys,
Thanks for working this issue. There are other users with the same problem, so please continue to make notes here as progress is made.
(Also, can we please amend the 'no rush' to be 'there is a small rush, please work as time allows'? :) )
Thanks,
Rick
bumping severity
@davidfestal I can deploy the operator manually on OCP4 if you're able to get me a debug image to use.
@johnmcollier It would be great. I would start working on it on next Monday. And we could sync as soon as you're available.
@johnmcollier it seems you installed Che through the operator in the default namespace. That might be the underlying reason of the error.
Could you try installing the Che operator in a dedicated namespace you create and check if the problem is still there ?
@johnmcollier Could you also provide the OpenShift events for the postgres deployment, if you have a chance?
@davidfestal @amisevsk Sure thing, I'll install in a non-default namespace and provide the postgres events.
Might be a little bit, I need to reinstall OCP4 first.
I reinstalled the Che operator in the che namespace and it started working!
Curious: Why does the default namespace fail but others are fine?
I looked into the definition of the default namespace vs. user namespaces and didn't see anything special, but I'm no expert on container file-system permissions. That said, I'm not sure the default namespace is expected to be used by end users.
@gorkem @l0rd Can you confirm that the default namespace is not expected to be used to install user components such as a Che server?
In this case I assume that the action items (in order of priority) could be:
[...] chectl-based installation creates a dedicated user namespace).
[...] CheCluster custom resource is in the default namespace. We could now use the new Detailed Message and Help Link CR status fields (visible in the OperatorHub) to provide feedback to the user and possibly link to the new documentation.
[...] default namespace (However, we would need to see how it would behave in OperatorHub: this step might not be worth trying due to its low added value for the end user, who might still choose the default namespace initially).
[...] AllNamespaces install mode on the Che Operator, at least of a dedicated channel. But this has to be explored first to really measure the impacts.
@davidfestal yes I think we can say that the default namespace is usually not used in prod. But if someone wants "just" to try Che he will probably use the namespace default. Hence this may be a pretty common use case.
Other comments:
[...] default namespace so why shouldn't we?
Talked with @davidfestal and we should investigate this further to better understand the root cause: how is the Postgres operator behaving? Does the Che server pod have the same problem?
I'm seeing the exact same problem in a fresh install inside an Ubuntu 19 KVM guest with minikube and chectl.

Comparing Che installed in the che namespace to Che installed in the default namespace reveals that the securityContext gets set to {} in default, and is properly set in the che namespace:
securityContext:
  seLinuxOptions:
    level: 's0:c22,c9'
  fsGroup: 1000480000
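For anyone who wants to reproduce this comparison, here is a minimal sketch using `oc get pods` with a JSONPath template. The helper name `show_security_contexts` is made up for illustration, and the namespaces in the usage note are simply the two discussed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each pod in a namespace together with its
# pod-level securityContext, one pod per line (name, tab, securityContext).
# Usage: show_security_contexts <namespace>
show_security_contexts() {
  local ns="$1"
  oc get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext}{"\n"}{end}'
}
```

Running `show_security_contexts che` and then `show_security_contexts default` should make the difference visible side by side: an empty securityContext for the Postgres pod in default, and a populated one in che.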
As I investigate this more, it seems that default does not respect any of the security context constraints that are present in OpenShift. If I make a new user, give them create access to pods and deployments, and they run a pod, it runs as root or as the default UID set in that image's Dockerfile. When I run the same pod in another namespace, it runs in a security context. default seems to have annotations regarding security contexts, but does not respect them:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/sa.scc.mcs: s0:c6,c5
    openshift.io/sa.scc.supplemental-groups: 1000040000/10000
    openshift.io/sa.scc.uid-range: 1000040000/10000
  creationTimestamp: "2019-10-28T20:25:55Z"
  name: default
  resourceVersion: "7335"
  selfLink: /api/v1/namespaces/default
  uid: 2515d6e5-f9c1-11e9-9124-028754979780
spec:
  finalizers:
  - kubernetes
status:
  phase: Active

apiVersion: project.openshift.io/v1
kind: Project
metadata:
  annotations:
    openshift.io/sa.scc.mcs: s0:c6,c5
    openshift.io/sa.scc.supplemental-groups: 1000040000/10000
    openshift.io/sa.scc.uid-range: 1000040000/10000
  creationTimestamp: "2019-10-28T20:25:55Z"
  name: default
  resourceVersion: "7335"
  selfLink: /apis/project.openshift.io/v1/projects/default
  uid: 2515d6e5-f9c1-11e9-9124-028754979780
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
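As a side note on reading these annotations: the uid-range and supplemental-groups values use a `<start>/<size>` form. My understanding (an assumption, not something verified in this thread) is that when SCC admission does apply, the UID and fsGroup are taken from the start of these ranges, which would be consistent with the fsGroup: 1000480000 seen in the che namespace. The small helper below, with the made-up name `scc_range_bounds`, only handles the `/` form shown here and prints the first and last ID of a range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an annotation value such as "1000040000/10000"
# into "<first id> <last id>". Only the "<start>/<size>" form is handled.
scc_range_bounds() {
  local value="$1"
  local start="${value%%/*}"   # text before the slash
  local size="${value##*/}"    # text after the slash
  echo "${start} $((start + size - 1))"
}

# Example with the value from the default namespace above:
#   scc_range_bounds "1000040000/10000"
```

A pod whose fsGroup falls outside the range for its namespace, or that has no fsGroup at all, was probably not assigned one from these annotations.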
I'm starting to think this is just a documentation issue that we will need to point out as @l0rd and @davidfestal have suggested. I reached out on the aos-devel slack channel but haven't had a reply yet.
Last update I think. There are two problems contributing to this issue:
[...] default namespace
[...] /var/lib/pgsql: Since by default the EBS volume is writable only by the root user, and there is no security context fsGroup, the command fails.
To mitigate this we could change the operator to check if we are in the default namespace and set the security context, or we can document that che should not be run in the default namespace because it won't set the appropriate security context. We could also advise people that they could use a statically-provisioned PV with appropriate permissions, and back the postgres deployment with that, but I don't know if the operator works that way.
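As a rough, untested sketch of what "set the security context" could look like when done by hand instead of in the operator: the snippet below builds a merge patch that adds a pod-level fsGroup to a Deployment and applies it with `oc patch`. The function names are made up, and the deployment name, namespace, and group ID in the usage note are assumptions; the operator may also revert manual changes to resources it manages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON merge patch that sets a pod-level fsGroup.
fsgroup_patch() {
  local gid="$1"
  printf '{"spec":{"template":{"spec":{"securityContext":{"fsGroup":%s}}}}}' "$gid"
}

# Hypothetical helper: apply the patch to a Deployment.
# Usage: apply_fsgroup_patch <namespace> <deployment> <gid>
apply_fsgroup_patch() {
  local ns="$1" deployment="$2" gid="$3"
  oc patch deployment "$deployment" -n "$ns" --type merge -p "$(fsgroup_patch "$gid")"
}
```

For example, `apply_fsgroup_patch default postgres 1000040000` would use the start of the supplemental-groups range from the default namespace annotations. Whether this is enough for the Postgres image to create its data directory has not been checked here.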
There is a known issue for this in the documentation. Should we open another GH issue to discuss possible code fixes for this?
@tomgeorge thanks, I believe we can close this issue since it is a documented case and continue the discussion in https://github.com/eclipse/che/issues/15092
I'm unable to close this, can someone please close?