Che: Restrict the installation of Eclipse Che only to the 'eclipse-che' namespace

Created on 18 Jun 2020  ·  33 Comments  ·  Source: eclipse/che

Is your task related to a problem? Please describe.

The only namespace allowed for an Eclipse Che installation should be 'eclipse-che'. Installing Eclipse Che in any other namespace should be prohibited.

Describe the solution you'd like

When Eclipse Che is installed via OperatorHub, it should only be possible to install it in the 'eclipse-che' namespace.

List of subtasks:

Describe alternatives you've considered

Continue allowing namespace selection during the installation

Additional context

Pros:

  • there is no particular value in namespace selection from the admin's perspective
  • the installation process would be more predictable
  • avoiding multiple instances of Che in the same cluster (to avoid conflicts and waste of resources, only one Eclipse Che per cluster will be allowed)
  • avoiding the common deployment failure when Che gets installed in the default namespace
  • retrieving operand metrics when installed on OpenShift (note that this will not be possible for Eclipse Che due to the openshift-* namespace restriction: https://github.com/openshift/enhancements/blob/master/enhancements/olm/olm-managed-operator-metrics.md#fulfilling-namespace-and-rbac-requirements)

Cons:

  • only one Eclipse Che instance per cluster will be allowed; the following flow will not be possible: https://github.com/eclipse/che/issues/17187#issuecomment-657588515
  • we need to tackle upgrades:

    • if Eclipse Che was initially installed in a namespace other than 'eclipse-che', after the upgrade it should end up in 'eclipse-che' (the previous namespace should be deleted)

    • if there are multiple Eclipse Che instances installed on the cluster, upgrades should fail with a clear error message.

Labels: area/install, kind/epic, kind/task, severity/P1, status/analyzing, team/hosted-che

All 33 comments

@sleshchenko could you please clarify how you do the trick of installing the web terminal operator into a specific namespace?

I trust there will be a way to override (set a specific namespace) and to disable (not restrict) this, so that we can:

  • set a different default for CRW vs. Che
  • disable telemetry for our own tests (we're not the customer, we're just testing internally)
  • allow installing multiple instances (again for testing since we don't have an unlimited # of clusters into which we can install ONLY one CRW instance)

WDYT?

@ibuziuk the web terminal operator is using the AllNamespaces install mode. This makes it a global operator that will be installed by OperatorHub in the openshift-operators namespace by default.

cf this doc: https://docs.openshift.com/container-platform/4.4/operators/olm-adding-operators-to-cluster.html#olm-installing-from-operatorhub-using-web-console_olm-adding-operators-to-a-cluster

Mainly the following quote:

All namespaces on the cluster (default) installs the Operator in the default openshift-operators namespace to watch and be made available to all namespaces in the cluster. This option is not always available.
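For reference, the set of supported install modes is declared in the operator's ClusterServiceVersion. A global operator like the web terminal one would carry a fragment along these lines (a sketch, not the actual web terminal CSV):

```yaml
# ClusterServiceVersion fragment (sketch): only AllNamespaces is
# supported, so OperatorHub installs the operator into the
# openshift-operators namespace and it watches the whole cluster.
spec:
  installModes:
    - type: OwnNamespace
      supported: false
    - type: SingleNamespace
      supported: false
    - type: MultiNamespace
      supported: false
    - type: AllNamespaces
      supported: true
```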

@ibuziuk something I was discussing this morning with @davidfestal: we may use AllNamespaces for Che as well and then deploy wsmaster in the eclipse-che namespace.

@nickboldt has raised some interesting points. I don't think we should make the namespace configurable though:

  • set a different default for CRW vs. Che

This can be a build option but not a runtime option (i.e. users/customers should not change that).

  • disable telemetry for our own tests (we're not the customer, we're just testing internally)

I think the recommendation here is to receive telemetry even for our own tests. They can be useful: we need to verify that sending metrics works fine, and they are easy to filter out. cc @spaparaju

  • allow installing multiple instances (again for testing since we don't have an unlimited # of clusters into which we can install ONLY one CRW instance)

I would discourage this: why do we want to test scenarios that nobody will ever run? To find, analyse and fix bugs that nobody will ever find? We should not support/test multiple instances of Che on the same Kubernetes cluster. cc @rhopp

I would discourage this: why do we want to test scenarios that nobody will ever run? To find, analyse and fix bugs that nobody will ever find? We should not support/test multiple instances of Che on the same Kubernetes cluster. cc @rhopp

I would love to avoid running multiple instances on the same cluster, but as Nick said, we don't have enough resources for doing so. If this becomes a hard restriction (right now it's a soft restriction, in the sense that it is possible to install multiple instances on a single cluster, but it's discouraged or officially not supported), it will be quite a hurdle for QE to overcome.

I would love to avoid running multiple instances on the same cluster, but as Nick said, we don't have enough resources for doing so. If this becomes a hard restriction (right now it's a soft restriction, in the sense that it is possible to install multiple instances on a single cluster, but it's discouraged or officially not supported), it will be quite a hurdle for QE to overcome.

If we want to restrict users to one namespace and one instance of Che per cluster, we should enforce this restriction when testing as well. @rhopp is this an upstream or downstream tests constraint? Is there an issue that describes the problem (i.e. in what circumstances we need to run multiple instances of Che and what options have been considered)?

Sorry, the Pros aren't convincing enough.

@tolusha could you describe a clear use-case (with configuration) for having 2+ manageable instances of Eclipse Che on the same cluster (instances for testing are not taken into account)?

@ibuziuk
I don't have any use-cases except for testing purposes.
From this point of view it makes sense to me.

@tolusha could you describe a clear use-case (with configuration) for having 2+ manageable instances of Eclipse Che on the same cluster (instances for testing are not taken into account)?

Che-server development.

@skabashnyuk Che-server development, e.g. the che-dev OSD v4 cluster usage case?

@skabashnyuk Che-server development, e.g. the che-dev OSD v4 cluster usage case?

I'm sorry. I didn't understand your question.

@skabashnyuk could you clarify your point about Che-server development in regards to the fixed namespace enforcement?

@ibuziuk you're probably right. This topic is about Che + the operator; che-server development does not require it.

@ibuziuk you're probably right. This topic is about Che + the operator; che-server development does not require it.

Right: we want to prevent users from selecting an arbitrary namespace when deploying Che using the operator or chectl. But the che-server itself should be able to run in any namespace. It should NOT take for granted that it will run in the eclipse-che namespace.
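One well-known way for the server to avoid hard-coding its namespace is the Kubernetes downward API. A minimal sketch, assuming a hypothetical CHE_NAMESPACE variable (the pod and manifest below are illustrative, not the actual che-server deployment):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: che-server-sketch          # illustrative name
spec:
  containers:
    - name: che-server
      image: quay.io/eclipse/che-server:latest
      env:
        # Inject the namespace the pod actually runs in, instead of
        # assuming it is 'eclipse-che'.
        - name: CHE_NAMESPACE      # hypothetical variable name
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
```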

There are at least 3 reasons why we want deployments to happen in one fixed namespace:

  • To avoid multiple instances of Che in the same cluster (to avoid conflicts and waste of resources)
  • To avoid the common deployment failure when it gets installed in the default namespace
  • To retrieve operand metrics when installed on OpenShift

@ibuziuk this is a change that needs to be communicated on different channels (che-dev, Red Hat internal mailing lists, etc.) before it becomes effective.

@ibuziuk this is a change that needs to be communicated on different channels (che-dev, Red Hat internal mailing lists, etc.) before it becomes effective.

@l0rd got it. As the initial step we are going to define 'eclipse-che' as the suggested namespace on the operator / OLM end:

[screenshot: OperatorHub install dialog pre-filled with the 'eclipse-che' namespace]

This should address the second bullet (avoiding the common deployment failure when Che gets installed in the default namespace) and make the overall UX cleaner and more predictable.
PR: https://github.com/eclipse/che-operator/pull/328
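For context, OLM provides this hint via the suggested-namespace CSV annotation; a trimmed sketch of what such a change looks like (the CSV name/version below is illustrative):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: eclipse-che.v7.18.0        # illustrative version
  annotations:
    # The OperatorHub install dialog pre-fills this namespace.
    operatorframework.io/suggested-namespace: eclipse-che
```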

@ibuziuk nice!

More reasons to reconsider this as a new standard / use cases for more than one CRW install on the same cluster:

  • prototyping (trying different settings/configs in parallel, eg., with and without oauth, or with/out external SSO/postgres)
  • cautious migration (eg., 2.1 and 2.2 in parallel to verify nothing explodes or regresses in the new release)
  • new customer experiments
  • QE testing (we only have a small set of available OCP instances – one per version)
  • OCP and CRW load testing (how many CRW instances can we run on the same cluster before it consumes all the resources and crashes?) – this is handy for validating new architectures too, such as Z and Power

https://issues.redhat.com/browse/CRW-467?focusedCommentId=14209983&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14209983

Also,

is this an upstream or downstream tests constraint?

It's both: we use the 3 QE OCP instances for testing Che and CRW alike.

Moving to milestone 7.19 so we have time to announce the deprecation of the old behaviour and adapt to this breaking change.

@nickboldt if there are no blockers/showstoppers during the implementation we are going to enable this for 7.18.0

Another use case suggested by Dom on the Community Call:

  • one cluster contains BOTH a stable deployment (eg., 7.14.x), and
  • a test deployment (7.16.0-SNAPSHOT)

thanks @nickboldt 👍, can we please list the cons of this as well, if possible?

@SDAdham added to the description. Could you clarify the use-case with the stable and snapshot (nightly?) deployments a bit more?
How is the installation managed and updated? How is the test deployment used? What config is used for this setup?

@nickboldt @SDAdham @ibuziuk we should discourage scenarios with 2 versions of CRW (stable and test) installed via the operator on the same cluster because both will share the same CheCluster CRD and that may have some disruptive side effects on the stable instance.

We should continue supporting multiple instances of Che on the same cluster using helm though.

@l0rd just to clarify: by helm support of multiple instances, do you mean the chectl + helm installer?

@SDAdham added to the description. Could you clarify the use-case with the stable and snapshot (nightly?) deployments a bit more?
How is the installation managed and updated? How is the test deployment used? What config is used for this setup?

I don't see any reason to limit the installation to one instance per cluster. I'll speak for K8S deployments, as that's what I'm running Che on, but I'm sure the concept should remain the same for OpenShift, etc.

Use case: Running prod and dev/test environments of Che on the same cluster.

Regarding how-tos, I'm not familiar enough with the architecture of Che to know the best approach, but I can imagine:

  • Che is deployed per namespace.
  • The master node of Che, which lives in e.g. the che namespace, should be responsible for following up on and taking care of its own workspaces by managing their namespaces.
  • A managed deployment tool like chectl should keep a record of the master node (Che instance), as it already does afaik, and the master node should take care of the workspaces created by users.

If, on K8S, a namespace can't make changes to another namespace (i.e. the master node running in the che namespace cannot make changes to workspaces running in xyz-che namespaces), then chectl should request the namespaces to upgrade via the master node.
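For what it's worth, cross-namespace access on K8S is governed by RBAC rather than a hard rule: a RoleBinding such as the hypothetical one below (all names illustrative) would let a Che service account manage objects in a workspace namespace.

```yaml
# Hypothetical RoleBinding: grants the 'che' service account from the
# 'che' namespace the built-in 'admin' role inside a workspace
# namespace, so the master node could manage workspaces there.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: che-workspace-admin        # illustrative name
  namespace: xyz-che               # the workspace namespace
subjects:
  - kind: ServiceAccount
    name: che                      # illustrative service account
    namespace: che                 # the master node's namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin                      # built-in aggregated admin role
```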

Any custom installation, other than a managed installation like chectl, should not be supported and should not be taken into consideration for the auto-update procedures.

@nickboldt @SDAdham @ibuziuk we should discourage scenarios with 2 versions of CRW (stable and test) installed via the operator on the same cluster because both will share the same CheCluster CRD and that may have some disruptive side effects on the stable instance.

What "disruptive side effects" can happen as a result of having multiple instances?

@SDAdham fields of the CheCluster CRD can be added/removed/updated during an update, and that may make one of your two CheCluster CRs incompatible with the new CRD. In other words, you can isolate your Che instances in two different namespaces, but the operator's Custom Resource Definition is defined at the cluster level: if you update it, both Che instances will be affected.
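To make the cluster-scoping concrete: the CRD object itself has no namespace, only the CheCluster CRs do. A heavily trimmed sketch (the real CRD carries a full schema; the permissive placeholder below just keeps the example valid):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # No namespace here: CRDs are cluster-scoped, so every Che
  # instance on the cluster shares this single definition.
  name: checlusters.org.eclipse.che
spec:
  group: org.eclipse.che
  names:
    kind: CheCluster
    plural: checlusters
    singular: checluster
  scope: Namespaced                # the CRs are namespaced, the CRD is not
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          # Placeholder: the real CRD defines the full CheCluster schema.
          x-kubernetes-preserve-unknown-fields: true
```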

@ibuziuk yes, helm charts installed via chectl: no CRD, no cluster privileges required, configurable namespace.

+1 for @nickboldt's ideas about the need for more than one instance per cluster.
Users with a running CRW instance will often want to 'look at' the upgrade version before adopting it (testing the new version, a User Acceptance period, etc.). Users would not be happy having to blindly trust us with the new version; they are going to want a way to install a second CRW/Che version beside the first.
It should be noted that OCP clusters are expensive; some users have only one at their disposal. (Also, some users are restricted to just one cluster because of authentication mechanisms, etc.)

@l0rd @ibuziuk Is there any chance there will be some "backdoor" option to select different namespace as per Nick's and Rick's suggestions? (It would greatly help QE as well)

As discussed a few weeks back with @RickJWagner, a good compromise for users that are cautious about updates is to implement operand rollback in case of an unsuccessful upgrade. Hence this issue is blocked by #18043.

I don't see any explicit mention of rollback in https://github.com/eclipse/che/issues/18043

Will that be implemented as a new chectl command too? Seems like it might be useful & logical to have UI features implemented as CLI features too. cc: @tolusha

