Che: Restart workspace fails (OpenShift Dedicated, minishift)

Created on 1 Apr 2020 · 25 comments · Source: eclipse/che

Describe the bug

Got an error message while restarting previously created workspaces. This does not reproduce every time; the restart has to be repeated several times.

Che version

  • [ ] latest
  • [X] nightly - 7.11.0-SNAPSHOT
  • [ ] other: please specify

Steps to reproduce

  1. Create workspace
  2. Stop workspace
  3. Try to restart it (probably need to repeat several times)

Got this error log:

Error: Failed to run the workspace: "Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/eclipse-che/secrets. Message: object is being deleted: secrets "workspacetim1cx2edjv0re97-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspacetim1cx2edjv0re97-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspacetim1cx2edjv0re97-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={})."
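
For reference, a minimal sketch of how this failure surfaces on the caller side, assuming the fabric8 kubernetes-client that the Che server uses (namespace and secret name copied from the error above): the create call throws a KubernetesClientException with HTTP code 409 while the old secret still carries a deletion timestamp.

import io.fabric8.kubernetes.api.model.Secret;
import io.fabric8.kubernetes.api.model.SecretBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;

public class SecretCreate409 {
  public static void main(String[] args) {
    try (KubernetesClient client = new DefaultKubernetesClient()) {
      Secret secret = new SecretBuilder()
          .withNewMetadata()
            .withName("workspacetim1cx2edjv0re97-sshprivatekeys")
            .withNamespace("eclipse-che")
          .endMetadata()
          .build();
      try {
        client.secrets().inNamespace("eclipse-che").create(secret);
      } catch (KubernetesClientException e) {
        // 409/AlreadyExists with "object is being deleted" means the previous
        // secret of the same name still has a deletion timestamp and has not
        // been removed yet, so the POST is rejected.
        if (e.getCode() == 409) {
          System.err.println("Old secret is still terminating: " + e.getMessage());
        } else {
          throw e;
        }
      }
    }
  }
}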

Expected behavior

The workspace restarts successfully without errors.

Runtime

  • [ ] kubernetes (include output of kubectl version)
  • [x] Openshift (include output of oc version)
  • [ ] minikube (include output of minikube version and kubectl version)
  • [ ] minishift (include output of minishift version and oc version)
  • [ ] docker-desktop + K8S (include output of docker version and kubectl version)
  • [ ] other: (please specify)
oc version

Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0+b4261e0", GitCommit:"b4261e07ed", GitTreeState:"clean", BuildDate:"2019-07-06T03:16:01Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.2", GitCommit:"4320e48", GitTreeState:"clean", BuildDate:"2020-01-21T19:50:59Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Screenshots

error-log (screenshot attached to the original issue)

Installation method

  • [ ] chectl
  • [ ] che-operator
  • [ ] minishift-addon
  • [ ] I don't know
  • [x] oc apply for che-server, plugin and devfile registry

Environment

  • [ ] my computer

    • [ ] Windows

    • [ ] Linux

    • [ ] macOS

  • [ ] Cloud

    • [ ] Amazon

    • [ ] Azure

    • [ ] GCE

    • [x] other (OSD che-dev cluster)

  • [ ] other: please specify

Eclipse Che Logs


che-2-5kqgc-che.log

Additional context

Could not reproduce before this commit: https://github.com/eclipse/che/commit/db46ad4979f6a82ac2278b410e73659c6463d676

area/wsmaster kind/bug severity/P1

All 25 comments

@vparfonov could you clarify how you deployed eclipse che on OSD?

Reproduced on a multi-user installation with the original, old-fashioned deploy_che.sh script.
Steps to reproduce:

  1. Check out the commit before the deploy_che.sh script was removed (PR):
    git checkout 59f9e41be62586c174d10da0485e3ba66588ef33
  2. cd deploy/openshift
  3. ./deploy_che.sh --multiuser
  4. Update the DeploymentConfig to allow starting more than one workspace:
    - name: CHE_LIMITS_USER_WORKSPACES_RUN_COUNT
      value: '-1'
  5. Create and start two workspaces.
  6. Stop both.
  7. Start one of the workspaces and, while it is still starting, start the other one.

multiuser (screenshot attached to the original comment)

che-3-pdh86-che.log

@ericwill any ideas? looks like an issue with ssh plugin cc: @vinokurig

Looks like it is related to https://github.com/eclipse/che/pull/14950, @vzhukovskii any ideas?

Will take a look asap

FYI, I reproduced it on minishift

_minishift v1.34.2+83ebaab_

As far as I understand, the changes related to storing private keys in secrets should not break the flow of starting two separate workspaces. What about the case with only one workspace, does it start successfully?

Looks like a single workspace starts successfully.

I suppose, then, it shouldn't be related to #14950

More logs.

2020-04-03 09:51:09,540[ceSharedPool-26]  [ERROR] [.i.k.KubernetesInternalRuntime 259]  - Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/skabashn/secrets. Message: object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspaceyy5nesnxw954tsbz-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInfrastructureException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/skabashn/secrets. Message: object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspaceyy5nesnxw954tsbz-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
    at org.eclipse.che.workspace.infrastructure.kubernetes.namespace.KubernetesSecrets.create(KubernetesSecrets.java:51)
    at org.eclipse.che.workspace.infrastructure.openshift.OpenShiftInternalRuntime.createSecrets(OpenShiftInternalRuntime.java:127)
    at org.eclipse.che.workspace.infrastructure.openshift.OpenShiftInternalRuntime.startMachines(OpenShiftInternalRuntime.java:112)
    at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.internalStart(KubernetesInternalRuntime.java:222)
    at org.eclipse.che.api.workspace.server.spi.InternalRuntime.start(InternalRuntime.java:141)
    at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartRuntimeTask.run(WorkspaceRuntimes.java:920)
    at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:38)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/skabashn/secrets. Message: object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspaceyy5nesnxw954tsbz-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:251)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:815)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:333)
    at org.eclipse.che.workspace.infrastructure.kubernetes.namespace.KubernetesSecrets.create(KubernetesSecrets.java:49)
    ... 10 common frames omitted
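
The stack trace above shows the POST failing while the previously deleted secret still exists with a deletion timestamp. A minimal sketch of one way to sidestep that window, assuming the fabric8 kubernetes-client and the namespace/secret name from the log above (this only illustrates the race, it is not the fix that was eventually adopted, see the PR mentioned later in the thread): poll until the old secret is really gone before re-creating it.

import io.fabric8.kubernetes.api.model.Secret;
import io.fabric8.kubernetes.api.model.SecretBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class RecreateSecretSafely {
  public static void main(String[] args) throws InterruptedException {
    String ns = "skabashn";                                   // namespace from the log above
    String name = "workspaceyy5nesnxw954tsbz-sshprivatekeys"; // secret name from the log above
    try (KubernetesClient client = new DefaultKubernetesClient()) {
      client.secrets().inNamespace(ns).withName(name).delete();
      // The DELETE returns 200 right away, but the object may linger with a
      // deletion timestamp (e.g. while the "orphan" finalizer set by
      // orphanDependents=true is processed); creating a secret with the same
      // name during that window is rejected with 409 "object is being deleted".
      long deadline = System.currentTimeMillis() + 30_000;
      while (client.secrets().inNamespace(ns).withName(name).get() != null) {
        if (System.currentTimeMillis() > deadline) {
          throw new IllegalStateException("secret " + name + " was not removed in time");
        }
        Thread.sleep(200);
      }
      Secret fresh = new SecretBuilder()
          .withNewMetadata().withName(name).withNamespace(ns).endMetadata()
          .build();
      client.secrets().inNamespace(ns).create(fresh); // safe only once the old one is gone
    }
  }
}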

Might be related to
https://github.com/fabric8io/kubernetes-client/issues/1775
https://github.com/fabric8io/kubernetes-client/issues/1840
See also
https://github.com/strimzi/strimzi-kafka-operator/issues/2223

It is not related to the SSH plugin either, because the plugin had not yet been started when the error occurred.

@skabashnyuk so this is likely related to the k8s client lib update right?

> @skabashnyuk so this is likely related to the k8s client lib update right?

I don't know yet.

The deploy script is deprecated and has been removed.
Any issues caused by that script are no longer relevant.

@tolusha the custom script was used for the OSD deployment (there is currently no other way to deploy on OSD)
@vparfonov have you reproduced the issue on minishift via chectl?

For the record, it looks like I cannot reproduce it on Hosted Che against the 7.11 snapshot https://github.com/redhat-developer/rh-che/pull/1828

@tolusha @ibuziuk yes, I have reproduced it with chectl and minishift

I faced the same issue today on minikube + chectl + operator, but the error message was about the self-signed-cert secret.

I can't reproduce this issue anymore when I use an image from https://github.com/eclipse/che/pull/16540

Me too, it works well with #16540.

I can't reproduce on minishift with chectl and operator...

k8s client 4.9.0

2020-04-06 08:21:14,766[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> DELETE https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json; charset=utf-8
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 66
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Authorization: Bearer XXXX
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - {"apiVersion":"v1","kind":"DeleteOptions","orphanDependents":true}
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> END DELETE (66-byte body)
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - <-- 200 OK https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys (7ms)
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Audit-Id: 978dfc40-0d59-47c4-81ed-97694e28c2da
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Cache-Control: no-cache, private
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json
2020-04-06 08:21:14,776[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Date: Mon, 06 Apr 2020 08:21:14 GMT
2020-04-06 08:21:14,776[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Transfer-Encoding: chunked
2020-04-06 08:21:14,776[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 

kubernetes client 4.1.0

2020-04-06 09:07:15,240[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> DELETE https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys
2020-04-06 09:07:15,240[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json; charset=utf-8
2020-04-06 09:07:15,240[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 67
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Authorization: XXXX
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - {"apiVersion":"v1","kind":"DeleteOptions","orphanDependents":false}
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> END DELETE (67-byte body)
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - <-- 200 OK https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys (9ms)
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Audit-Id: fd8fcae6-e6e2-4934-8643-001e73fa35ee
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Cache-Control: no-cache, private
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Date: Mon, 06 Apr 2020 09:07:15 GMT
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 193

See more https://github.com/fabric8io/kubernetes-client/issues/1840

PR https://github.com/eclipse/che/pull/16540

2020-04-06 09:24:52,420[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> DELETE https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys
2020-04-06 09:24:52,420[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json; charset=utf-8
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 75
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Authorization: Bearer xx
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - {"apiVersion":"v1","kind":"DeleteOptions","propagationPolicy":"Foreground"}
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> END DELETE (75-byte body)
2020-04-06 09:24:52,429[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - <-- 200 OK https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys (8ms)
2020-04-06 09:24:52,429[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Audit-Id: 1ec0c9fe-40d3-4e94-880a-8ffd3904e6d2
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Cache-Control: no-cache, private
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Date: Mon, 06 Apr 2020 09:24:52 GMT
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Transfer-Encoding: chunked
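
So the relevant difference is in the DeleteOptions payload: client 4.1.0 sent orphanDependents=false, 4.9.0 started sending orphanDependents=true (which can leave the object behind with a deletion timestamp while the orphan finalizer is processed, see the fabric8 issue linked above), and the PR switches to an explicit propagation policy. A minimal sketch of issuing such a delete with the fabric8 DSL, assuming a client version that already exposes withPropagationPolicy (it takes a String in the 4.x line and a DeletionPropagation enum from 5.x on, so the exact signature depends on the client in use):

import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class DeleteSecretForeground {
  public static void main(String[] args) {
    try (KubernetesClient client = new DefaultKubernetesClient()) {
      // Sends {"kind":"DeleteOptions","propagationPolicy":"Foreground"} like the
      // request logged above, instead of the deprecated orphanDependents flag.
      client.secrets()
          .inNamespace("skabashn")                              // namespace from the logs above
          .withName("workspace1pej05ozhspqhloc-sshprivatekeys") // secret name from the logs above
          .withPropagationPolicy("Foreground")                  // String in 4.x; enum from 5.x on
          .delete();
    }
  }
}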

@skabashnyuk am I correct that this issue is not severe, i.e. even if the initial restart fails, one of the subsequent restarts should pass once the k8s resources are finally deleted? So far we have not been able to reproduce it on staging against 7.11.0 upstream and do not treat it as a blocker for the production update

> am I correct that this issue is not severe, i.e. even if the initial restart fails, one of the subsequent restarts should pass once the k8s resources are finally deleted

I would say it could. Without any guarantee of course.

Thanks, so our plan is to promote 7.11.0 and request 7.11.1 if this issue becomes a problem. On staging, we have not been able to reproduce the issue so far https://github.com/eclipse/che/issues/16475#issuecomment-610924205
