Che: If async storage pod gets evicted, workspaces cannot be stopped or restarted.

Created on 11 Sep 2020 · 6 Comments · Source: eclipse/che

Edit: Looking at what limited information I can get from Hosted Che, this may be semi-unrelated to async-storage. The container in the stuck terminating workspace pod that won't shut down is the che-docs antora container, which was running a gulp task. The rsync pod seems to have terminated successfully.

Describe the bug

While I was using a workspace with asynchronous storage enabled on Hosted Che, the async storage pod was evicted from the cluster. After this happened, the pod was not re-created and stopping the workspace hangs. Eventually the stop appears to time out, and I get a red icon suggesting that the stop failed in some way, with the message ERROR on hover. Afterwards:

  • Restarting the workspace is not possible; immediately after the metadata broker completes, the following message is logged:

Error: Failed to run the workspace: "Unable to start the workspace 'workspacewcbpmj0robtm3x3a' due to an internal inconsistency while composing the workspace runtime.Please report a bug. If possible, include the details from Che devfile and server log in bug report (your admin can help with that)"

  • The workspace pod is stuck in a terminating state (presumably while attempting to sync with the rsync server that was evicted).
  • The async storage pod was evicted with the reason:

The node was low on resource: ephemeral-storage. Container async-storagev5xh9xb3 was using 276Ki, which exceeds its request of 0.
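The two symptoms above (an evicted async storage pod and a workspace pod stuck terminating) can both be read off the pod list for the namespace. The following is a minimal sketch, not part of Che, that assumes the JSON shape returned by `kubectl get pods -o json` (or `oc get pods -o json`). The function names, the 5-minute threshold, and the pod names in the usage note are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    # Kubernetes timestamps are RFC 3339 in UTC, e.g. "2020-09-11T12:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def classify_pods(pod_list, now, stuck_after=timedelta(minutes=5)):
    """Return (evicted, stuck) lists of pod names from a pod list dict.

    evicted: pods with status.phase == "Failed" and status.reason == "Evicted".
    stuck:   pods whose metadata.deletionTimestamp lies more than
             `stuck_after` in the past (assumed threshold, tune as needed).
    """
    evicted, stuck = [], []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        name = meta.get("name", "<unnamed>")

        # Evicted pods stay around as Failed pods with reason "Evicted".
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            evicted.append(name)

        # A pod being deleted carries a deletionTimestamp; if it is still
        # listed long after that time, treat it as stuck terminating.
        deleted_at = meta.get("deletionTimestamp")
        if deleted_at and now - parse_ts(deleted_at) > stuck_after:
            stuck.append(name)
    return evicted, stuck
```

To use it, load the pod list with `json.load` from the saved output of `kubectl get pods -n <namespace> -o json` and call `classify_pods(pods, datetime.now(timezone.utc))`. In the situation described here, the async storage pod would be expected in the first list and the workspace pod in the second.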

Before workspace was stopped, Theia showed a tooltip stating that git checkout failed due to quota, but I wasn't able to get more information on this.

Che version

  • [x] latest

Steps to reproduce

  1. Create workspace with async storage enabled
  2. Force eviction of async pod (somehow?)
  3. Try to stop workspace

Runtime

  • [x] OpenShift v3.11.82

Environment details

  • Namespace: amisevsk-che on Hosted Che (check there to see the terminating pods and the evicted async storage pod)

Labels: area/hosted-che, kind/bug, severity/P1, team/hosted-che

All 6 comments

I do suspect that the issue should be fixed with https://github.com/eclipse/che/issues/17616

@vparfonov could you please investigate

@ibuziuk Yeah, having looked at it a bit more, it looks like the problem was more so a very long (>25 minute) terminating state on the workspace pod. I'll update this issue if I encounter it again.

I do suspect that the issue should be fixed with #17616

Did it help?

I haven't had a chance to check again, but #17616 does look like it would fix the problem.

It will be on prod only with version 7.19.x.
