Che: WS not saving on server_che stop command

Created on 6 Jan 2017 · 26 Comments · Source: eclipse/che

When I stop the server with the stop command, the workspaces (WS) that are still running, and that are supposed to be saved and shut down, do not keep their current state.

Reproduction Steps:

  1. Start the server and a WS
  2. In the WS, make a modification, for example edit .bashrc
  3. Stop the server (don't shut down the WS first)
  4. Start the server and start the WS
  5. Check whether your changes are still there

Expected behavior:

The file should still have the modification

Observed behavior:

The WS loses the modification

Che version: nightly
OS and version: Boot2docker + win7
Docker version: 1.12.3, build 6b644ec
Che install: Docker container

Additional information:

  • Problem started happening recently, didn't happen in an older version of Che: yes
  • Problem can be reliably reproduced, doesn't happen randomly: no
kind/question status/analyzing team/platform

All 26 comments

@cifren how do you stop the server? Just kill a container?

I use this command line

docker run --rm -t $ECLIPSE_OPTIONS eclipse/che:$ECLIPSE_VERSION stop

It is boot2docker - so what are the values of all options? This particular vm requires mounting in a special folder otherwise your data is lost.

@cifren @TylerJewell this is expected behavior. If you think it is not, label it as bug.

The workspace is there, so it's not a mount issue. When che-server is stopped, we do not auto snapshot running workspaces.

All my options

ECLIPSE_VERSION="nightly"
ECLIPSE_OPTIONS="-v /var/run/docker.sock:/var/run/docker.sock -v /c/Users/webide/che_nightly:/data"

OK, I would have thought the normal behavior would be for the server to save all WS and then shut down... Or maybe show a warning during the shutdown?

@cifren currently when server is stopped, workspaces are stopped without snapshotting.

I thought we changed the default stop behavior in m3 to always auto snap? Are we saying that only happens if the user initiates the save not if the server is stopped? If so then I think we should change the default server config to follow the user config.

@TylerJewell We snapshot the workspace if a workspace is stopped by a user and che properties tell server to do so. If a server is stopped, we do not snapshot running workspaces, just stop them.

I consider this a bug as it is not expected behavior. The server should follow default behaviors of users. So if we snap when user stops then server stops should follow same path unless admin overrides it.

@TylerJewell please label it as a bug then and assign to platform team.

@skabashnyuk - it is reasonably expected that admins will expect server shutdown to stop all workspaces and for those workspaces to follow the default auto-snapshot setting within Che. There would then be a che.env property that would override this behavior so that a server stop would not necessarily follow the user configured action.

@TylerJewell you want to snapshot workspaces during tomcat stop if it's configured in che.env? BTW tomcat has a time limit of 3 minutes to stop. After that it will just be kill -9

Sorry, I could have written a better description.

As a default behavior, when the Che (or Codenvy) server is stopped, we should respect the values set by the admin within che.env with the value CHE_WORKSPACE_AUTO__SNAPSHOT. If this value is true then a normal shutdown of Che server would be to stop each workspace and snapshot them. If false then it would be the behavior we have now.

We should allow admins a new variable, CHE_SHUTDOWN_SNAPSHOT_WORKSPACES=true/false which can ignore the other setting and provide an explicit behavior.

For che.env file:

# Che Server Auto Snapshot Override
#    When stopping the Che server, we will stop each workspace individually. These workspaces
#    may or may not be snapshotted during their stop process. Default behavior is to follow the
#    setting applied to CHE_WORKSPACE_AUTO__SNAPSHOT, which is the auto behavior for end
#    users. You can set this value to override inheritance and explicitly define whether workspaces
#    should be snapshotted. Snapshotting is a time and resource intensive activity, so server 
#    shutdown can be slow if you have many running workspaces.
#CHE_SHUTDOWN_SNAPSHOT_WORKSPACES=true/false
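
The precedence proposed above can be sketched as follows. This is only an illustration of the proposal, not Che's implementation: the function names are hypothetical, and treating an unset or unparsable value as "not set" (falling back to no snapshot) is an assumption.

```python
from typing import Mapping, Optional


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Return True/False for 'true'/'false' (case-insensitive), else None."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    return None  # assumption: anything else counts as "not set"


def snapshot_on_shutdown(env: Mapping[str, str]) -> bool:
    """Decide whether a server stop should snapshot running workspaces.

    Hypothetical helper: the explicit override wins; otherwise the
    end-user auto-snapshot setting is inherited.
    """
    override = parse_bool(env.get("CHE_SHUTDOWN_SNAPSHOT_WORKSPACES"))
    if override is not None:
        return override
    inherited = parse_bool(env.get("CHE_WORKSPACE_AUTO__SNAPSHOT"))
    # Assumption: with neither value set, keep today's behavior (no snapshot).
    return bool(inherited)
```

With only CHE_WORKSPACE_AUTO__SNAPSHOT=true set, this returns True; adding CHE_SHUTDOWN_SNAPSHOT_WORKSPACES=false flips it to False.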

FYI It's potentially dangerous because we can't guarantee that all workspaces will be stopped+snapshotted in 3 minutes

Or we can rely on the CLI. Before stopping, it would stop+snapshot all workspaces and then call ws-master tomcat stop.

Can you elaborate on where the 3 minute limitation came from? I would think that there is a shutdown procedure within Tomcat that can take as long as it requires.

Basically we can't distinguish a very long stop from a failed stop. To avoid a situation where we can't stop tomcat, we use a stop method with a timeout. After X minutes, if the tomcat process hasn't stopped, it gets kill -9
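
A minimal sketch of that stop-with-timeout pattern, assuming a POSIX system. It is not the actual Che or Tomcat stop script; stop_with_timeout is a hypothetical name and the timeout is whatever the caller passes in.

```python
import subprocess
import sys


def stop_with_timeout(proc: subprocess.Popen, timeout_s: float) -> str:
    """Ask a process to stop, then force-kill it if it does not exit in time.

    Returns "graceful" if the process exited within the timeout,
    "killed" if it had to be force-killed.
    """
    proc.terminate()  # SIGTERM on POSIX: a polite request to shut down
    try:
        proc.wait(timeout=timeout_s)
        return "graceful"
    except subprocess.TimeoutExpired:
        # A slow stop and a hung stop look the same from here,
        # so after the deadline the only option left is SIGKILL.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_with_timeout(child, timeout_s=5.0))
```

Any snapshot work done during the stop has to fit inside the timeout, which is why snapshotting many workspaces at shutdown is risky.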

That explanation is clear. Maybe then we write it as a different specification (as you suggest) and we have cmd_stop() and cmd_restart() respect the values in che.env. The downside of this scenario is that while eclipse/che-cli would respect this logic, eclipse/che-server would not. So it seems unnatural.

How do we handle it with Codenvy On-Prem where we have distributed services managing the Codenvy tomcat?

It behaves the same as Che.

The more I think about this, the riskier I think it will be to handle such scenarios from within the CLI. In a Codenvy distributed scenario, or with Eclipse Che deployed on OpenShift, a "shutdown" event would have to be contained within the system itself. So we would have to develop a way to detect these events and then have various shutdown policies that the Che or Codenvy server respects, such as "time-based shutdown" or "wait for confirmation that all running workspaces have successfully stopped".

@cifren - we are struggling with this issue and how to categorize it. Can you share a little bit about the scenario where you are physically shutting down a Che server and where all of its running workspaces would need to be stopped and snapshotted?

The separation of concerns here is around the physical Che server, which is expected to always be running, so a shutdown event for the server would be unusual and rare. And if the workspaces themselves are left in a running state, for example because you just needed to recycle the server, the server would reconnect to those workspaces that were left running.

So we need to think more about the scenarios where you want a physical shutdown of the server followed by an orderly shutdown of all workspaces within it. So understanding your thinking will help here.

My case is maybe special, because I use a laptop and I run my che-server on that laptop. I shut down my computer every night, ecology first.
But I suppose in your case it depends on whether you want to use it as a server (shutdowns are really rare) or on a laptop (shutdown every night).

The best option would be either:

  • easy solution: Display a warning, with a --force option, if something is still running
  • less easy: Create a UI to shut down the server and show the progress, so the user can manage it easily.

That is how I see it, I hope it helped you.

Your scenario of being ecofriendly is honorable!

I think about the situations where your laptop goes into standby or shutdown, and in those scenarios there is usually some sort of rapid orderly shutdown which includes saving certain files and then closing down running processes.

So making sure that project files are saved in long term storage is necessary but snapshotting the workspace run time has a heavy tax. It makes me ponder whether an option that periodically snaps the runtime (if necessary) on a timer would be sensible. In the final shutdown it could then take the rapid path on the assumption the system has a background snapshot engine tuned by the admin.
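
One way to picture that idea is a background snapshotter that only snapshots workspaces whose runtime actually changed. This is a sketch of the suggestion, not an existing Che feature: the class, its methods, and the way runtime changes get reported are all hypothetical, and the timer that would call tick() is left out.

```python
from typing import Callable, Dict, List


class BackgroundSnapshotter:
    """Hypothetical engine that snapshots only workspaces whose runtime changed."""

    def __init__(self, snapshot: Callable[[str], None]) -> None:
        self._snapshot = snapshot          # the expensive operation, injected
        self._dirty: Dict[str, bool] = {}  # workspace id -> runtime changed?

    def mark_runtime_changed(self, workspace_id: str) -> None:
        """Record that a workspace's runtime differs from its last snapshot."""
        self._dirty[workspace_id] = True

    def tick(self) -> List[str]:
        """Run on a timer: snapshot dirty workspaces and return their ids."""
        snapped = [ws for ws, dirty in self._dirty.items() if dirty]
        for ws in snapped:
            self._snapshot(ws)
            self._dirty[ws] = False
        return snapped

    def pending(self) -> List[str]:
        """Workspaces that would still lose runtime state on a fast shutdown."""
        return [ws for ws, dirty in self._dirty.items() if dirty]
```

At shutdown, the server could stop workspaces immediately and consult pending() only to warn about, or optionally snapshot, the few runtimes that changed since the last tick.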

I surely don't see the whole picture, but I thought the normal behavior of a docker container is to keep its state, so why do you have to create a snapshot if the container is already built?

I saw that the che-server was creating an image for each workspace, without really using the container's capabilities. I understand creating a snapshot is something useful, but it takes time.

The containers are not supposed to be erased easily, unless you want to lose your image state.

I can be wrong, I am not a docker expert.

The process of creating an image from a container is a fairly expensive CPU and I/O activity, and also consumes a lot of disk space after it has been created. So, as we think about building Codenvy and Che farms that have 100s of thousands of workspaces at the same time, savings from every operation is something that we have to naturally consider. If we have a natural stop process that also requires the orderly shutdown of all workspaces, and we are dealing with 1000s it could turn into a problem process.

Ignoring that for a moment - what we do is that we take everything that is in /projects and save that to long term storage. This is all of your project code. If recovering your work only requires reactivation of your project code, then you do not need a snapshot.

The snapshot process is for users who have changed the state of their workspace. Say you have installed some utilities with npm, or perhaps you have a database installed and then you save data into that database. In this situation, these files are not part of your /projects folder - they are also not updated as commonly - so in this situation if you want the workspace to have its state saved, then a snapshot makes sense.

Many developers are not fundamentally modifying the workspace runtime on every session. It's an occasional activity. So doing snaps periodically or forcibly when they were not needed is tricky because it offers guarantees of state, but at the expense of heavy operations.

5.2 shipped with a graceful shutdown in stop. This improvement was driven by this issue.
