server.sh and related .vscode-remote/bin/.../out/remoteExtensionHostAgent.js --port=0 do not terminate when VSCode is closed.
Steps to Reproduce:
Does this issue occur when you try this locally?: Yes
I just experienced this same problem:
.vscode-remote/bin/daf71423252a707b8e396e8afa8102b717f8213b/server.sh --port=0 did not terminate and the server basically ran out of memory after a while.
Killing the process caused Ubuntu to recover normal operation.
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
> do not terminate when VSCode is closed.
This is expected. The agent will stay running. If you want to clean up everything that the ssh extension installs, you can run the "Uninstall VS Code Server from host" command to kill these processes and remove their code.
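(If you ever need to do that cleanup by hand, e.g. when you can no longer connect a window, a rough equivalent on the remote host is below; the pkill patterns are assumptions based on the process listings in this thread, not what the command literally runs.)

```bash
# Kill every running VS Code server for this user, then remove the
# installed bits. Paths match the listings in this thread; adjust as needed.
pkill -f '\.vscode-remote/bin/.*/server\.sh' || true
pkill -f 'remoteExtensionHostAgent\.js' || true
rm -rf ~/.vscode-remote/bin
```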
> did not terminate and the server basically ran out of memory after a while.
The server shouldn't be doing anything when you aren't connected, so I'd like to hear more details about that.
I don't know the specifics of what was going on, but the development EC2 instance ground to a crawl until I killed the server.sh process; I was barely able to ssh in to do so.
What information will be most helpful to narrow it down?
Can you use top to figure out whether some process is using lots of CPU, and which one it is? The Agent process itself, an extension host, or a process related to one of your extensions?
Does this happen only while vscode is open, or even after vscode has been closed?
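One way to capture that from a shell, as a starting point (GNU ps on Linux; flags may differ on other systems):

```bash
# List the top CPU consumers with full command lines, so server.sh, the
# agent, extension hosts, and extension processes can be told apart.
ps -eo pid,ppid,%cpu,%mem,args --sort=-%cpu | head -n 15
```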
> This is expected. The agent will stay running. If you want to clean up everything that the ssh extension installs, you can run the "Uninstall VS Code Server from host" command to kill these processes and remove their code.
Thank you for the clarification. Still, vscode-remote sometimes starts and keeps running additional server.sh processes; this probably relates to extension / vscode-insiders updates? The output of htop shows three server.sh processes running while I am connected with only one instance of vscode-insiders.
```bash
61301 psvoboda 20 0 110M 1408 1220 S 0.00 B/s 0.00 B/s 0.0 0.0 0:00.00 ├─ sh /home/psvoboda/.vscode-remote/bin/473af338e1bd9ad4d9853933da1cd9d5d9e07dc9/server.sh --port=0
57099 psvoboda 20 0 110M 1436 1240 S 0.00 B/s 0.00 B/s 0.0 0.0 0:00.00 ├─ sh /home/psvoboda/.vscode-remote/bin/daf71423252a707b8e396e8afa8102b717f8213b/server.sh --port=0
21747 psvoboda 20 0 110M 1440 1240 S 0.00 B/s 0.00 B/s 0.0 0.0 0:00.00 ├─ sh /home/psvoboda/.vscode-remote/bin/693a13cd32c5be798051edc0cb43e1e39fc456d9/server.sh --port=0
```
Yes, there is a new one for each vscode update. We can try to clean them up when updating.
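Until that lands, stale builds can be pruned by hand; a rough sketch, which skips any build that still has a server running:

```bash
# Keep only the most recently used build; check for running servers first.
cd ~/.vscode-remote/bin || exit 1
for commit in $(ls -t | tail -n +2); do
  pgrep -f "bin/$commit/server.sh" >/dev/null || rm -rf "$commit"
done
```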
I missed logging what the process was, but I had a vscode process clocking 200% CPU and 20% memory in top. It did not kill itself when I closed vscode. I am keeping a much closer eye on it now, and when it happens again I will make sure to get full details. Possibly extension related.
Thanks. Please open a new issue when you see it again since this is not what the OP is asking about.
We also experience problems with this remote agent staying alive after the session is closed. Five developers ssh into our build machine, and we couldn't build anymore because the server ran out of memory.
We freed up approximately 5 GB of RAM after manually killing all the processes.
Can an option be added to Remote Development so that VS Code cleans up everything? I don't see any impact on development whether the remote agent stays alive or not...
@veaceslav I think the agent processes should not take that much memory since all they do is listen on a port and launch the extension host. Seems possible that there were some actual extension host processes and language servers around (like from a vscode remote window which is still connected).
When I establish a remote connection, then close VS Code, htop shows the following still running on the remote machine:
VIRT RES SHR S CPU% MEM% TIME+ Command
13956 2528 2264 S 0.0 0.0 0:00.00 `- sh /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/server.sh --port=0
1188M 44136 25852 S 0.0 0.0 0:00.74 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.00 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.00 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.00 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.00 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.00 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.00 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.05 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.03 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.04 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
1188M 44136 25852 S 0.0 0.0 0:00.03 | `- /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/node /home/my-user-name/.vscode-remote/bin/5e1f9440d280b0b04cba843df6a0032bf517a2f0/out/remoteExtensionHostAgent.js --port=0
So I think those remoteExtensionHostAgent.js processes shouldn't be there? The only extension that I have enabled is Microsoft's C/C++ extension ... so maybe the problem is there?
Those are not actually multiple processes; htop shows you the threads in one process.
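You can confirm this from the shell; for example:

```bash
# ps prints one line per process by default; -L adds one line per thread.
ps -C node -o pid,nlwp,args     # NLWP = number of threads in each process
ps -C node -L -o pid,lwp,args   # one row per thread, like htop's default view
```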
Okay, but this one remoteExtensionHostAgent.js process shouldn't be there, right?
For now it does stay running to wait for new requests, but we should have them shut themselves down after x hours of inactivity.
I am seeing similar issues.
Sometimes there are child processes left behind (see my comment on #276). I suspect this may happen when vscode exits uncleanly. I have a VM running on my laptop where I do the remote development, and I have found that when I suspend/resume with vscode running, it often crashes and/or loses contact with the remote server. Maybe when vscode exits uncleanly it does not signal the remote server to shut down child processes?
An improvement suggestion would be to provide a configuration option so that the remote extension server exits automatically when the vscode client disconnects, rather than waiting around for a new connection. I presume the reason for leaving it running is to make startup quicker, but it should be up to the user to decide on the tradeoff between the convenience of quick startup and the hassle of manually killing off all these servers. Even better would be a configurable timeout, with a timeout of 0 meaning exit immediately.
Looking into this for June.
When the remote server has no extension hosts active, it should shut itself down after a short timeout. The problem is, when a vscode window is connecting it has to figure out whether a server is already running or whether it needs to start a new one. You could have a race where vscode decides that the server is already running just as the server is shutting down.
There is a short gap between checking that the server is running and asking it to create an extension host process, so vscode needs to be able to tell the server that it is about to create an EH. It can create a file as a lock, and the remote server will not shut down when that file is present. But when does that file get cleared? What if two windows are connecting at the same time? What if a window starts connecting but then fails partway through and leaves the lock file?
The remote server can have a timeout to shut itself down. If that timeout expires, and the lockfile is present, it will delete the lockfile and extend the timeout by, say, a minute. That gives vscode a reasonable amount of time to connect. Multiple windows will not interfere with each other, and a failed connection won't break it.
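In shell pseudocode, that flow might look roughly like this (lock path, timings, and the extension-host check are all assumptions for illustration; the real server is a Node process):

```bash
#!/usr/bin/env bash
# Illustrative sketch of the lock-file protocol described above.
LOCK="$HOME/.vscode-remote/connect.lock"   # hypothetical lock path
TIMEOUT=$((10 * 60))                       # shut down after 10 idle minutes

eh_connections() {
  # Placeholder: count extension hosts attached to this server.
  pgrep -cf 'remoteExtensionHostAgent'
}

while true; do
  sleep "$TIMEOUT"
  if [ -e "$LOCK" ]; then
    rm -f "$LOCK"           # consume the lock left by a connecting window...
    TIMEOUT=60              # ...and give it another minute to finish connecting
  elif [ "$(eh_connections)" -eq 0 ]; then
    exit 0                  # nothing connected and nobody connecting: shut down
  fi
done
```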
Secondly, we have to clean up the installed bits on the remote. We can look at the modified dates to guess which are the most recent bits, and when a vscode window connects, it can delete all but the N most recent builds, first checking whether a server for that build is running. But there is a similar problem where we could delete a build just as another vscode instance is trying to connect and launch that build.
We have a lock around the install script to prevent multiple vscode windows from stepping on each other during installation but currently this is only for vscode windows at the same commit. So when vscode is trying to delete the installed commit X, it can wait on the lock for commit X, and once it has acquired the lock, check again whether a server is running for that commit, and if not, it is safe to delete that build. If another window tries to connect later for that build, it will be reinstalled.
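A shell sketch of that check-then-delete under the per-commit lock (the lock-file layout here is an assumption):

```bash
#!/usr/bin/env bash
# Delete one installed build, but only while holding the per-commit lock
# and only if no server for that commit is currently running.
COMMIT="$1"
DIR="$HOME/.vscode-remote/bin/$COMMIT"

(
  flock -x 9                                        # same lock the installer takes
  if ! pgrep -f "bin/$COMMIT/server.sh" >/dev/null; then
    rm -rf "$DIR"                                   # safe: nothing is running it
  fi
) 9>"$DIR.lock"
```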
I'm also experiencing this problem.
When I start an SSH remote development session to my Ubuntu Azure server, it starts up:
Each process consumes about:
So _each session uses ~341MB of resident memory._ When I close VSCode, all those processes remain in memory.
I too have trouble SSHing into my server after a few of these orphan sessions accumulate. I had to upgrade my Azure server (good for Microsoft, not so much for us) just to make sure there is some RAM headroom.
Also, I tried the Remote-SSH: Kill VS Code Server command, and it doesn't do anything about all the orphaned processes. A periodic killall mono seems to do the trick.
@Britvich please open a new issue for this. The extension host should be shut down when the vscode window is closed cleanly, along with any processes spawned by extensions like mono. But not the node extension host agent or the server.sh process. Let's figure out whether mono was orphaned when the EH shut down or whether the EH is still alive.
@roblourens As requested, I have opened #640
My strategy in https://github.com/microsoft/vscode-remote-release/issues/203#issuecomment-500002309 is still not perfect: there could be a case where the server has already decided to shut itself down, but the SSH extension runs ps to check whether it is running and it is still returned by ps. So the resolver wants to connect to that running server, but milliseconds later, it has shut down. This is a very narrow slice of time, but it's technically possible. I can't decide whether it's OK to ignore this or not; it might be worse on slower systems.
I think there are two solutions that would be reliable. We could use an actual locking mechanism between the server and the extension to ensure that the server will not be shutting down at the same time that the extension is trying to connect. There are a couple of node modules for working with lock files, but I don't know how good they are.
Or we could have the resolver actually connect to the server to tell it that it is about to request an EH. Then the server can acknowledge that it received the request and will stay open for a bit longer. We could ship another script alongside server.sh which does this - connects to the running server and asks it to keep running.
Hey @chrmarti @aeschli, do the container and WSL extensions want this? tl;dr, there are two pieces: having servers shut themselves down when no EHs remain, and deleting old installed versions.
The first part will be mainly implemented by the server and remote extensions will opt-in to it when they start the server.
The second part would probably be implemented entirely by the remote extension.
And I'm interested if you have any feedback on what I've written about this.
I'd love to give feedback on the proposed solution, but I don't know enough about the existing startup process. Is there anywhere this is described? This repo does not seem to contain anything (source or documents).
On the surface, it seems brittle that you would check whether it is running using ps and then connect. Why not try to connect first? If that fails, you know it is not running or was in the process of shutting down, and you can start another instance.
Yes, the code for remote extensions is not open source.
I'm actually going for basically what you describe, having a way to connect to the server to verify that it isn't in the process of shutting down.
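In rough shell terms, that connect-first check looks something like this (PORT and COMMIT have to be discovered out of band, since server.sh picks a free port with --port=0):

```bash
# Reuse the server only if it actually answers; otherwise start a fresh one.
if nc -z -w 2 127.0.0.1 "$PORT"; then
  echo "server reachable, reusing it"
else
  sh "$HOME/.vscode-remote/bin/$COMMIT/server.sh" --port=0 &
fi
```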
Remote Containers would benefit from having a way to tell when it is holding on to the last remote EH. It would in that case stop the container when shutting down. (Currently this is implemented using ps.)
WSL2 would also benefit from a timeout to shutdown the server when the last EH has been disconnected.
> Remote Containers would benefit from having a way to tell when it is holding on to the last remote EH. It would in that case stop the container when shutting down
The server could have an endpoint that returns the number of connected EHs. Would that help? But we would have to think about a race condition where one window is closing just as another is connecting. And not stopping the container when one window is just reloading.
Resolvers have to do two things to use this: pass --enable-remote-auto-shutdown to server.sh when launching the server, then hit the /delay-shutdown server endpoint when connecting, to make sure the shutdown timeout will extend over the period between when the resolver runs and when vscode actually connects and requests an EH.
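Concretely, a resolver's side of this might look like the following (the flag and endpoint names are from above; that the endpoint speaks plain HTTP, and the port value, are assumptions):

```bash
# Launch with the opt-in flag:
sh "$HOME/.vscode-remote/bin/$COMMIT/server.sh" --port=0 --enable-remote-auto-shutdown &
# On each connect, extend the shutdown timeout so it covers the gap between
# the resolver running and the window actually requesting an EH. PORT is
# whatever the server reported at startup.
curl -s "http://127.0.0.1:$PORT/delay-shutdown"
```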
Forking the second part, deleting installed servers, to https://github.com/microsoft/vscode-remote-release/issues/726
> The server could have an endpoint that returns the number of connected EHs. Would that help? But we would have to think about a race condition where one window is closing just as another is connecting. And not stopping the container when one window is just reloading.
Makes sense. We could add a flag (or count) to the response indicating whether one or more /delay-shutdown requests are active; if so, the assumption would be that another window is about to connect and the container should continue running.
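The response might then look something like this sketch (field names invented for illustration; this is the proposal, not an existing API):

```bash
# Hypothetical status query against the proposed endpoint.
curl -s "http://127.0.0.1:$PORT/status"
# => {"connectedExtensionHosts": 1, "activeDelayShutdowns": 0}
```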