I've got a relatively large VS Code extension (~30 MB) that I'm trying to load into my Che workspace. You can find the meta.yaml for the associated Che plugin here.
Our internal OCP4 clusters have a relatively slow connection to the download server where the extension is hosted (http://download.eclipse.org/), so plugin brokering sometimes times out. To prevent this, we've set CHE_WORKSPACE_PLUGIN__BROKER_WAIT__TIMEOUT__MIN to a larger value (such as 15 minutes).
However, plugin brokering still fails occasionally even with the larger timeout. This time the cause is different: the WebSocket appears to get disconnected, and the plugin broker then crashes. We see the following in the plugin broker logs:
2019/11/26 16:44:23 Copying VS Code plugin ''
2019/11/26 16:44:23 Copying VS Code extension archive from '/tmp/vscode-extension-broker782205207/codewind-theia.vsix' to '/plugins/eclipse.codewind-plugin.latest.oapumhxgpi.codewind-theia.vsix' for plugin ''
2019/11/26 16:44:23 Trying to send event of type 'broker/log' to closed tunnel 'tunnel-1'
This corresponds to the following line in the plugin broker, which calls log.Fatalf and therefore causes the plugin broker to exit:
https://github.com/eclipse/che-plugin-broker/blob/21952b6098bd8edab883c290561f6e1cd08d22da/common/connect.go#L35
log.Fatalf("Trying to send event of type '%s' to closed tunnel '%s'", e.Type(), tb.tunnel.ID())
Should the plugin broker attempt to reconnect the WebSocket here rather than crashing? Is there anything that can be done to prevent the WebSocket from disconnecting?
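To make the question concrete, here is a rough sketch of what "don't crash on a closed tunnel" could look like. This is not the plugin broker's actual code: Event, Tunnel, TolerantBus, Reattach, and memTunnel are all hypothetical names invented for illustration, and only the "closed tunnel" check mirrors the log.Fatalf line quoted above. The idea is to buffer the event and return an error so the caller can decide whether to reconnect.

```go
package main

import (
	"errors"
	"log"
	"sync"
)

// Event is a hypothetical stand-in for the broker's event type.
type Event struct {
	Type string
	Body string
}

// ErrTunnelClosed is returned to the caller instead of exiting the process.
var ErrTunnelClosed = errors.New("tunnel is closed")

// Tunnel is a hypothetical abstraction over the WebSocket connection.
type Tunnel interface {
	ID() string
	IsClosed() bool
	Send(e Event) error
}

// TolerantBus buffers events while the tunnel is closed rather than
// calling log.Fatalf. This is an assumption about a possible design,
// not how che-plugin-broker behaves today.
type TolerantBus struct {
	mu      sync.Mutex
	tunnel  Tunnel
	pending []Event
}

// Pub sends the event, or buffers it and returns ErrTunnelClosed
// when the tunnel has been closed.
func (b *TolerantBus) Pub(e Event) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.tunnel.IsClosed() {
		log.Printf("tunnel '%s' is closed; buffering event of type '%s'", b.tunnel.ID(), e.Type)
		b.pending = append(b.pending, e)
		return ErrTunnelClosed
	}
	return b.tunnel.Send(e)
}

// Reattach swaps in a fresh tunnel and flushes buffered events in order.
// Events that fail to send stay in the buffer.
func (b *TolerantBus) Reattach(t Tunnel) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tunnel = t
	for len(b.pending) > 0 {
		if err := t.Send(b.pending[0]); err != nil {
			return err
		}
		b.pending = b.pending[1:]
	}
	return nil
}

// memTunnel is an in-memory Tunnel used only to exercise the sketch.
type memTunnel struct {
	id     string
	closed bool
	sent   []Event
}

func (t *memTunnel) ID() string     { return t.id }
func (t *memTunnel) IsClosed() bool { return t.closed }
func (t *memTunnel) Send(e Event) error {
	if t.closed {
		return ErrTunnelClosed
	}
	t.sent = append(t.sent, e)
	return nil
}
```

With something like this, a 'broker/log' event sent to a closed tunnel would be held back and replayed after a reconnect instead of terminating the broker in the middle of copying the extension. Whether log events are worth replaying at all, or should simply be dropped, is a separate design choice.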
@johnmcollier a possible workaround for your issue: using an offline plugin-registry.
@amisevsk I am setting severity/P2 because I think you are in the middle of refactoring the plugin broker, and that refactoring may address this problem (so there is no real need to add this issue to the next sprint backlog). But I may be wrong, and we may need a P1 here to make sure it gets included in the next sprints.
I haven't encountered this one but will look into it.
Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.
Mark the issue as fresh with /remove-lifecycle stale in a new comment.
If this issue is safe to close now please do so.
Moderators: Add lifecycle/frozen label to avoid stale mode.