Xterm.js: Parts of xterm.js should run in a web worker

Created on 15 Jun 2018 · 26 Comments · Source: xtermjs/xterm.js

I didn't see an issue tracking this yet.

Blocked by https://github.com/xtermjs/xterm.js/issues/1507
Related: https://github.com/xtermjs/xterm.js/issues/444

area:performance type:enhancement

All 26 comments

Any drafts/thoughts yet in this direction? From my limited parser-centric view it seems the renderer and the data input + parsing are fighting over CPU time, so my first wild guess is to move all the core parts (data input + parser + most of the terminal class) into a web worker and let only the DOM-related stuff reside in the main thread. This will require some event translation, but that's already part of the new layered proposal. The tricky part will be getting the buffer data moved to the main thread without much runtime delay (SharedArrayBuffer might solve this in the future without a copy, but it is currently deactivated in all engines due to Spectre, so we're gonna have to do it "the hard way").

My thinking is to move core/base into a worker and then ui/base/public runs in the UI thread. How buffer state is synchronized still needs some thinking.

One idea is to maintain the buffer only in the worker, and then have a more renderer friendly view model in the main thread that can be incrementally updated by the worker.
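
A minimal sketch of what that incremental sync could look like, assuming a versioned patch protocol between the worker-owned buffer and a main-thread view model. All names here are hypothetical; nothing like this exists in xterm.js today.

```typescript
// Hypothetical message shapes for worker -> main thread view sync.
interface RowUpdate {
  row: number;            // absolute row index in the viewport
  cells: string;          // serialized cell data for that row
}

interface ViewModelPatch {
  kind: 'patch';
  baseVersion: number;    // version this patch applies on top of
  updates: RowUpdate[];
}

// Main thread: a lightweight, renderer-friendly view model that only
// consumes patches and never touches the real buffer.
class ViewModel {
  version = 0;
  rows = new Map<number, string>();

  apply(patch: ViewModelPatch): boolean {
    if (patch.baseVersion !== this.version) {
      return false;       // out of sync; caller should request a full snapshot
    }
    for (const u of patch.updates) {
      this.rows.set(u.row, u.cells);
    }
    this.version++;
    return true;
  }
}
```

The version check is what makes the model safe against dropped or reordered messages: a stale patch is rejected instead of silently corrupting the view.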

We could also rewrite core in rust as a WebAssembly module, but I think that is out of scope for now 😅

One idea is to maintain the buffer only in the worker, and then have a more renderer friendly view model in the main thread that can be incrementally updated by the worker.

Yeah that would be my preference, I think monaco needs a more complete view of the buffer though.

We could also rewrite core in rust as a WebAssembly module, but I think that is out of scope for now 😅

Yeah, seems like quite the effort to undertake 😄. Cleaning up the TS beats this out, at least for now imo

I gave this some more thoughts, here's a rough overview of how it could work. The UI could subscribe to a view and specify the number of rows it wants to see, as well as the offset of rows from either the top or bottom. The core will then track buffer changes that affect that viewport and emit updates with update instructions. The goal should be that the UI does not need to store any state; it should simply consume the updates and draw / redraw based on them.

screen shot 2018-06-20 at 12 20 04

// subscribe to a new view
const view = terminal.createViewport({ rows: 20, offset: 0, sticky: true });

// listen for view updates and modify the UI accordingly
view.onUpdate((changeDecorations /*changes*/, terminalState /*infos*/) => {

  // TODO: 
  // specify how changeDecorations could look like, maybe like this:
  for (const change of changeDecorations) {
     switch (change.type) {
        case ChangeType.DELETE: {
           // some rows have left the viewport
           break;
        }
        case ChangeType.ADD: {
           // some rows have been added to the viewport
           break;
        }
        case ChangeType.MODIFY: {
           // existing viewport rows have been modified
           break;
        }
     }
  }
});

// terminal viewport is scrolled in the UI, update the viewport offset
myViewportElement.addEventListener('wheel', (evt) => {
  // calculate the new offset
  let offset = ...

  // NOTE: this will likely trigger the onUpdate callback if things have changed
  view.setOffset({ offset, sticky: offset > 0 });
})

@mofux Hmm, not sure about this - in theory the inner terminal core does not need to hold any data for lines beyond the actual terminal height to work properly. The only exception is resizing, which will not occur very often. Maybe we can turn this into an advantage and only hold the "hot parts" of the buffer in the core and delegate the scrollback data buffering to the renderer? It will need back-propagation for resize events though. I have not thought deeper into this, just some random thoughts so far.

@jerch The scrollback has to be maintained somewhere, and I think it is better to have it in the core for this reason:
At some point I want addons like the linkifier to live in the core and subscribe to buffer changes and then create decorations on it. A decoration is like a sticky range (startRow, startColumn, endRow, endColumn) that provides rendering information to the frontend renderer (via the onUpdate listener). Cell styles like background, foreground, bold etc. would also end up as a decoration. This gives us some very big advantages: They only need to be calculated if a hot buffer line is updated. And since these calculations will eventually run in the WebWorker, it will unblock the UI thread.
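
A rough sketch of what such a decoration and a core-side linkifier pass could look like. These names are hypothetical, not the actual xterm.js API; the point is that the expensive scan only re-runs per changed line and produces plain data the renderer can consume.

```typescript
// Hypothetical shape of a decoration as described above: a sticky range
// plus rendering hints for the frontend renderer.
interface Decoration {
  startRow: number;
  startColumn: number;
  endRow: number;
  endColumn: number;
  style?: { background?: string; foreground?: string; bold?: boolean };
  // e.g. a link target produced by a linkifier addon running in core
  link?: string;
}

// A linkifier-style addon would only re-run when a hot buffer line changes:
function linkifyLine(row: number, text: string): Decoration[] {
  const decorations: Decoration[] = [];
  const urlRe = /https?:\/\/\S+/g;
  let m: RegExpExecArray | null;
  while ((m = urlRe.exec(text)) !== null) {
    decorations.push({
      startRow: row,
      startColumn: m.index,
      endRow: row,
      endColumn: m.index + m[0].length,
      link: m[0],
    });
  }
  return decorations;
}
```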

Maybe this design makes more sense: We only maintain the buffer for the height of the pty, and then we hold our decorations for the pty screen height + scrollback:

screen shot 2018-06-20 at 13 22 28

@mofux Yupp, you are right, I totally forgot the addons. From an API design perspective it also seems more convenient to hold scrollback data in the offscreen part. What bugs me about the current buffer design is the fact that we mostly hold rendering info in the buffer for every cell, even for the scrollback data, which can't be accessed by normal "terminal means" anymore (and thus won't change anymore, except for resize once it supports reflow). This is a very memory-hungry layout.

With your second layout sheet we might be able to create some less expensive scrollback buffer (the purple part) that is closer to rendering needs (merged attributes and cells maybe?), while the orange part still gives quick access to all attributes on a cell by cell basis. I am still worried about the amount of data that we have to send between the worker threads to get something new shown.

Also not sure yet, how addons will play into this, some gut feeling tells me that we might have to build different API level entries if we want critical parts to be customizable from outside (like a core, pre and post render API).

One more thought regarding the architecture (and the title of the issue): Instead of just running core in a webworker, it would be nice (and in most cases even more useful) if we consider to run core in a server-side node process (close to where the pty process is), which then talks to the UI (browser) via RPC / websockets. Technically, there shouldn't be much of a difference between communicating with a webworker vs. communicating with a server process, it's only a different transport layer.

screen shot 2018-06-25 at 14 56 04

@mofux Imho this is a groundshaking thing; there are several pros and cons for it. What comes to mind:

pros:

  • unicode versions can be synced more easily between pty and terminal core, affects wcwidth calculation, grapheme and bidi handling (not yet supported)
  • restoring old terminal sessions can be implemented much easier
  • browser memory consumption is no concern anymore
  • frontend part can be really slim, in extreme down to a pixel image as terminal representation (like a screenscraper)

cons:

  • xterm.js depends on a server part, no more browser-only deployment possible unless someone ports the Node.js core part back to browser engines again
  • negative impact on server performance (CPU and memory); this is not an issue for local apps like hyper or vscode, but will hurt remote server apps with 1k+ terminals really badly - note: frontend power is kind of free for a service provider
  • depending on the "stubbiness" of the frontend, the server-client communication will explode (we're gonna have to request and transmit view data over and over for small changes)
  • latency for remote apps will go up
  • Why bother with a self-written core part at all? Reinventing the square wheel? Just take some well-established C terminal lib, scrape the output and forward it to the browser, with some key and mouse handling there as the interaction layer. Done.

Don't take the last point too seriously, this is more rant than an objective argument. I still tend towards the client core thingy for a simple reason: creating a transport layer for bytestrings is much easier/faster than handling different API states across components in different machine locations, imho. But I might be wrong about that.

@jerch Thanks for your thoughts, really appreciated! Please consider the ideas mentioned above as a testbed for discussion in order to develop a vision for future enhancements.

The main goal I'm seeing is to separate the ui from the core, so we can offload the work performed by the parser to a separate thread, unblocking the UI thread.

The second goal is to shape an interface (contract) that allows core and ui to talk to each other via some kind of IPC (be it worker.postMessage or a WebSocket, or even a traditional direct communication if both are running in the same thread like it is now). The challenge here is to keep the memory footprint minimal (we don't want to maintain the buffer + scrollback twice) and the communication overhead and payload as low as possible. If we get this right, it also opens up the opportunity to support different renderers more easily (e.g. monaco-editor).
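
The transport-agnostic contract could be as small as a send/receive pair. A sketch, with hypothetical names: postMessage, WebSocket, and direct in-thread calls would all implement the same interface, so core and ui never know where the other side lives.

```typescript
// Hypothetical transport contract between core and ui.
interface ITransport {
  send(message: object): void;
  onMessage(listener: (message: object) => void): void;
}

// Same-thread transport: the degenerate case, matching how core and ui
// run in one thread today.
class DirectTransport implements ITransport {
  private _listeners: Array<(m: object) => void> = [];

  send(message: object): void {
    for (const l of this._listeners) {
      l(message);
    }
  }

  onMessage(listener: (m: object) => void): void {
    this._listeners.push(listener);
  }
}

// A WebWorker transport would wrap worker.postMessage / onmessage and a
// server transport would wrap a WebSocket -- same interface, different wire.
```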

xterm.js depends on a server part, no more browser-only deployment possible unless someone ports the Node.js core part back to browser engines again

I think we misunderstood here. I'm thinking of core as being a DOM independent piece of JS code that can run in the Browser, a WebWorker or Node.js without any porting. It should definitely be up to the developer to decide where core should run (depending on his use-case). My thought was to decouple core and ui as much as possible, so we could potentially support all these scenarios.

depending on the "stubbiness" of the frontend, the server-client communication will explode (we're gonna have to request and transmit view data over and over for small changes)

It depends. If we only send incremental updates to the ui this shouldn't be much of a problem.

latency for remote apps will go up

At the moment we are sending all the pty data to the Browser frontend (with the same latency burden), which is really painful in scenarios where xterm.js is running in a browser that is far away from the server. I've seen situations where the data stream from the pty would just spam the websocket so hard that it would eventually disconnect because pings didn't come through anymore. Catching all the data at the server and only sending view updates to the browser (maybe throttled) could improve this situation. The thing is, we can't skip processing parts of the pty data stream because it has to consistently update the buffer - we have to eat it all. But we can skip / throttle updates of a view that reads from a consistent buffer state. And that's where I see the big advantage of running core at the server side. We're not forced to send the whole pty stream to the client anymore, we only send view updates. Commands like ls -lR / that rotate the buffer + scrollback multiple times a frame won't hurt the UI anymore.
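
The "eat everything, but flush the view at most so often" idea can be sketched in a few lines. This is a simplified throttle with hypothetical names, not anything from the xterm.js codebase:

```typescript
// The buffer is updated synchronously for every pty chunk, but the view
// snapshot is flushed to the client at most once per interval.
function makeThrottledFlush(flush: () => void, intervalMs: number): () => void {
  let last = 0;
  return () => {
    const now = Date.now();
    if (now - last >= intervalMs) {
      last = now;
      flush();
    }
    // a real implementation would also arm a timer so a trailing update
    // is not lost when the stream goes quiet
  };
}
```

With this in place, `ls -lR /` can rotate the buffer thousands of times per second while the client still only receives a bounded number of view updates.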

I think we misunderstood here. I'm thinking of core as being a DOM independent piece of JS code that can run in the Browser, a WebWorker or Node.js without any porting. It should definitely be up to the developer to decide where core should run (depending on his use-case). My thought was to decouple core and ui as much as possible, so we could potentially support all these scenarios.

:+1:

It depends. If we only send incremental updates to the ui this shouldn't be much of a problem.

Yupp. There is a small _but_ though - if we only send incremental updates, the frontend part needs a way to hold the old data and merge the new data onto it. Oh - and the backend part needs some abstraction to filter updated content from old stuff. Maybe we can establish some buffer cache key thing at row level, or even better (for amount of communication) / worse (for runtime + memory usage) at cell level (a real "write-through" cell data thingy).
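
The row-level cache key idea could look something like this: core bumps a version counter per row on write, and an update pass only ships rows whose version moved past what the client has seen. A sketch under hypothetical names, with rows serialized to strings for brevity:

```typescript
// Per-row versioning so incremental updates only carry changed rows.
class RowVersionedBuffer {
  private _rows: string[] = [];
  private _versions: number[] = [];

  write(row: number, content: string): void {
    this._rows[row] = content;
    this._versions[row] = (this._versions[row] ?? 0) + 1;
  }

  // Collect only rows the client hasn't seen yet, given the versions it
  // last acknowledged.
  diffSince(seen: number[]): Array<{ row: number; content: string; version: number }> {
    const out: Array<{ row: number; content: string; version: number }> = [];
    for (let i = 0; i < this._rows.length; i++) {
      if ((this._versions[i] ?? 0) > (seen[i] ?? 0)) {
        out.push({ row: i, content: this._rows[i], version: this._versions[i] });
      }
    }
    return out;
  }
}
```

Cell-level keys would shrink the payload further at the cost of one counter per cell, which is the runtime/memory trade-off mentioned above.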

About the latency / server-client-comm update thing:
This depends a lot on the granularity of updates we aim for. The sexiness of the current approach is its simplicity - we don't need a special protocol, we just pump the pty bytestream. Once we decide to go for a higher-level API transport we have to build a protocol layer to support this and that. I am not against such an approach, but it is still some work to lay it out in a way that third-party users can integrate easily into their very different environments.

Last but not least, I think possible changes from #791 should be part of the considerations; some of my suggestions there might raise the burden of getting easy and cheap updates delivered (esp. my storage & pointer approaches there might end up being contradictory).

Yupp. There is a small but though - if we only send incremental updates, the frontend part needs a way to hold the old data and merge the new data onto it.

The canvas renderer already does this for the viewport:

https://github.com/xtermjs/xterm.js/blob/5620da49d8590efd79ca06e995c89866c239e53e/src/renderer/TextRenderLayer.ts#L21

Closing as out of scope in the interest of keeping the issue list small as this probably won't happen for years if at all.

Reopening this as it would be a nice direction to go in and would ensure the main thread stays responsive during heavy workloads.

I played with web workers a bit lately and I would imagine it would work by using a shared array buffer if it's supported, and if not, handing the buffer data over to the worker (and not persisting it in the main thread). The main challenge imo is how you're meant to easily get embedders to leverage the workers, as there are multiple files required then; xterm.js may also get bundled into a different position.

@Tyriar Some investigations I did in that field:

Shared array buffer (SAB) with own atomic read/write locks is the fastest way to get data "moved" between worker threads. It still loses 20-30% to thread sync / locks compared to single-threaded promise code moving data around (tested without any workload on the data, so those numbers are just for the data "moving" part). Downside of this solution: hard to get right (and maintain?), and it may not work in all browsers due to Spectre security patches.

Normal object transfer is ok-ish and the only valid fallback if SAB + atomics are not available. It runs ~50% slower than a SAB solution, thus might penalize screen update latency (no clue if this will be perceivable in the end).

Since the buffer is made of typed arrays, a SAB solution could easily be done. Still, a fallback might be needed to cover engines without SABs. Last but not least, the worker-mainthread interface will be challenging, since we have so many events/callbacks into browser code that need to be plugged in without race conditions.
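
For illustration, a minimal sketch of what "own atomic read/write locks" over a SAB means: a spinlock word in front of the payload, flipped with `Atomics.compareExchange`. In a real setup the `Int32Array` would be shared with a worker; here the primitives are just exercised in one thread.

```typescript
// One i32 lock word followed by the shared payload.
const LOCKED = 1;
const UNLOCKED = 0;

const sab = new SharedArrayBuffer(4 + 1024);
const lock = new Int32Array(sab, 0, 1);
const payload = new Uint8Array(sab, 4);

function tryAcquire(l: Int32Array): boolean {
  // Atomically flip UNLOCKED -> LOCKED; true means we got the lock.
  return Atomics.compareExchange(l, 0, UNLOCKED, LOCKED) === UNLOCKED;
}

function release(l: Int32Array): void {
  Atomics.store(l, 0, UNLOCKED);
  Atomics.notify(l, 0); // wake any reader parked in Atomics.wait
}

// Writer side: take the lock, write, release.
if (tryAcquire(lock)) {
  payload.set([104, 105]); // "hi"
  release(lock);
}
```

A reader thread would pair this with `Atomics.wait` instead of spinning; that blocking path is exactly what is unavailable on the main thread and why the fallback story matters.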

Yes, we'd definitely need a fallback as Safari doesn't have SAB, for example. My explorations in workers and SAB are that they're pretty awesome, you just need to think about how fallbacks and packaging would work (2 build targets? rely on dynamic imports? how do embedders bundle?).

Good point with callbacks and events; what exactly we would pull into the worker is not clear cut at all. You can see by zooming into a profile that arguably parsing isn't the expensive bit:

[profiler screenshot]

So maybe in an ideal world the buffer should also be a shared array buffer owned by the worker thread, and only events that indicate which ranges changed (or something like that) would be sent over to the main thread.

Another thing to think about is how does transferControlToOffscreen fit into all this. It would be extremely cool if we have a renderer thread, a parse/buffer thread and the main thread barely does anything, keeping the application extremely responsive even when heavy parsing/rendering is going on.

It's definitely clear that this would be a pretty radical change though, no way I'll have time to play with this any time soon but it's always fun to talk about. We can think about this as we shape the architecture and maybe do small parts of it first (like having the webgl renderer optionally run in a worker). I also want an issue to point duplicates at for VS Code dropping many frames when the terminal is busy.

Another thing to think about is how does transferControlToOffscreen fit into all this. It would be extremely cool if we have a renderer thread, a parse/buffer thread and the main thread barely does anything, keeping the application extremely responsive even when heavy parsing/rendering is going on.

Indeed. This could even be made lock-free for the normal PRINT action (which, in conjunction with Terminal.scroll, covers like 95% of heavy terminal activity) if we resort to copy-on-write for a line, with hooking in the updated line as one genuine atomic action (possible as long as we stick to a single-writer - single/multiple-reader pattern). Locks will probably still be needed for other actions like ED, resize and such (things that manipulate more than one line at a time).
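
The copy-on-write publish for a single line can be sketched with a two-slot scheme: the writer builds the new line in the slot readers are ignoring, then publishes it with one atomic index swap, so readers always see either the old or the new line, never a torn one. Single writer / multiple readers assumed; all names here are hypothetical.

```typescript
// Two backing slots per line; an atomic index says which slot is live.
const SLOTS = 2;
const LINE_LEN = 80;

const data = new Uint16Array(new SharedArrayBuffer(SLOTS * LINE_LEN * 2));
const live = new Int32Array(new SharedArrayBuffer(4)); // holds 0 or 1

function writeLine(chars: Uint16Array): void {
  const spare = 1 - Atomics.load(live, 0);  // the slot readers ignore
  data.set(chars, spare * LINE_LEN);        // copy-on-write into the spare
  Atomics.store(live, 0, spare);            // single atomic publish
}

function readLine(): Uint16Array {
  const slot = Atomics.load(live, 0);
  return data.slice(slot * LINE_LEN, (slot + 1) * LINE_LEN);
}
```

Multi-line operations (ED, resize) cannot be published as one slot swap, which is why those would still need a lock.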

Yes, the webgl renderer seems to be a perfect fit to test out new paths in this direction, as only the webgl context is currently allowed for the offscreen canvas. Not even sure if we can gain anything for the DOM renderer here - would it be faster to pre-construct the DOM strings in a worker and move the content as one big update batch to the main thread? Idk, never tested anything like that.

Edit:
On code level I wonder if we could get away with a decorator pattern, that "maps" functionality to different thread targets. Something like that:

@bufferWorker
@main
class Terminal ... {
  // not decorated things get spawned on all thread targets listed on the class
  public doXY(...) {...}

  // only in bufferWorker
  @bufferWorker
  public doSomethingOnBuffer(...) {...}
}

It is just a rough idea atm, not even sure if decorators can be misused for that type of macro-precompilation stuff. It certainly would introduce another precompile step, but it would make those definitions much easier at the code level. Oh well, it would also break IntelliSense, so it might not be a good idea at all. Is there anything in TS to do macro-like precompile tasks?

Since all related issues in vscode repository are closed and they point to this issue, I am going to ask here: is there any workaround for vscode to be able to print long lines without freezing?

This problem sometimes makes vscode unusable, as some packages print their debug information as a sequence of very long lines.

@arsinclair This is def. a bug in some terminal-buffer-consuming handler in vscode; xterm.js itself does not have that issue. Imho the right way to fix this would be to address the issue in the slowpoke function.

Hacky workaround:
You could try to identify the slowpoke and remove it from the event. But do this only if you know what you are doing, as it might break vscode terminal integration with weird unwanted side effects (which might turn out really badly if the terminal is connected to a real system). I don't know the handlers there, thus cannot tell if it is safe to simply remove any of them.

@arsinclair you're probably seeing https://github.com/microsoft/vscode/issues/100338

@Tyriar, I don't think so - there are no URLs in my output. It is just a set of comma separated UUIDs.

@jerch, I'm not sure what is a _slowpoke function_ that you're referring to. Could you elaborate?

@arcanis By slowpoke function I mean some code that eats your precious CPU cycles for non-obvious reasons. The linkifier would be a candidate if you have very long auto-wrapping lines. It does not matter whether there are any links in your output; identifying them itself may take pretty long.

What about it?!

if you have very long auto wrapping lines

Yes, this is exactly the case.

It does not matter if there are any links in your output, identifying them itself may take pretty long

The problem is, once I scroll to the long lines block, the CPU load jumps to 100% and it never goes down. I've tried waiting for several hours to see if it ever finishes.

I am neither an xterm.js nor a vscode developer, so I am not familiar with the codebase and I don't know why it is happening or how to fix it.
Since I didn't have time to deal with it, I just redirected all output to a .log file and then read that file separately.
