Tfjs: Catching "CONTEXT_LOST_WEBGL"

Created on 3 Oct 2018 · 11 Comments · Source: tensorflow/tfjs

Hello everyone,

It seems like gl.CONTEXT_LOST_WEBGL is being caught by the engine and then logged with console.warn. Is there any way for me to catch this error?

Use case: when training models with the WebGL backend, being able to catch the above-mentioned error would make it possible to save a backup of the model weights up to that point.

With the current approach of logging a warning to the console, you don't notice the error and simply keep training on garbage, which corrupts the model weights. The weights already seem to be unusable by the next iteration.
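The use case above could be sketched roughly as follows. This is not tfjs API: `trainWithBackup`, `trainOneEpoch`, `snapshotWeights` and `isContextLost` are hypothetical names that the caller would have to supply. The sketch assumes that weights cannot be read back reliably once the context is gone, so it keeps a snapshot from the last epoch that finished with a live context instead of trying to save at the moment of failure.

```javascript
// Hypothetical helper, not part of tfjs. All four options are caller-supplied:
//   trainOneEpoch(epoch) -> Promise, runs one epoch of training
//   snapshotWeights()    -> Promise, resolves to a CPU-side copy of the weights
//   isContextLost()      -> boolean, true once the WebGL context has been lost
async function trainWithBackup({ epochs, trainOneEpoch, snapshotWeights, isContextLost }) {
  let lastGoodSnapshot = null;
  for (let epoch = 0; epoch < epochs; epoch++) {
    await trainOneEpoch(epoch);
    if (isContextLost()) {
      // Stop here: anything computed during this epoch may be garbage.
      return { completed: false, stoppedAtEpoch: epoch, snapshot: lastGoodSnapshot };
    }
    // The context is still alive, so this epoch's weights are trusted.
    lastGoodSnapshot = await snapshotWeights();
  }
  return { completed: true, stoppedAtEpoch: null, snapshot: lastGoodSnapshot };
}
```

The price of this design is one weight download per epoch, which may be too slow for large models. Snapshotting every N epochs would be an obvious variation.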

core

All 11 comments

@dsmilkov is this something we could surface? Possibly as an option?

Thanks @justadudewhohacks. This seems very reasonable and useful. We can allow the user to subscribe and be notified:

tf.webgl.onContextLost(() => {
  // Called when the context was lost.
});

https://www.khronos.org/webgl/wiki/HandlingContextLost gives us guidelines how to detect when the context was lost. (We might even be able to fully recover from it, but that would be a future step).
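For reference, the detection approach on the Khronos page boils down to listening for the standard `webglcontextlost` and `webglcontextrestored` events on the canvas that owns the context. A minimal sketch, where `watchContextLoss` and its two callbacks are hypothetical names and the canvas is assumed to be one you have a reference to:

```javascript
// Hypothetical helper, not tfjs API. `canvas` is the canvas that owns the GL context.
function watchContextLoss(canvas, { onLost, onRestored }) {
  const handleLost = (event) => {
    // Without preventDefault() the browser will not try to restore the context.
    event.preventDefault();
    onLost();
  };
  const handleRestored = () => {
    // Textures, buffers and programs must be recreated after a restore.
    onRestored();
  };
  canvas.addEventListener('webglcontextlost', handleLost, false);
  canvas.addEventListener('webglcontextrestored', handleRestored, false);
  // Return a function that detaches both listeners again.
  return () => {
    canvas.removeEventListener('webglcontextlost', handleLost, false);
    canvas.removeEventListener('webglcontextrestored', handleRestored, false);
  };
}
```

The listeners only fire for the canvas that actually holds the WebGL context, so this would have to be wired up inside the library or against whatever canvas the backend uses.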

@justadudewhohacks if this works you have made my day.

Thanks for the quick answer. @dsmilkov just to clarify, simply adding an event listener for 'webglcontextlost' to the canvas passed to tf.fromPixels probably won't do it, or should that already solve it? PS: I am not familiar with WebGL at all.

Edit: Okay, turns out it's not as easy as adding an event listener to the input canvas. Also my initial guess:

It seems like gl.CONTEXT_LOST_WEBGL is being caught by the engine and then logged with console.warn.

is wrong, isn't it? The warning doesn't actually come from tfjs-core, right?

Here is where we catch and throw:
https://github.com/tensorflow/tfjs-core/blob/master/src/kernels/webgl/webgl_util.ts#L68

What we could do is add a method to webgl_util.ts, "onContextLost" which sets a global function. When we throw an error of the type CONTEXT_LOST_WEBGL, we call that function. You'll have to import the onContextLost method directly inside webgl.ts where we have all our webgl types, so it's not exported under tf.webgl.webgl_util.
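A rough sketch of that idea, not the actual webgl_util.ts code: the function names follow the description above, but the body is an assumption about how the hook could look. The two numeric constants are the standard WebGL values for `gl.NO_ERROR` and `gl.CONTEXT_LOST_WEBGL`.

```javascript
// Standard WebGL enum values (gl.NO_ERROR and gl.CONTEXT_LOST_WEBGL).
const NO_ERROR = 0;
const CONTEXT_LOST_WEBGL = 0x9242;

// Module-level ("global") callback, set through onContextLost().
let contextLostHandler = null;

function onContextLost(handler) {
  contextLostHandler = handler;
}

// Assumed shape of the error check: read gl.getError(), notify the
// subscriber if the context was lost, then throw as before.
function checkWebGLError(gl) {
  const error = gl.getError();
  if (error === NO_ERROR) {
    return;
  }
  if (error === CONTEXT_LOST_WEBGL && contextLostHandler !== null) {
    contextLostHandler();
  }
  throw new Error('WebGL Error: 0x' + error.toString(16));
}
```

Calling the handler before throwing means the subscriber is notified even if some caller further up swallows the exception.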

Happy to take a PR :)

Just to clarify about the canvas: tfjs uses its own internal canvas, which is different from the user canvas passed to tf.fromPixels(userCanvas). So the event listener will be attached to the internal canvas.

Ok, thanks for the hints. I will try to get this working and then submit a PR.

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

I think we could still do this, WDYT @justadudewhohacks ?

@nsthorat last time I tried to reproduce this issue by allocating tensors of different sizes with a newer version of tfjs-core, I received a different error message from the WebGL backend, not CONTEXT_LOST_WEBGL, so I didn't investigate any further into implementing a hook for catching CONTEXT_LOST_WEBGL.

Great! We actually had a fix recently which should dramatically reduce how often the context is lost (having a global singleton canvas).

I'll close this out -- if you see this again let us know and we'll reopen it.
