Twilio-video.js: [VideoProcessor API] OffscreenCanvas limitation and API suggestions?

Created on 10 Mar 2021 · 20 Comments · Source: twilio/twilio-video.js

Thanks for adding the VideoProcessor API, very handy and convenient to have.
I added background blurring/replacement to my app through this API and everything seems to be working just fine 👍 (tweet for the curious)

`OffscreenCanvas` Limitation ?

Why does the API expose OffscreenCanvas instead of HTMLCanvasElement?

I didn't find an explanation of this design decision in the docs. Also, the examples you've implemented don't seem to do anything with OffscreenCanvas that isn't possible with HTMLCanvasElement.

I understand this is the reason why this API only works in Chrome. It isn't clear, though, why the API is limited by using OffscreenCanvas. Would love to understand the bigger picture 🙏

My implementation on top of the current API

As I understand it, the OffscreenCanvas that the API provides was meant to be transferred to a web worker to offload the work to a separate thread. But the API provides an already _locked_ OffscreenCanvas, so it isn't possible to transfer it as is.

In my case, I ended up calling inputFrame.transferToImageBitmap() and transferring the ImageBitmap to a web worker, where it was drawn into the worker's own OffscreenCanvas instance. Then I did the same to transfer the resulting ImageBitmap back.

This seems suboptimal. I could do the same if the API provided an HTMLCanvasElement, which would probably also make it work in browsers other than Chrome.
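
A minimal sketch of the round trip described above, under a few assumptions: processFrame receives an OffscreenCanvas and may return a Promise of one (the alpha signature discussed in this thread), and the worker answers each frame with a `{ bitmap }` message. `WorkerBackedProcessor`, the message shape, and the injected `outputCanvas` are hypothetical names, not part of the SDK.

```javascript
// Hypothetical processor: ships each frame to a worker as an ImageBitmap
// and paints the bitmap the worker sends back.
class WorkerBackedProcessor {
  // worker: a Worker (or anything with postMessage/onmessage).
  // outputCanvas: an OffscreenCanvas owned by the app, e.g. new OffscreenCanvas(640, 480).
  constructor(worker, outputCanvas) {
    this.worker = worker;
    this.outputCanvas = outputCanvas;
    // A bitmaprenderer context displays an ImageBitmap without copying pixels.
    this.renderer = outputCanvas.getContext('bitmaprenderer');
  }

  processFrame(inputFrame) {
    // The SDK's canvas already has a context, so it cannot be transferred itself;
    // snapshot its contents as a transferable ImageBitmap instead.
    const bitmap = inputFrame.transferToImageBitmap();
    return new Promise((resolve) => {
      // Assumes only one frame is in flight at a time.
      this.worker.onmessage = (event) => {
        this.renderer.transferFromImageBitmap(event.data.bitmap);
        resolve(this.outputCanvas);
      };
      // The second argument is the transfer list: the bitmap is moved, not copied.
      this.worker.postMessage({ bitmap }, [bitmap]);
    });
  }
}
```

Whether the SDK accepts an output canvas backed by a bitmaprenderer context was not verified here; drawing the returned bitmap into a 2D context with drawImage() would be the more conservative variant.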

Suggestions to the API

It seems there is no reason (at least for now) to provide OffscreenCanvas in the API instead of HTMLCanvasElement.
I wonder if the API should provide a blank outputFrame: HTMLCanvasElement for the new image to be drawn to. The API would provide it without locking it. The consumer could then call .transferControlToOffscreen() and pass the connected OffscreenCanvas instance to the web worker.
It would be great if the examples went further than simple CSS filters and actually integrated OffscreenCanvas / web workers / TensorFlow / etc.
It would be great to have an explanation in the docs of why you went with OffscreenCanvas and what the longer-term roadmap for this API is.

Please let me know if I am missing something here and didn't get how ideally OffscreenCanvas should be leveraged.

Chrome

Dosant

All 20 comments

Hi @Dosant ,

Thanks for trying out the VideoProcessor API. Regarding your question about OffscreenCanvas, this API is in pilot/alpha phase, so we are open to changing the API based on customer feedback. The reasons we went with OffscreenCanvas were:

We wanted to support Chrome only initially because most of our initial customer interest has been for Chrome. Also, we wanted to focus all our resources (engineering, QA) on getting this API working properly and performant on Chrome. We also did not want to field questions regarding performance issues and problems on other browsers at this point, since this API is in a very early stage.
We wanted to focus our initial support for this API on single-threaded applications. I understand you are trying to use web workers, but this alpha version satisfies a lot of the existing demand for video processing. We will definitely take your feedback into account going forward so that web workers are supported by this API before GA.

Thanks,

Manjesh

manjeshbhargav on 10 Mar 2021

Hi @Dosant ,

Regarding your statement:

In my case, I ended up calling inputFrame.transferToImageBitmap() and transferring the ImageBitmap to a web worker, where it was drawn into the worker's own OffscreenCanvas instance. Then I did the same to transfer the resulting ImageBitmap back.

This seems suboptimal. I could do the same if the API provided an HTMLCanvasElement, which would probably also make it work in browsers other than Chrome.

Even if inputFrame were an HTMLCanvasElement, you would still not be able to transfer control to the web worker, because the main thread (SDK) would already have created a rendering context in order to paint it with the current input frame from the video track. So, you would still have to pass a bitmap to the web worker. I think passing the bitmap is better than passing the raw image data from getImageData(), so I think you are doing the right thing there.

Thank you for the detailed feedback. This is very useful for us in terms of calibrating the API in the near future.

Thanks,

Manjesh

manjeshbhargav on 10 Mar 2021

Even if inputFrame were an HTMLCanvasElement, you would still not be able to transfer control to the web worker, because the main thread (SDK) would already have created a rendering context in order to paint it with the current input frame from the video track. So, you would still have to pass a bitmap to the web worker. I think passing the bitmap is better than passing the raw image data from getImageData(), so I think you are doing the right thing there.

Right, I agree that I'd have to pass the inputFrame to a worker somehow anyway.
The API could still have exposed an HTMLCanvasElement; the consumer code could then draw it to its own OffscreenCanvas and pass it on as an ImageBitmap. In that case, consumer code could also fall back to a less performant main-thread version using HTMLCanvasElement.
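
The fallback idea can be sketched as plain feature detection. This is only an illustration: `chooseFramePipeline` and the two pipeline labels are hypothetical names, and the check covers just the two globals the worker path depends on.

```javascript
// Hypothetical helper: decide which processing path the app should use.
// scope defaults to the global object so it works in a window or a test.
function chooseFramePipeline(scope = globalThis) {
  const hasOffscreenCanvas = typeof scope.OffscreenCanvas === 'function';
  const hasWorker = typeof scope.Worker === 'function';
  // Use the worker path only when both building blocks exist;
  // otherwise fall back to drawing on a regular canvas on the main thread.
  return hasOffscreenCanvas && hasWorker ? 'worker-offscreen' : 'main-thread-canvas';
}
```

An app could call this once at startup and construct either a worker-backed processor or a main-thread one from the result.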

For the outputFrame, it would be very interesting to check whether the transferControlToOffscreen approach performs better than passing an ImageBitmap. Please note, I didn't compare performance.

We wanted to focus our initial support for this API on single-threaded applications. I understand you are trying to use web workers, but this alpha version satisfies a lot of the existing demand for video processing.

Just a note: it is working in a web worker 🥳
It only took some minor workarounds and figuring out how to make it work (there is no example).

Dosant on 10 Mar 2021

@Dosant ,

Thanks for the clarification. We will add a QuickStart example that demonstrates web workers soon.

Thanks,

Manjesh

manjeshbhargav on 10 Mar 2021

👍2

My suggestion for this API would be to simply provide access to the mediaStream object and let the implementer determine what to do with it.

// mediaStream -> processor -> mediaStream
function processor(mediaStream) {
  return mediaStream;
}

We are essentially doing this using a getUserMedia hack where we provide our own method for getUserMedia in the media track constraints object so that we can pipe the camera media stream track into tensorflow, do some image manipulation on canvas, and then return the canvas's media stream track:

createLocalVideoTrack({
  async getUserMedia(constraints) {
    const cameraMediaStream = await navigator.mediaDevices.getUserMedia(constraints);

    const canvasMediaStream = getCanvasFromTensorFlowManipulation(cameraMediaStream);

    return canvasMediaStream;
  }
});

markbrouch on 12 Mar 2021

❤1

@manjeshbhargav Can we expect an example or documentation for what @Dosant has implemented using the VideoProcessor API?

I'm trying to implement the same with https://github.com/twilio/twilio-video-app-react/
I've also raised the question here: https://github.com/twilio/twilio-video-app-react/issues/453

Thanks.

SanjayBikhchandani on 16 Mar 2021

👍2

@SanjayBikhchandani ,

Our examples focus on demonstrating the use of the SDK APIs, so we typically tend to keep our examples simple so that developers don't have to read through a lot of code to get to the API usage. However, you can use the VideoProcessor APIs in conjunction with libraries such as bodyPix in order to achieve background substitution/replacement.

Thanks,

Manjesh

manjeshbhargav on 16 Mar 2021

@markbrouch ,

My suggestion for this API would be to simply provide access to the mediaStream object and let the implementer determine what to do with it.
// mediaStream -> processor -> mediaStream
function processor(mediaStream) {
  return mediaStream;
}
We are essentially doing this using a getUserMedia hack where we provide our own method for getUserMedia in the media track constraints object so that we can pipe the camera media stream track into tensorflow, do some image manipulation on canvas, and then return the canvas's media stream track:
createLocalVideoTrack({
  async getUserMedia(constraints) {
    const cameraMediaStream = await navigator.mediaDevices.getUserMedia(constraints);

    const canvasMediaStream = getCanvasFromTensorFlowManipulation(cameraMediaStream);

    return canvasMediaStream;
  }
});
We don't use MediaStreams in our SDK, we only operate on MediaStreamTracks. Since Tensorflow models operate on individual frames, you can use the existing processFrame(inputCanvas) method to pass the contents of the canvas to the Tensorflow model. The approach you have suggested will not work for us since we need to update the LocalVideoTrack's attached elements with the processed feed and also update the corresponding MediaStreamTracks being published to the Room.

Thanks,

Manjesh

manjeshbhargav on 19 Mar 2021

Thanks @manjeshbhargav,

Substitute mediaStreamTrack for mediaStream in my example and the main point remains. I think the main problem with processFrame as it currently exists is that it relies on OffscreenCanvas, which currently has poor browser support. If the processFrame API were less prescriptive and let the application handle the mediaStreamTrack directly, we wouldn't have that restriction. In our solution we pipe the mediaStreamTrack through TF and perform our own canvas transformations using a normal canvas, which allows us to support Safari and Firefox in addition to Chrome.

markbrouch on 19 Mar 2021

@markbrouch ,

Right now, we are limiting our support to Chrome because we are in the pilot/beta phase and we need to fine-tune our implementation to make it more performant. We do intend to support all browsers by the time we go to GA (sometime in Q2).
The reason why we designed the VideoProcessor API this way is to allow the developers to focus only on implementing the logic to process frames and not have to worry about updating the preview elements and the published track (the SDK does all that for you). Also, if you want to pipe your own MediaStreamTrack, you can achieve that easily without the VideoProcessor API like so:

const { LocalVideoTrack } = require('twilio-video');

async function createProcessedTrack() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const videoTrack = stream.getVideoTracks()[0];
  // processVideoTrack is your own function: MediaStreamTrack in, processed MediaStreamTrack out.
  const processedVideoTrack = processVideoTrack(videoTrack);
  return new LocalVideoTrack(processedVideoTrack);
}

Thanks,

Manjesh

manjeshbhargav on 23 Mar 2021

👍2

Thanks @manjeshbhargav , I'm excited to use this feature when it gains broader browser support!

markbrouch on 24 Mar 2021

🚀1

Just to add to the thread:
I implemented something similar to @Dosant using BodyPix:

in the video processor's processFrame, I convert the frame to an ImageBitmap
the bitmap is sent to a worker for segmentation
here, I think we differ a little: the worker sends prediction data back to the main thread (and not image data)
the main thread composes the output frame
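
The last step, composing on the main thread from prediction data, can be sketched as a pure function over pixel buffers. The assumptions here: the mask holds one byte per pixel with 1 for "person" and 0 for "background" (the layout BodyPix documents for the `data` field of segmentPerson()), and both images are RGBA buffers of the same size, such as `ImageData.data`. `composeWithMask` is a hypothetical name.

```javascript
// Hypothetical compositor: keep person pixels from the camera frame and
// take every other pixel from the replacement background.
// framePixels, backgroundPixels: RGBA buffers (4 bytes per pixel).
// mask: one byte per pixel, 1 = person, 0 = background.
function composeWithMask(framePixels, backgroundPixels, mask) {
  const out = new Uint8ClampedArray(framePixels.length);
  for (let i = 0; i < mask.length; i++) {
    const source = mask[i] === 1 ? framePixels : backgroundPixels;
    const offset = i * 4;
    out[offset] = source[offset];         // red
    out[offset + 1] = source[offset + 1]; // green
    out[offset + 2] = source[offset + 2]; // blue
    out[offset + 3] = source[offset + 3]; // alpha
  }
  return out;
}
```

The result can be wrapped in an ImageData and written to the output canvas with putImageData(). Sending only the mask back from the worker keeps the message small compared with returning a full frame.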

The issue is that I couldn't get more than 10-12 FPS on average on an average machine, whether using a worker or not (@Dosant were you able to get anything better?).

This is a great repo I stumbled upon doing some research.
First of all, @w-okada "worker-ized" a lot of common video processing libraries 👏 👏 👏, including BodyPix. Running his BodyPix demo I get slightly better FPS (I don't understand why; I need to dive into it), but more importantly he also provides a worker for the Google Meet TFLite model, which is much faster (20-25 FPS on the same machine) and more precise. I haven't tried it out with Twilio Video yet.

@manjeshbhargav, as to the processor API, my 2 cents:

I don't mind receiving the input as a canvas, but obviously OffscreenCanvas will not work in Safari (where most of my customers are). Looking forward to an improvement here.
I do like the API being frame-based rather than stream-based. In any case, I'd break the stream down frame by frame for processing.

shaibt on 12 Apr 2021

👍2

@shaibt
Thanks for introducing my repos.
Yes, I have Google Meet Model, but the model is currently not under APACHE-2.0 license.

please see
https://github.com/tensorflow/tfjs/issues/4177

w-okada on 13 Apr 2021

👍2

I implemented something similar to @Dosant using BodyPix:

@shaibt Is it possible you could post a code snippet of your VideoProcessor that uses BodyPix please?

RyanDurkin on 15 Apr 2021

Is there a code sample that uses BodyPix with VideoProcessor? I can't find any way to send a <video> element to BodyPix and set the output canvas as the localStream for the remotePeerConnection.

adityajoshee on 10 May 2021

Hi @adityajoshee ,

You can write the contents of the OffscreenCanvas input frame that you get in the processFrame() callback into an HTMLCanvasElement, and then pass it to BodyPix's segmentPerson() method. Let me know if it works for you.
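
A rough sketch of that suggestion, with the pieces passed in so nothing is hard-wired. The assumptions: `net` is the model object resolved by bodyPix.load(), `scratch` is an HTMLCanvasElement, `output` is the canvas handed back to the SDK, and processFrame may return a Promise. `createSegmentingProcessor` and the `paint` callback are hypothetical names, not part of either library.

```javascript
// Hypothetical factory for a processor that runs BodyPix on each frame.
// paint(outputCtx, scratch, segmentation) is an app-supplied drawing routine.
function createSegmentingProcessor({ net, scratch, output, paint }) {
  const scratchCtx = scratch.getContext('2d');
  const outputCtx = output.getContext('2d');
  return {
    async processFrame(inputFrame) {
      // Resize only when needed, because setting width/height clears a canvas.
      if (scratch.width !== inputFrame.width || scratch.height !== inputFrame.height) {
        scratch.width = output.width = inputFrame.width;
        scratch.height = output.height = inputFrame.height;
      }
      // Copy the OffscreenCanvas contents into the regular canvas.
      scratchCtx.drawImage(inputFrame, 0, 0);
      // segmentPerson() accepts a canvas and resolves with a per-pixel person mask.
      const segmentation = await net.segmentPerson(scratch);
      paint(outputCtx, scratch, segmentation);
      return output;
    },
  };
}
```

The `paint` callback is where blur or background replacement would happen, for example by compositing the frame with the mask in `segmentation.data`.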

Thanks,

Manjesh

manjeshbhargav on 11 May 2021

Basically I'm trying to add background blur using BodyPix and add that as a video track to local participant in the joinRoom function, like this -

```
setTimeout(() => { loadBodyPix(document.getElementById('localVideo')) }, 3000);
let localCanvas = document.getElementById('localCanvas');
let localStream = localCanvas.captureStream(10);
const track = new Twilio.Video.LocalVideoTrack(localStream.getVideoTracks()[0]);
console.log('....*......');
await room.localParticipant.publishTrack(track, {
  name: 'canvasStream',
  priority: 'low',
});
```

But I get

```
TypeError: track must be a LocalAudioTrack, LocalVideoTrack, LocalDataTrack, or MediaStreamTrack
```

adityajoshee on 11 May 2021

Update: The original Apache-2.0 license for the Google Meet segmentation model was found. I would also like to mention that the model is used by Jitsi, so perhaps you can take a look at their code. With @w-okada's example, I am able to achieve 100 FPS on desktop with the 256x256 model, a 256x256 process size, and SIMD. The models themselves can be found here.

Here is an article on it by @w-okada.

kirawi on 14 May 2021

@manjeshbhargav it would be very helpful if you could share how to use the VideoProcessor for background blur, with actual code using BodyPix or any other library for that matter.

aditya-protonn on 20 May 2021

For background blur, I wrote a React hook (it doesn't use the VideoProcessor API, for wider browser support).
OffscreenCanvas has almost the same API as a regular canvas, so it can be done in a similar way.

https://gist.github.com/acro5piano/6f16fa332416479b9edadccc71b4bc25