Thanks for adding the VideoProcessor API, very handy and convenient to have.
I added blurring/replacing the video background into my app through this API and everything seems to be working just fine 👍 (tweet for the curious)
OffscreenCanvas Limitation?
Why does the API expose OffscreenCanvas instead of HTMLCanvasElement?
I didn't find in the docs an explanation of this design decision. Also, examples that you've implemented don't seem to be doing anything with OffscreenCanvas that isn't possible with HTMLCanvasElement
I understand this is the reason why this API only works in Chrome. It isn't clear, though, why the API is limited by using OffscreenCanvas. I would love to understand the bigger picture 🙏
As I understand it, the OffscreenCanvas that the API provides was meant to be transferred into a web worker to offload the work to a separate thread. But the API provides an already _locked_ OffscreenCanvas, so it isn't possible to transfer it as is.
In my case, I ended up calling inputFrame.transferToImageBitmap() and transferring an ImageBitmap into a web worker, where it was drawn into the worker's own OffscreenCanvas instance. Then I did the same to transfer the resulting ImageBitmap back.
This seems to be suboptimal. I could do the same if the API provided an HTMLCanvasElement, and that would probably make it work in browsers other than Chrome.
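The round trip described above could look roughly like the sketch below on the main-thread side. It is only an illustration: the `{ id, bitmap }` message shape, `createFrameRoundTrip`, and `WorkerBackedProcessor` are made-up names, and it assumes that `processFrame` receives an OffscreenCanvas and may return a promise for the output canvas.

```javascript
// Sketch only. `worker` is anything with postMessage/onmessage (a real Worker
// in the browser). The { id, bitmap } message shape is a made-up convention.
function createFrameRoundTrip(worker) {
  const pending = new Map(); // request id -> resolve callback
  let nextId = 0;

  worker.onmessage = (event) => {
    const { id, bitmap } = event.data;
    const resolve = pending.get(id);
    if (resolve) {
      pending.delete(id);
      resolve(bitmap);
    }
  };

  return function sendFrame(bitmap) {
    return new Promise((resolve) => {
      const id = nextId++;
      pending.set(id, resolve);
      // The second argument is the transfer list: the bitmap is moved, not copied.
      worker.postMessage({ id, bitmap }, [bitmap]);
    });
  };
}

// Hypothetical processor built on top of the round trip (browser only).
class WorkerBackedProcessor {
  constructor(worker) {
    this.sendFrame = createFrameRoundTrip(worker);
    this.outputCanvas = null;
  }

  // Assumption: the SDK passes an OffscreenCanvas and accepts a promise.
  async processFrame(inputFrame) {
    const processed = await this.sendFrame(inputFrame.transferToImageBitmap());
    if (!this.outputCanvas) {
      this.outputCanvas = new OffscreenCanvas(processed.width, processed.height);
    }
    this.outputCanvas.getContext('2d').drawImage(processed, 0, 0);
    processed.close(); // free the bitmap once it has been drawn
    return this.outputCanvas;
  }
}
```

Matching replies to requests by id keeps the code correct even if a new frame is sent before the previous one comes back.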
OffscreenCanvas in the API instead of HTMLCanvasElement
outputFrame: HTMLCanvasElement where a new image should be drawn to. The API will provide it without locking it. Then the consumer could call .transferControlToOffscreen() and pass that connected OffscreenCanvas instance to the web-worker.
OffscreenCanvas / web-workers / tensorflow / etc
OffscreenCanvas and what is a longer-term roadmap for this API
Please let me know if I am missing something here and didn't get how ideally OffscreenCanvas should be leveraged.
Hi @Dosant ,
Thanks for trying out the VideoProcessor API. Regarding your question about OffscreenCanvas, this API is in pilot/alpha phase, so we are open to changing the API based on customer feedback. The reasons we went with OffscreenCanvas were:
Thanks,
Manjesh
Hi @Dosant ,
Regarding your statement:
In my case, I ended up calling inputFrame.transferToImageBitmap() and transferring an ImageBitmap into a web-worker that was then drawn into the worker's own OffscreenCanvas instance. Then I did the same for transferring resulting in ImageBitmap back.
This seems to be suboptimal, and I could do the same if the API would provide HTMLCanvasElement and probably would make it work in other browsers than chrome.
Even if inputFrame was a HTMLCanvasElement, you would still not be able to transfer control to the web worker because the main thread (SDK) would have created a rendering context in order to paint it with the current input frame from the video track. So, you would still have to pass a bitmap to the web worker. I think passing the bitmap is better than passing the raw image data from getImageData(). So I think you are doing the right thing there.
Thank you for the detailed feedback. This is very useful for us in terms of calibrating the API in the near future.
Thanks,
Manjesh
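The point about the rendering context can be checked with a small guard like the one below. This is a sketch, not SDK code: `tryTransferControl` is a hypothetical helper. It relies on the standard behavior that `transferControlToOffscreen()` throws an `InvalidStateError` once `getContext()` has been called on the element (or control was already transferred).

```javascript
// Returns an OffscreenCanvas linked to `canvas`, or null when control cannot
// be transferred (unsupported browser, or the canvas is already "locked").
function tryTransferControl(canvas) {
  if (typeof canvas.transferControlToOffscreen !== 'function') {
    return null; // API not available in this browser
  }
  try {
    return canvas.transferControlToOffscreen();
  } catch (error) {
    // Thrown when the element already has a rendering context,
    // or when control has been transferred before.
    if (error && error.name === 'InvalidStateError') {
      return null;
    }
    throw error;
  }
}
```

A `null` result means the caller has to fall back to passing bitmaps, as discussed above.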
Even if inputFrame was a HTMLCanvasElement, you would still not be able to transfer control to the web worker because the main thread (SDK) would have created a rendering context in order to paint it with the current input frame from the video track. So, you would still have to pass a bitmap to the web worker. I think passing the bitmap is better than passing the raw image data from getImageData(). So I think you are doing the right thing there.
Right, I agree that I'd have to pass the inputFrame to a worker somehow anyway.
The API could still have exposed an HTMLCanvasElement; the consumer code could then draw it to its own OffscreenCanvas and pass it on as an ImageBitmap. In this case, consumer code could also fall back to a less performant main-thread version using HTMLCanvasElement.
For the outputFrame, it would be very interesting to check whether the transferControlToOffscreen approach performs better than passing an ImageBitmap. Please note, I didn't compare performance.
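The fallback idea could be expressed as a simple feature check. The sketch below is an assumption about how an application might decide, not something the SDK provides; `pickFrameStrategy` and the strategy names are hypothetical.

```javascript
// Decide where frames should be processed, based on what the environment offers.
// `env` defaults to the global object so fakes can be passed in for testing.
function pickFrameStrategy(env = globalThis) {
  const hasOffscreen = typeof env.OffscreenCanvas === 'function';
  const hasWorker = typeof env.Worker === 'function';
  const hasBitmap = typeof env.createImageBitmap === 'function';

  // The worker path needs all three; otherwise stay on the main thread
  // with a regular HTMLCanvasElement.
  return hasOffscreen && hasWorker && hasBitmap ? 'worker' : 'main-thread';
}
```

An application could call `pickFrameStrategy()` once at startup and construct the matching processor.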
We wanted to focus our initial support for this API on single-threaded applications. I understand you are trying to use web workers, but this alpha version satisfies a lot of the existing demand for video processing.
Just a note: it is working in a web worker 🥳
It just took some minor workarounds and figuring out how to make it work (there is no example).
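Since no example was posted, here is a rough guess at what the worker side of such a setup might look like. Everything here is an assumption: the `{ id, bitmap }` message shape, the `createFrameHandler` name, and the `processPixels` hook that stands in for the actual effect.

```javascript
// worker.js (sketch). `canvas` is the worker's own OffscreenCanvas,
// `processPixels` is a hypothetical hook that applies the effect in place,
// and `reply` posts the result back to the main thread.
function createFrameHandler(canvas, processPixels, reply) {
  const ctx = canvas.getContext('2d');

  return function onFrame(event) {
    const { id, bitmap } = event.data;

    // Keep the worker canvas the same size as the incoming frame.
    if (canvas.width !== bitmap.width || canvas.height !== bitmap.height) {
      canvas.width = bitmap.width;
      canvas.height = bitmap.height;
    }

    ctx.drawImage(bitmap, 0, 0);
    bitmap.close(); // release the transferred input frame

    processPixels(ctx, canvas.width, canvas.height);

    // Hand the result back as a transferable ImageBitmap.
    const result = canvas.transferToImageBitmap();
    reply({ id, bitmap: result }, [result]);
  };
}

// Wiring inside a real worker would be along these lines:
// self.onmessage = createFrameHandler(
//   new OffscreenCanvas(1, 1),
//   applyEffect, // hypothetical
//   (message, transfer) => self.postMessage(message, transfer)
// );
```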
@Dosant ,
Thanks for the clarification. We will add a QuickStart example that demonstrates web workers soon.
Thanks,
Manjesh
My suggestion for this API would be to simply provide access to the mediaStream object and let the implementer determine what to do with it.
// mediaStream -> processor -> mediaStream
function processor(mediaStream) {
return mediaStream;
}
We are essentially doing this using a getUserMedia hack where we provide our own method for getUserMedia in the media track constraints object so that we can pipe the camera media stream track into tensorflow, do some image manipulation on canvas, and then return the canvas's media stream track:
createLocalVideoTrack({
  async getUserMedia(constraints) {
    const cameraMediaStream = await navigator.mediaDevices.getUserMedia(constraints);
    const canvasMediaStream = getCanvasFromTensorFlowManipulation(cameraMediaStream);
    return canvasMediaStream;
  }
});
@manjeshbhargav Can we expect an example or documentation for what @Dosant has implemented using the VideoProcessor API?
I'm trying to implement the same with https://github.com/twilio/twilio-video-app-react/
I've also raised the question here: https://github.com/twilio/twilio-video-app-react/issues/453
Thanks.
@SanjayBikhchandani ,
Our examples focus on demonstrating the use of the SDK APIs, so we typically tend to keep our examples simple so that developers don't have to read through a lot of code to get to the API usage. However, you can use the VideoProcessor APIs in conjunction with libraries such as bodyPix in order to achieve background substitution/replacement.
Thanks,
Manjesh
@markbrouch ,
My suggestion for this API would be to simply provide access to the mediaStream object and let the implementer determine what to do with it.
// mediaStream -> processor -> mediaStream
function processor(mediaStream) {
  return mediaStream;
}
We are essentially doing this using a getUserMedia hack where we provide our own method for getUserMedia in the media track constraints object so that we can pipe the camera media stream track into tensorflow, do some image manipulation on canvas, and then return the canvas's media stream track:
createLocalVideoTrack({
  async getUserMedia(constraints) {
    const cameraMediaStream = await navigator.mediaDevices.getUserMedia(constraints);
    const canvasMediaStream = getCanvasFromTensorFlowManipulation(cameraMediaStream);
    return canvasMediaStream;
  }
});
We don't use MediaStreams in our SDK; we only operate on MediaStreamTracks. Since TensorFlow models operate on individual frames, you can use the existing processFrame(inputCanvas) method to pass the contents of the canvas to the TensorFlow model. The approach you have suggested will not work for us since we need to update the LocalVideoTrack's attached elements with the processed feed and also update the corresponding MediaStreamTracks being published to the Room.
Thanks,
Manjesh
Thanks @manjeshbhargav,
Substitute mediaStreamTrack for mediaStream in my example and the main point remains. I think the main problem with processFrame as it currently exists is that it relies on OffscreenCanvas, which currently has poor browser support. By being less prescriptive with the processFrame API and allowing the application to directly handle the mediaStreamTrack, we wouldn't have that restriction. In our solution we are piping the mediaStreamTrack through TF and performing our own canvas transformations using a normal canvas, which allows us to support Safari and Firefox in addition to Chrome.
@markbrouch ,
Right now, we are limiting our support to Chrome because we are in the pilot/beta phase and we need to fine-tune our implementation to make it more performant. We do intend to support all browsers by the time we go to GA (sometime in Q2).
The reason why we designed the VideoProcessor API this way is to allow the developers to focus only on implementing the logic to process frames and not have to worry about updating the preview elements and the published track (the SDK does all that for you). Also, if you want to pipe your own MediaStreamTrack, you can achieve that easily without the VideoProcessor API like so:
const { LocalVideoTrack } = require('twilio-video');
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = stream.getTracks()[0];
const processedVideoTrack = processVideoTrack(videoTrack);
const twilioVideoTrack = new LocalVideoTrack(processedVideoTrack);
Thanks,
Manjesh
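The snippet above leaves `processVideoTrack` undefined. One possible shape for it, using a regular canvas and `captureStream()`, is sketched below. This is an assumption about how an application might fill in that placeholder, not SDK code; the extra `drawFrame` and `env` parameters are added here for illustration and testing.

```javascript
// Pipes a camera track through a regular <canvas> and returns the canvas's
// own video track. `drawFrame(ctx, video, width, height)` is a hypothetical
// callback that paints one processed frame.
function processVideoTrack(videoTrack, drawFrame, env = globalThis) {
  const { width = 640, height = 480 } = videoTrack.getSettings();

  // A hidden <video> element acts as the frame source.
  const video = env.document.createElement('video');
  video.srcObject = new env.MediaStream([videoTrack]);
  video.muted = true;
  const playing = video.play();
  if (playing && typeof playing.catch === 'function') {
    playing.catch(() => {}); // ignore autoplay rejections in this sketch
  }

  const canvas = env.document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');

  const tick = () => {
    if (videoTrack.readyState === 'ended') return; // stop with the source track
    drawFrame(ctx, video, width, height);
    env.requestAnimationFrame(tick);
  };
  env.requestAnimationFrame(tick);

  return canvas.captureStream().getVideoTracks()[0];
}
```

A pass-through `drawFrame` would simply call `ctx.drawImage(video, 0, 0, width, height)`. Note that `requestAnimationFrame` is throttled in background tabs, so a production version may need a different timer.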
Thanks @manjeshbhargav , I'm excited to use this feature when it gains broader browser support!
Just to add to the thread:
I implemented something similar to @Dosant using BodyPix:
processFrame converted the frame to ImageBitmap
The issue is that I couldn't get more than 10-12 FPS on average on an average machine, whether using a worker or not (@Dosant, were you able to get anything better?).
This is a great repo I stumbled upon doing some research.
First of all, @w-okada "worker-ized" a lot of common video processing libraries 👏 👏 👏, including BodyPix. Running his demo for BodyPix I get slightly better FPS (I don't understand why; I need to dive into it), but more importantly he also provided a worker for the Google Meet TFLite model, which is much faster (20-25 FPS on the same machine) and more precise. I haven't tried it out with Twilio Video yet.
@manjeshbhargav, as to the processor API, my 2 cents:
@shaibt
Thanks for introducing my repos.
Yes, I have the Google Meet model, but the model is currently not under the APACHE-2.0 license.
I implemented something similar to @Dosant using BodyPix:
@shaibt Is it possible you could post a code snippet of your VideoProcessor that uses BodyPix please?
Is there a code sample that uses BodyPix with VideoProcessor? I can't find any way to send a <video> element to BodyPix and set the output canvas as the localStream for the remotePeerConnection.
Hi @adityajoshee ,
You can write the contents of the OffscreenCanvas input frame that you get in the processFrame() callback into an HTMLCanvasElement, and then pass it to BodyPix's segmentPerson() method. Let me know if it works for you.
Thanks,
Manjesh
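The suggestion above could be sketched as follows. This is a hedged illustration rather than an official sample: `blurBackground` is a hypothetical helper, `net` is assumed to be a model returned by BodyPix's `load()`, `bodyPix` is the imported BodyPix module, and the blur amounts are arbitrary. How the result is handed back from `processFrame()` depends on the SDK version and is left out.

```javascript
// Copies the input frame into a regular canvas, segments the person with
// BodyPix, and renders a blurred-background version into `outputCanvas`.
async function blurBackground(inputFrame, scratchCanvas, outputCanvas, net, bodyPix) {
  // Resize only when needed, because assigning width/height clears the canvas.
  if (scratchCanvas.width !== inputFrame.width || scratchCanvas.height !== inputFrame.height) {
    scratchCanvas.width = inputFrame.width;
    scratchCanvas.height = inputFrame.height;
  }

  // Write the OffscreenCanvas contents into the HTMLCanvasElement.
  scratchCanvas.getContext('2d').drawImage(inputFrame, 0, 0);

  const segmentation = await net.segmentPerson(scratchCanvas);

  const backgroundBlurAmount = 6; // arbitrary values for this sketch
  const edgeBlurAmount = 3;
  const flipHorizontal = false;
  bodyPix.drawBokehEffect(
    outputCanvas,
    scratchCanvas,
    segmentation,
    backgroundBlurAmount,
    edgeBlurAmount,
    flipHorizontal
  );

  return outputCanvas;
}
```

Reusing the same scratch and output canvases across frames avoids allocating new canvases for every call.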
Basically, I'm trying to add background blur using BodyPix and add that as a video track to the local participant in the joinRoom function, like this:
```
setTimeout(() => { loadBodyPix(document.getElementById('localVideo')) }, 3000);
let localCanvas = document.getElementById('localCanvas');
let localStream = localCanvas.captureStream(10)
const track = new Twilio.Video.LocalVideoTrack(localStream.getVideoTracks()[0]);
console.log('....*......')
await room.localParticipant.publishTrack(track, {
  name: 'canvasStream',
  priority: 'low',
});
```
But I get
TypeError: track must be a LocalAudioTrack, LocalVideoTrack, LocalDataTrack, or MediaStreamTrack
Update: The original APACHE-2.0 license for the Google Meet segmentation model was found. I would also like to mention that it is used by Jitsi, so perhaps you can take a look at their code. With @w-okada's example, I am able to achieve 100 fps on desktop with the 256x256 model, a 256x256 process size, and SIMD. The models themselves can be found here.
@manjeshbhargav, it would be very helpful if you could share how to use the VideoProcessor for background blur with actual code, using BodyPix or any other library for that matter.
For background blur, I wrote a React hook (not using the VideoProcessor API, for wider browser support).
OffscreenCanvas has almost the same API, so the same can be done with it in a similar way.
https://gist.github.com/acro5piano/6f16fa332416479b9edadccc71b4bc25
For the license status of the Google Meet model, please see
https://github.com/tensorflow/tfjs/issues/4177