Linux: v4l2 codec does not support V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME

Created on 21 Aug 2019 · 39 Comments · Source: raspberrypi/linux

Describe the bug
Using bcm2835-codec there is no way to force key frame generation

To reproduce
While using v4l2 encoder, send ioctl with V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME

Expected behaviour
Expect ioctl to succeed and i-frame to be generated.

Actual behaviour
ioctl fails

System

Which model of Raspberry Pi? e.g. Pi3B+, PiZeroW
Pi3B+

Which OS and version (cat /etc/rpi-issue)?
Raspberry Pi reference 2019-07-10
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 175dfb027ffabd4b8d5080097af0e51ed9a4a56c, stage2
Which firmware version (vcgencmd version)?
Jul 9 2019 14:40:53
Copyright (c) 2012 Broadcom
version 6c3fe3f096a93de3b34252ad98cdccadeb534be2 (clean) (release) (start)
Which kernel version (uname -a)?
Linux tmm1-pi 4.19.57-v7+ #1244 SMP Thu Jul 4 18:45:25 BST 2019 armv7l GNU/Linux

Additional context
Similar functionality is available via the OMX encoder using either OMX_IndexConfigBrcmVideoRequestIFrame or OMX_IndexConfigVideoIntraVOPRefresh

bcm2835_codec_s_ctrl() needs to have a V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME case statement added. See for example https://lkml.org/lkml/2016/1/19/24

cc @6by9

tmm1

All 39 comments

MMAL_PARAMETER_VIDEO_REQUEST_I_FRAME is the MMAL parameter.
Pull Requests welcome - it won't be a high priority for me to fix.

I'd expect the actual active bit to be something like:

        u32 mmal_bool = 1;
        ret = vchiq_mmal_port_parameter_set(ctx->dev->instance,
                            &ctx->component->output[0],
                            MMAL_PARAMETER_VIDEO_REQUEST_I_FRAME,
                            &mmal_bool,
                            sizeof(mmal_bool));

Have you actually got some software which uses this V4L2 parameter? It's one of those annoying things that for this sort of patch it'll take longer to produce a test app to verify that it works than it does to make the change.

6by9 on 22 Aug 2019

Great, that's all I needed. I will open a PR. I am using ffmpeg's h264_v4l2m2m encoder to test and that's how I discovered this ioctl was failing.

diff --git a/libavcodec/v4l2_m2m_enc.c b/libavcodec/v4l2_m2m_enc.c
index 636e1a96dd..2d5f388cc1 100644
--- a/libavcodec/v4l2_m2m_enc.c
+++ b/libavcodec/v4l2_m2m_enc.c
@@ -245,6 +245,9 @@ static int v4l2_send_frame(AVCodecContext *avctx, const AVFrame *frame)
     V4L2m2mContext *s = ((V4L2m2mPriv*)avctx->priv_data)->context;
     V4L2Context *const output = &s->output;

+    if (frame->pict_type == AV_PICTURE_TYPE_I)
+        v4l2_set_ext_ctrl(s, MPEG_CID(FORCE_KEY_FRAME), 0, "force key frame");
+
     return ff_v4l2_context_enqueue_frame(output, frame);
 }

tmm1 on 22 Aug 2019

Another issue I ran into is that the driver doesn't seem to advertise MPLANE support. This check in ffmpeg fails: https://github.com/FFmpeg/FFmpeg/blob/9bcb1cb6ed50e66e0489beb871eed83533b3de97/libavcodec/v4l2_m2m.c#L53

And so ffmpeg only ends up sending the Y-plane from a YUV420P picture.

Should I look into fixing the capabilities advertised, or is the correct fix to teach ffmpeg to pack the planes together before sending them into the driver?

tmm1 on 22 Aug 2019

Is that on the latest kernel tree?
The original version of the driver supported the single planar API, but that was replaced by #3097 to use the multiplanar API (although still the single planar formats).
https://github.com/raspberrypi/linux/blob/rpi-4.19.y/drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c#L2607 seems to say multi planar, as does

pi@raspberrypi:~ $ v4l2-ctl -D -d 11
Driver Info:
    Driver name      : bcm2835-codec
    Card type        : bcm2835-codec
    Bus info         : platform:bcm2835-codec
    Driver version   : 4.19.64
    Capabilities     : 0x84204000
        Video Memory-to-Memory Multiplanar
        Streaming
        Extended Pix Format
        Device Capabilities
    Device Caps      : 0x04204000
        Video Memory-to-Memory Multiplanar
        Streaming
        Extended Pix Format

There is an ffmpeg patch that should be in Raspbian to add support for NV21 and YUV420P - the standard code only had NV12 support. That causes some grief with the chroma planes not being set up.
Off my local branch:

commit 4ced57196354736ea2f831fd012be69f7a01dc30
Author: Dave Stevenson <[email protected]>
Date:   Thu Jul 4 17:50:13 2019 +0100

    avcodec: v4l2_m2m: Support NV21 and YUV420P

    The single planar support was for NV12 only.
    Add NV21 and YUV420P support.

diff --git a/libavcodec/v4l2_buffers.c b/libavcodec/v4l2_buffers.c
index aef911f3bb..1934ff9377 100644
--- a/libavcodec/v4l2_buffers.c
+++ b/libavcodec/v4l2_buffers.c
@@ -321,11 +321,21 @@ int ff_v4l2_buffer_buf_to_avframe(AVFrame *frame, V4L2Buffer *avbuf)
     /* 1.1 fixup special cases */
     switch (avbuf->context->av_pix_fmt) {
     case AV_PIX_FMT_NV12:
+    case AV_PIX_FMT_NV21:
         if (avbuf->num_planes > 1)
             break;
         frame->linesize[1] = avbuf->plane_info[0].bytesperline;
         frame->data[1] = frame->buf[0]->data + avbuf->plane_info[0].bytesperline * avbuf->context->format.fmt.pix_mp.height;
         break;
+    case AV_PIX_FMT_YUV420P:
+    /* No YV12? support? */
+        if (avbuf->num_planes > 1)
+            break;
+        frame->linesize[1] = avbuf->plane_info[0].bytesperline >> 1;
+        frame->linesize[2] = avbuf->plane_info[0].bytesperline >> 1;
+        frame->data[1] = frame->buf[0]->data + avbuf->plane_info[0].bytesperline * avbuf->context->format.fmt.pix_mp.height;
+        frame->data[2] = frame->data[1] + ((avbuf->plane_info[0].bytesperline * avbuf->context->format.fmt.pix_mp.height) >> 2);
+        break;
     default:
         break;
     }

6by9 on 22 Aug 2019

Oh, hang on, you're looking at encode, not decode.

V4L2 dictates how the formats should be packed for the single planar formats.
It does look like ff_v4l2_buffer_avframe_to_buf assumes that if the avframe is multiple planes then it doesn't check the pixel format nor the correct number of planes, ie the reverse conversion to that done in ff_v4l2_buffer_buf_to_avframe
https://github.com/FFmpeg/FFmpeg/blob/9bcb1cb6ed50e66e0489beb871eed83533b3de97/libavcodec/v4l2_buffers.c#L289

As a quick test, try reverting the patches I referenced from #3097 to see if it plays nicely with the single planar API. I have a suspicion that it won't, and yes FFmpeg should be munging the buffer pointers/sizes, having first confirmed that the planes are contiguous.

6by9 on 22 Aug 2019

replaced by #3097 to use the multiplanar API (although still the single planar formats)

Wonderful, thanks for the link. I am on the stable release and v4l2-ctl wasn't showing mplane support.

I will build tip and try again.

ffmpeg patch that should be in Raspbian to add support for NV21 and YUV420P - the standard code only had NV12 support

I did try this patch, but it only affects AVFrame generation (i.e decoding). I'm working with the encoder which doesn't use this code path.

Would you mind dropping the comment and sending a copy of this to ffmpeg-devel? I will merge it into ffmpeg master.

tmm1 on 22 Aug 2019

sudo rpi-update will grab the latest kernel with this change to save you rebuilding your own kernel.

I'm not sure where/when you were hitting the issue with not supporting MPLANE. On failure v4l2_prepare_contexts should have then checked for single plane and been happy. It will still go wrong as there is the missing mapping for single planar formats, but fundamentally it is correct.
https://github.com/FFmpeg/FFmpeg/blob/9bcb1cb6ed50e66e0489beb871eed83533b3de97/libavcodec/v4l2_m2m.c#L81

The question over single planar vs multi planar, and the use of single planar formats through the multi planar API, was brought up in discussing #3097 and had been discussed on linux-media
https://www.spinics.net/lists/linux-media/msg154053.html
Cedrus for one also uses the single planar API, therefore having support for the single planar formats in FFmpeg would be sensible.

I'll try to sort out the decode YUV420/NV21 patch at some point.
Sorry, but we've found ffmpeg-devel not the most welcoming or helpful of places.
The bugtracker also seems to be a dead end. I opened a ticket 7 months ago for an omx issue, with a proposed patch to fix it, and got no response beyond "isn't this a duplicate" (it wasn't) https://trac.ffmpeg.org/ticket/7687.
I guess it depends on the relevant people seeing messages.

6by9 on 22 Aug 2019

sudo rpi-update will grab the latest kernel with this change to save you rebuilding your own kernel.

Thanks that's very helpful.

I'm not sure where/when you were hitting the issue with not supporting MPLANE.

Sorry, I should have been more specific. There were no errors - it worked in single plane mode, but since only the Y-plane was sent to the encoder, the generated picture was not color correct. I thought maybe if the driver supported mplane then the picture would get uploaded correctly. I will try that after rpi-update completes.

I'll try to sort out the decode YUV420/NV21 patch at some point.
Sorry, but we've found ffmpeg-devel not the most welcoming or helpful of places.

Your experience is not uncommon.

I opened a ticket 7 months ago for an omx issue, with a proposed patch to fix it, and got no response beyond "isn't this a duplicate" (it wasn't) https://trac.ffmpeg.org/ticket/7687.

It seems the omx driver is pretty much unmaintained. I will be happy to add myself as the maintainer of omx.c and merge any relevant patches to improve RPi support.

tmm1 on 22 Aug 2019

After rpi-update, I can confirm v4l2-ctl shows "Multiplanar" support.

The behavior on ffmpeg's side is still the same- the images generated by the encoder are missing chroma. I will work on a patch to convert to V4L's requested pixel format. How does V4L signal that it wants packed planes?

tmm1 on 22 Aug 2019

How does V4L signal that it wants packed planes?

https://www.linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/yuv-formats.html
V4L2_PIX_FMT_YUV420 vs V4L2_PIX_FMT_YUV420M
V4L2_PIX_FMT_NV12 vs V4L2_PIX_FMT_NV12M
etc
The device will enumerate the formats that it supports via ioctl(VIDIOC_ENUM_FMT), and allow selection via VIDIOC_S_FMT. If the format passed into S_FMT isn't supported then the driver will select some form of default (it does NOT fail the S_FMT call).
https://www.linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/vidioc-enum-fmt.html#vidioc-enum-fmt

Thanks for looking at my issue in the bug tracker. I can fully relate to things dropping to being unmaintained as people get bored, but I had hoped that it might have had some traction seeing as I had provided a fix for it.

6by9 on 22 Aug 2019

Thanks. I'm new to v4l2 so I appreciate the pointers.

BTW, one thing I'm interested in adding to ffmpeg is accelerated scaling on the RPi.

It looks like using "OMX.broadcom.resize" from avcodec/omx.c would be fairly straightforward (although ffmpeg-devel would probably prefer if I wrote a separate avfilter/vf_scale_omx.c instead).

But I'm curious what you think about adding the resize component to the v4l2 driver? (I saw /dev/video12 exposes ISP already, but it's unclear to me if I can use the second downscaling output port reasonably from v4l2).

tmm1 on 22 Aug 2019

How does V4L signal that it wants packed planes?

https://www.linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/yuv-formats.html
V4L2_PIX_FMT_YUV420 vs V4L2_PIX_FMT_YUV420M

I was able to teach ff_v4l2_buffer_avframe_to_buf about contiguous planar formats. However, each chroma plane in the encoded picture is offset. I assume there is some sort of padding requirement?

tmm1 on 22 Aug 2019

I was able to teach ff_v4l2_buffer_avframe_to_buf about contiguous planar formats. However, each chroma plane in the encoded picture is offset. I assume there is some sort of padding requirement?

ffmpeg's AVFrame had some padding, but it didn't match the padding expected by v4l2. I got this working and have submitted a patchset to ffmpeg-devel: http://ffmpeg.org/pipermail/ffmpeg-devel/2019-August/248775.html

The change to ff_v4l2_buffer_avframe_to_buf specifically is in https://github.com/tmm1/FFmpeg/commit/55380ac7bd20fc764b14d282f2f3899b44059c13

I also included your NV21/YUV420P decoding patch in my set.

tmm1 on 22 Aug 2019

I'm getting random failures starting up the h264 encoder sometimes:

[h264_v4l2m2m @ 0x28bda20] Using device /dev/video11
[h264_v4l2m2m @ 0x28bda20] driver 'bcm2835-codec' on card 'bcm2835-codec' in mplane mode
[h264_v4l2m2m @ 0x28bda20] requesting formats: output=YU12 capture=H264
[h264_v4l2m2m @ 0x28bda20] output VIDIOC_REQBUFS failed: Cannot allocate memory
[h264_v4l2m2m @ 0x28bda20] no v4l2 output context's buffers

dmesg shows:

[12205.119574] cma: cma_alloc: alloc failed, req-size: 765 pages, ret: -12
[12205.119590] bcm2835-codec bcm2835-codec: dma_alloc_coherent of size 3133440 failed
[12205.121366] cma: cma_alloc: alloc failed, req-size: 192 pages, ret: -12
[12205.121375] bcm2835-codec bcm2835-codec: dma_alloc_coherent of size 786432 failed

tmm1 on 23 Aug 2019

It looks like using "OMX.broadcom.resize" from avcodec/omx.c would be fairly straightforward (although ffmpeg-devel would probably prefer if I wrote a separate avfilter/vf_scale_omx.c instead).

But I'm curious what you think about adding the resize component to the v4l2 driver? (I saw /dev/video12 exposes ISP already, but it's unclear to me if I can use the second downscaling output port reasonably from v4l2).

/dev/video12 will happily rescale images for you using the ISP hardware block. GStreamer will happily use it via the v4l2convert component.
OMX.broadcom.resize is done on the VPU (generic SIMD processor within the GPU). More power required, and slower. I see little point in exposing that via V4L2.

Also note that the V4L2 M2M devices are set up to support dmabuf import and export, therefore can do zero copy paths around the place. This is something that has limited support in FFmpeg, but is used by GStreamer effectively. A pipeline of v4l2src to v4l2h264enc with zero copy should achieve 1080p30 easily, and potentially 1080p50.
The OMX framework will always end up copying all buffers to and from GPU memory, therefore the performance is lower.

I'm getting random failures starting up the h264 encoder sometimes:
[12205.121366] cma: cma_alloc: alloc failed, req-size: 192 pages, ret: -12
[12205.121375] bcm2835-codec bcm2835-codec: dma_alloc_coherent of size 786432 failed

So V4L2 is using the kernel CMA heap for all allocations instead of gpu_mem (although there is still some requirement for gpu_mem).
The default size of the cma heap on Pi0-3 is generally fairly small, unless you are using vc4-fkms-v3d or vc4-kms-v3d for 3D and rendering, in which case the heap is increased to 256MB.
vc4-fkms-v3d is the default on Pi4 as there is no firmware GL driver, therefore it is generally 256MB.
GL and DRM/KMS are the main reason the V4L2 drivers were written as it moves towards the Linux standard APIs for things like Kodi, and they are looking at removing as much platform specific stuff as possible.
Add cma=128M to /boot/cmdline.txt to define the heap to be 128MB
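
The sizes in the failed allocations can be cross-checked by hand. The sketch below assumes 4 KiB pages and assumes the larger request is one 1920x1080 YUV420 frame whose height was rounded up to a multiple of 16; neither is stated in the log, and the helper names are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */

/* Round v up to the next multiple of a (a must be a power of two). */
size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Bytes in one contiguous YUV420 frame: a full-size Y plane
 * plus two quarter-size chroma planes. */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Number of pages needed to hold the given byte count. */
size_t bytes_to_pages(size_t bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

With those assumptions, `yuv420_frame_bytes(1920, align_up(1080, 16))` is 3133440 bytes, which is 765 pages, and 786432 bytes is 192 pages. Both match the dmesg lines, so each failure is a single buffer that the CMA heap could not satisfy as one contiguous block.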

Padding is controlled via a combination of bytesperline, and the selection API. No padding is allowed between the planes for the single planar format. I'll check what you've done in your patch.

6by9 on 23 Aug 2019

I've made a couple of comments on your commit for FFmpeg.

Sorry, I don't know FFmpeg well enough to know for myself, but does it abide by the bytesperline value that the driver returns after having set the format? If not then your memcpy ought to check the strides match before blindly copying a plane at a time.
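
To illustrate the stride point, here is a rough sketch of packing three separate planes into one contiguous V4L2_PIX_FMT_YUV420 buffer. It is not the code from the FFmpeg commit: `pack_yuv420` and `copy_plane` are hypothetical names, even frame dimensions are assumed, and the chroma stride of bytesperline / 2 follows the layout used in the decode patch earlier in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one plane row by row so that source and destination strides may differ. */
static void copy_plane(uint8_t *dst, size_t dst_stride,
                       const uint8_t *src, size_t src_stride,
                       size_t row_bytes, size_t rows)
{
    for (size_t y = 0; y < rows; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}

/*
 * Pack Y, U and V planes into one contiguous buffer.
 * bytesperline and buf_height are the values the driver reported after
 * the format was set; the planes follow each other with no gap.
 * Assumes even width and height. Returns the size of the packed frame.
 */
size_t pack_yuv420(uint8_t *dst, size_t bytesperline, size_t buf_height,
                   const uint8_t *const src[3], const size_t src_stride[3],
                   size_t width, size_t height)
{
    size_t y_size = bytesperline * buf_height;
    size_t c_stride = bytesperline / 2;
    size_t c_size = c_stride * (buf_height / 2);

    copy_plane(dst, bytesperline, src[0], src_stride[0], width, height);
    copy_plane(dst + y_size, c_stride, src[1], src_stride[1],
               width / 2, height / 2);
    copy_plane(dst + y_size + c_size, c_stride, src[2], src_stride[2],
               width / 2, height / 2);

    return y_size + 2 * c_size;
}
```

Copying row by row costs a little more than one memcpy per plane, but it stays correct when the AVFrame linesize and the driver's bytesperline differ. When the strides do match, a single memcpy per plane gives the same result.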

6by9 on 23 Aug 2019

/dev/video12 will happily rescale images for you using the ISP hardware block.

How does the v4l2 api expose the fact that ISP has two output ports?

OMX.broadcom.resize is done on the VPU (generic SIMD processor within the GPU). More power required, and slower. I see little point in exposing that via V4L2.

Makes sense, thanks for the context.

Also note that the V4L2 M2M devices are set up to support dmabuf import and export, therefore can do zero copy paths around the place.
The OMX framework will always end up copying all buffers to and from GPU memory, therefore the performance is lower.

Very interesting!

Add cma=128M to /boot/cmdline.txt to define the heap to be 128MB

Thank you.

I've made a couple of comments on your commit for FFmpeg.

Appreciate the review.

tmm1 on 23 Aug 2019

Noticed this randomly and I'm not sure if it's intended or a bug:

https://github.com/raspberrypi/linux/blob/64f2b1b0a728a13373f9c74c6247ecf17af2caef/drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c#L1344-L1345

https://github.com/raspberrypi/linux/blob/64f2b1b0a728a13373f9c74c6247ecf17af2caef/drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c#L1360-L1361

The former is using height instead of crop_height.

tmm1 on 23 Aug 2019

How does the v4l2 api expose the fact that ISP has two output ports?

It currently doesn't. The low res output is ignored.

There is work ongoing to expose both outputs and add in a (Bayer) statistics output port as well to work with Libcamera (libcamera.org).
Doing so within V4L2 requires a /dev/videoN node for each input and output port, ie 4 total (1 in, 3 out) as otherwise you can't tell which queue has buffers available when select returns. The downside of that is that you can only have one client at a time as you can't tell the context for the buffers.
I suspect we'll keep /dev/video12 available as the simple multiple client 1-in/1-out resize, and add /dev/video13-16 as the more capable, single client, resize.

6by9 on 23 Aug 2019

It currently doesn't. The low res output is ignored.

Hm, so does the first output also scale down? I thought you had to use the second output for downscale and the first output only did color conversion?

I'm interested in the simplest way to do a hardware downscale.

tmm1 on 23 Aug 2019

Hm, so does the first output also scale down? I thought you had to use the second output for downscale and the first output only did color conversion?

I'm interested in the simplest way to do a hardware downscale.

It'll rescale almost any way you like. Up or downscaling. Format conversion. I can't recall if I have arbitrary crop in there or not.
The restriction is actually on the low res output that it must be the same or lower resolution than the high res one - it can't upscale.

6by9 on 23 Aug 2019

Gotcha. So for my purposes resize and isp are basically equivalent and the latter is faster.

tmm1 on 23 Aug 2019

Gotcha. So for my purposes resize and isp are basically equivalent and the latter is faster.

Other than the potential omission of arbitrary cropping in /dev/video12, yes.

6by9 on 23 Aug 2019

Thanks again for your help. I was able to write an ffmpeg video filter that uses /dev/video12 for scaling: https://github.com/tmm1/FFmpeg/commit/ea38d156b163c03949fbc6ead8f05999d1439ba2

Interestingly, if I take the frame buffer returned by v4l2 and try to encode it with h264_omx, it locks up right away in OMX_EmptyThisBuffer(). So if you're looking for a consistent repro for https://github.com/raspberrypi/firmware/issues/851 then I have one.

tmm1 on 24 Aug 2019

Also note that the V4L2 M2M devices are set up to support dmabuf import and export, therefore can do zero copy paths around the place.

I'm still struggling to do 30fps encode, likely because I'm doing a copy between the resizer and the encoder. How does zero-copy buffering work between devices? Doesn't each device preallocate its own buffers?

tmm1 on 24 Aug 2019

As noted in https://github.com/raspberrypi/firmware/issues/851#issuecomment-326533267, #851 is down to the mapping flags that the Linux kernel uses around allocations, and the fact that some of them prevent the VCHIQ calls from mapping back to a list of pages to be copied. There must be a way of doing it, but it's finding the right magic runes within the kernel to achieve it. There aren't many use cases that hit the issue, therefore it's been a fairly low priority.

Dmabufs allow buffers to be shared between kernel subsystems.

For V4L2, once you have allocated buffers you can call VIDIOC_EXPBUF to get an associated dmabuf handle (it's a file descriptor that you can mmap and do a few other manipulations on). https://www.linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/vidioc-expbuf.html#vidioc-expbuf

That handle can then be passed into DRM, EGL, or another V4L2 device, so the sink does not allocate buffers for itself. Under V4L2 you follow the mode of operation described in https://www.linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/dmabuf.html for importing dmabufs using V4L2_MEMORY_DMABUF instead of V4L2_MEMORY_MMAP.

GStreamer has a load of framework stuff to be able to support passing dmabufs instead of genuine mappable pixel buffers around. The V4L2 M2M components have properties called capture-io-mode and output-io-mode to configure the relevant modes.
I don't know how easy it would be to modify ffmpeg to do the same sort of thing. There is some handling for DRM_PRIME buffers (DRM's name for dmabufs), but I've only seen that used as the final output stage of ffmpeg (mainly vaapi, with changes being discussed for V4L2 decode), not for passing between processing stages.

I'd be suspicious of the memcpys in ff_v4l2_buffer_avframe_to_buf as those will take a modest amount of time.

6by9 on 24 Aug 2019

Thanks for the links to VIDIOC_EXPBUF and V4L2_MEMORY_DMABUF. I should be able to make zero-copy paths work in ffmpeg with this technique, either by exporting to AV_PIX_FMT_DRM_PRIME or adding a new AV_PIX_FMT_V4L2_DMABUF

Though I'm sure the memcpy isn't helping in the encode case, I just discovered that part of the throughput issue in the scaler itself was due to me enqueuing/dequeuing buffers one at a time. If I enqueue input without blocking on the output dequeue (i.e. actually use the buffers), I'm easily seeing 30% increase in performance.

tmm1 on 24 Aug 2019

What is the correct way to drain frames out of the ISP? Should I be sending V4L2_DEC_CMD_STOP/V4L2_ENC_CMD_STOP?

tmm1 on 26 Aug 2019

What is the correct way to drain frames out of the ISP? Should I be sending V4L2_DEC_CMD_STOP/V4L2_ENC_CMD_STOP?

Neither of these commands is implemented for ISP.

The issue I was having with draining was specific to ffmpeg's v4l2 wrappers. From the ISP driver's perspective there is nothing special required to drain- you wait on the buffers you're expecting and then call STREAMOFF afterwards.

tmm1 on 27 Aug 2019

Under V4L2 you follow the mode of operation described in https://www.linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/dmabuf.html for importing dmabufs using V4L2_MEMORY_DMABUF instead of V4L2_MEMORY_MMAP.

I'm confused because the examples on that page use CAPTURE instead of OUTPUT. Wouldn't dmabuf only be able to be imported into OUTPUT devices?

tmm1 on 27 Aug 2019

Welcome to the wonderful world of dmabuf!

dmabufs can be imported on either queue. They are only setting up who is allocating the buffer, not who is writing and who is reading.
You could even have a totally separate allocator, and then two subsystems importing the buffer to actually make use of it.
There are also cases where some drivers only support importing, whilst others only support exporting, therefore that would dictate who had to allocate and export, vs import.

There are further considerations too as to how the backing memory is allocated.
As an example, the Pi display hardware requires contiguous buffers as it doesn't go through an IOMMU. The DRM/KMS therefore will only allow import of buffers that fulfil that criteria.
If you take a UVC USB webcam, that uses V4L2 vmalloc buffer allocator. If you were to try and export a dmabuf from that webcam and import it into the DRM/KMS, then that will fail as it isn't contiguous. If you allocate from DRM/KMS (which will give you a contiguous buffer), then the V4L2 side of uvcvideo will happily import that buffer as it fulfils the constraints that it requires (almost none).

There are a number of presentations over dmabuf and zero copy video pipelines:

https://elinux.org/images/a/a8/DMA_Buffer_Sharing-_An_Introduction.pdf
https://elinux.org/images/5/53/Zero-copy_video_streaming.pdf
There's one by Laurent Pinchart of Ideas on Board that I can't find at present which covers much of this too.
The concept will be used extensively by the libcamera project (libcamera.org) for adding support for the more complex camera pipelines that you are now getting on devices. https://events.linuxfoundation.org/wp-content/uploads/2017/12/20181024-ELCE-Why_embedded_cameras_are_difficult_and_how_to_make_them_easy_Laurent.pdf

6by9 on 28 Aug 2019

Codec stop commands are there more due to the fact that the codec may hang on to buffers for use as reference frames. The _STOP commands tell the codec to finish off processing the last frames and return them all.
With the ISP there is no similar concept - it's always one in, one out. I don't recall there being any discussion about it, but that may be more because V4L2 resizers aren't that common.

6by9 on 28 Aug 2019

So in the case of ISP, can I configure the capture side using V4L2_MEMORY_DMABUF buffers? And would that mean the buffers I receive with VIDIOC_DQBUF would have m.fd set already without having to call VIDIOC_EXPBUF?

tmm1 on 28 Aug 2019

If you want V4L2 to allocate the buffer and give you a dmabuf handle, then you have to use V4L2_MEMORY_MMAP (or similar), and then call VIDIOC_EXPBUF.
V4L2_MEMORY_DMABUF is only for importing.

6by9 on 28 Aug 2019

V4L2_MEMORY_DMABUF is only for importing.

Okay got it, thanks. That was my understanding as well but I wasn't sure if I was missing something.

tmm1 on 28 Aug 2019

The OMX framework will always end up copying all buffers to and from GPU memory, therefore the performance is lower.

Testing just raw h264 encode performance on 1920x1080 streaming images, V4L2 is 11% faster than OMX on RPI3, and 35%(!!) faster on RPI4:

| | omx | v4l2 |
|-----|-------|-------|
| pi3 | 35fps | 39fps |
| pi4 | 40fps | 54fps |
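
The percentages follow directly from the table. A small sketch of the arithmetic, using integer division so the result is truncated to a whole percent (`speedup_percent` is a hypothetical helper name):

```c
/* Relative gain of new_fps over baseline_fps, truncated to a whole percent. */
int speedup_percent(int baseline_fps, int new_fps)
{
    return (new_fps - baseline_fps) * 100 / baseline_fps;
}
```

`speedup_percent(35, 39)` gives 11 for the Pi 3 row and `speedup_percent(40, 54)` gives 35 for the Pi 4 row.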

tmm1 on 29 Aug 2019

Is there a fast-path in the ISP when the input/output format and dimensions are the same? If not, would it be reasonable to try to add that at the vchiq/mmal/bcm-codec layers, or more practical for the ISP to be skipped at the application layer?

If I simply wanted to upload a picture and get dmabuf fd's for it, which video device would be best to use for that?

tmm1 on 29 Aug 2019

Is there a fast-path in the ISP when the input/output format and dimensions are the same? If not, would it be reasonable to try to add that at the vchiq/mmal/bcm-codec layers, or more practical for the ISP to be skipped at the application layer?

Sorry, I don't see what you're trying to achieve. If your application doesn't want to do a resize or conversion, then don't send the buffer to the ISP.
There are no fast paths within the ISP hardware - the whole pipeline runs at 1 pixel per clock, although in the case of resizing that may be one pixel in or one pixel out (not necessarily both) per clock. For example, when downsizing, the resize block will consume one pixel per clock but won't produce one pixel per clock.

If I simply wanted to upload a picture and get dmabuf fd's for it, which video device would be best to use for that?

Neither.
What are you wanting to do with this dmabuf? Something has to be doing something with it, and one of the users should be the allocator.
For DRM/KMS you can allocate dumb buffers.
Pi specific, there is vcsm-cma which can allocate CMA memory, (maps it into the VPU heap), and gives you a dmabuf.
Under Android, the ION allocator does a similar thing, and is nearly merged into the mainline kernel.

6by9 on 29 Aug 2019

In ffmpeg there's the concept of a video filter chain. For most other hw backends, you use the hwupload filter to convert the software frame into a hardware frame, then you can add scaling filters and/or encoders to the end which accept the hardware frames.

So I guess I'm wondering if there's an equivalent way to hwupload some yuv420p frame data into a dmabuf, which could then be fed either into ISP or ENCODE. It sounds like this is not practical because each v4l2 implementation will require a specific allocator to be used for dmabufs, and there's no generic api to invoke the underlying allocator?

tmm1 on 30 Aug 2019
