DALI: Decoding videos of varied resolutions/shapes

Created on 1 Apr 2019 · 14 Comments · Source: NVIDIA/DALI

Hi,
Thanks for the awesome library. I'd love to use this for training action recognition models. However, we frequently have videos of varying aspect ratios, which the current VideoReader does not support. It'd be great to add support for this.
Cheers

Video enhancement

willprice

Most helpful comment

Hi @willprice

It's great to see that DALI's Video pipeline is being used.
These features are part of our internal roadmap, especially the fused Crop support. We might re-prioritize them thanks to your request, so stay tuned!

In the meantime you can use the Crop operators, which already support sequences.

Kh4L on 1 Apr 2019

👍3

All 14 comments

I should add: we typically deal with this by random or center cropping rather than resizing. It'd be nice to have an analog of nvJPEGDecoderCrop and nvJPEGDecoderRandomCrop if feasible.

willprice on 1 Apr 2019
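
For readers unfamiliar with the cropping approach described above, here is a minimal pure-Python sketch of how a center or random crop window can be chosen for frames of any resolution and applied identically to every frame of a clip. It is not DALI code; the function names `crop_window` and `crop_sequence` and the nested-list frame layout are assumptions made for illustration only.

```python
import random


def crop_window(width, height, crop_w, crop_h, mode="center", rng=None):
    """Return (x, y), the top-left corner of a crop_w x crop_h window.

    Hypothetical helper for illustration; not part of DALI.
    """
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the frame")
    if mode == "center":
        return (width - crop_w) // 2, (height - crop_h) // 2
    if mode == "random":
        rng = rng or random
        # randint is inclusive on both ends, so the window always fits.
        return rng.randint(0, width - crop_w), rng.randint(0, height - crop_h)
    raise ValueError(f"unknown mode: {mode!r}")


def crop_sequence(frames, crop_w, crop_h, mode="center", rng=None):
    """Crop every frame of one clip with the same window.

    frames: list of frames, each frame a list of rows, each row a list of pixels.
    """
    height, width = len(frames[0]), len(frames[0][0])
    x, y = crop_window(width, height, crop_w, crop_h, mode, rng)
    return [[row[x:x + crop_w] for row in frame[y:y + crop_h]] for frame in frames]
```

Because the window is computed per clip from that clip's own width and height, landscape and portrait videos both end up with the same output shape, which is why cropping is a convenient alternative to resizing here.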

Tracked internally as DALI-680 and DALI-681

Kh4L on 1 Apr 2019

Just wanted to add a 👍 for this issue. The Kinetics dataset, for example, has both portrait and landscape videos, which is fine with random cropping, but NVVL (now DALI) does not allow loading videos of variable size.

jbohnslav on 30 Apr 2019

@Kh4L, I want to load a few videos with different resolutions, first crop them to 256x256, and then load the image frames. However, I am not sure how to do this with the Crop operator. Could you provide a minimal example? Thank you for your patience.

TheShadow29 on 4 Jun 2019

@TheShadow29 - have you checked the data loading part of the SuperResolution example?

JanuszL on 4 Jun 2019

@JanuszL I actually saw the example, but I believe I misunderstood what it did. To the best of my understanding (please correct me if I am wrong), we first need to convert every video we want to load to the same scale (540p/720p/1080p) and the same codec, and then use the VideoPipeReader, which then works beautifully as intended.

However, I was wondering if we could instead load videos with different resolutions and use the crop operation at run time to crop everything to, say, the lowest resolution. I am able to use crop with videos of the same resolution, but if I use videos of different resolutions, this part https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/operators/reader/loader/video_loader.cc#L112 throws an error.

Basically, I am wondering if there is any way to have different video resolutions and at load time we rescale the video. If not, is the solution to first rescale the videos to a uniform scale, and then process them?

Thank you for your patience. Also, thank you for the awesome library.

TheShadow29 on 4 Jun 2019

@TheShadow29 - I misled you. Indeed, the video decoder can currently work with only one video resolution at a time. What prevents us from easily extending it is that a video decoder instance has fixed video stream parameters (code responsible for that is here).
To lift that constraint, we need to experiment with how to approach this problem efficiently, avoiding huge memory consumption without sacrificing performance.

JanuszL on 4 Jun 2019
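
Given the constraint above, it can help to check up front which files share the same stream parameters before handing them to a single reader. The following is a rough sketch, assuming `ffprobe` is installed and on the PATH; the helper names (`probe_cmd`, `stream_key`, `group_by_stream_params`) are hypothetical and not part of DALI.

```python
import json
import subprocess
from collections import defaultdict


def probe_cmd(path):
    """Build an ffprobe command that reports the first video stream as JSON."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=codec_name,width,height,pix_fmt",
            "-of", "json", path]


def stream_key(ffprobe_json):
    """Extract (codec, width, height, pixel format) from ffprobe's JSON output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return (stream["codec_name"], stream["width"],
            stream["height"], stream["pix_fmt"])


def run_ffprobe(path):
    """Run ffprobe on one file and return its stdout (requires ffprobe)."""
    result = subprocess.run(probe_cmd(path), capture_output=True,
                            text=True, check=True)
    return result.stdout


def group_by_stream_params(paths, probe=run_ffprobe):
    """Group file paths by their stream parameters.

    `probe` is injectable so the grouping logic can be exercised without ffprobe.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[stream_key(probe(path))].append(path)
    return dict(groups)
```

If `group_by_stream_params(paths)` returns more than one group, the files do not all share one resolution, codec, and pixel format, so they would not satisfy the single-resolution constraint described in this comment.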

@JanuszL thank you for clearing that up. I agree that it definitely might become a performance bottleneck.

On a related note, I was wondering if the constraint of uniform resolution and same codec should be mentioned in the tutorial for video loading (https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/video/video_reader_simple_example.html), with a reference to this issue if it will be tracked here. It might also be helpful to link to transcode_scenes (https://github.com/NVIDIA/DALI/blob/master/docs/examples/video/superres_pytorch/tools/transcode_scenes.py) in the superres examples, which shows how to convert the videos.

TheShadow29 on 5 Jun 2019

@TheShadow29 transcode_scenes is actually used here: https://github.com/NVIDIA/DALI/blob/master/docs/examples/video/superres_pytorch/prepare_data.sh, which is itself an example of how to use the script

Kh4L on 5 Jun 2019

@TheShadow29 small doc added in #944

Kh4L on 5 Jun 2019

👍1

@Kh4L yes, that's how I came to know about it. I didn't convey it properly: I meant we should have the link to it in the VideoPipe tutorial itself. The current tutorial splits a single video into 4 parts, so that complication doesn't arise, but it would still be nice to give a pointer on what to do with multiple videos that have different resolutions, codecs, etc. The transcode_scenes.py could be easily adapted for that; I created an example gist here: https://gist.github.com/TheShadow29/8809abda7002b57ef2749657716e09fc (a slightly modified version of transcode_scenes.py)

TheShadow29 on 5 Jun 2019
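
As an illustration of the preprocessing route discussed here, the sketch below builds an `ffmpeg` command that rescales a video to one fixed resolution and re-encodes it with a single codec. It is not the gist or transcode_scenes.py; the function names, the 1280x720 default, and the choice of `libx264` are assumptions, and `ffmpeg` must be installed for `transcode` to run.

```python
import subprocess


def transcode_cmd(src, dst, width=1280, height=720, codec="libx264"):
    """Build an ffmpeg command that rescales `src` to width x height.

    Note: scale=W:H stretches the frame if the aspect ratio differs,
    so crop or pad first if distortion matters.
    """
    return ["ffmpeg", "-y",              # overwrite dst without prompting
            "-i", src,
            "-vf", f"scale={width}:{height}",
            "-c:v", codec,
            "-pix_fmt", "yuv420p",       # uniform chroma subsampling
            "-an",                       # drop the audio track
            dst]


def transcode(src, dst, **kwargs):
    """Run the transcode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(transcode_cmd(src, dst, **kwargs), check=True)
```

Calling `transcode("in.mp4", "out.mp4")` on each file in a dataset would yield videos with one resolution, codec, and pixel format, at the cost of an extra offline pass and extra disk space.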

@TheShadow29 - when you have videos with different resolutions, I assume that the other parameters (like codec type and chroma subsampling) are the same. If so, then cuvidReconfigureDecoder could be a solution for such a case.

JanuszL on 10 Jun 2019