DALI: Decoding videos of varied resolutions/shapes

Created on 1 Apr 2019 · 14 Comments · Source: NVIDIA/DALI

Hi,
Thanks for the awesome library. I'd love to use this for training action recognition models. However, we frequently have videos of varying aspect ratios, which the current VideoReader does not support. It'd be great to add support for this.
Cheers

Video enhancement

All 14 comments

I should add that we typically deal with this by random or center cropping rather than resizing. It'd be nice to have an analog of nvJPEGDecoderCrop and nvJPEGDecoderRandomCrop if feasible.
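
For illustration, the cropping arithmetic mentioned above can be sketched in plain Python. This is not DALI code: the function names are hypothetical and the frame sizes are made-up examples. It only shows why a fixed-size crop yields uniform output from both landscape and portrait clips.

```python
import random


def center_crop_window(width, height, crop_w, crop_h):
    """Return (x, y) of the top-left corner of a centered crop."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the frame")
    return (width - crop_w) // 2, (height - crop_h) // 2


def random_crop_window(width, height, crop_w, crop_h, rng=random):
    """Return (x, y) of a uniformly random crop that fits in the frame."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the frame")
    return rng.randint(0, width - crop_w), rng.randint(0, height - crop_h)


# The same 256x256 crop works for a landscape and a portrait frame.
print(center_crop_window(1280, 720, 256, 256))  # (512, 232)
print(center_crop_window(720, 1280, 256, 256))  # (232, 512)
```

For video, the same (x, y) offset would normally be reused for every frame of a sequence so the crop stays temporally consistent.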

Hi @willprice

It's great to see that DALI's Video pipeline is being used.
These features are part of our internal roadmap, especially the fused Crop support. We might re-prioritize them thanks to your request, so stay tuned!

In the meantime you can use the Crop operators, which already support sequences.

Tracked internally as DALI-680 and DALI-681

Just wanted to add a :+1: for this issue. The Kinetics dataset, for example, has both portrait and landscape videos. That is fine with random cropping, but NVVL (now DALI) does not allow loading videos of variable size.

@Kh4L, I want to load a few videos with different resolutions, first crop them out at 256x256 and then load the image frames. However, I am not sure how to do this with the Crop operator. Could you provide a minimal example? Thank you for your patience.

@TheShadow29 - have you checked data loading part of SuperResolution example?

@JanuszL I actually saw the example, but I believe I misunderstood what it did. To the best of my understanding (please correct me if I am wrong), we first need to convert every video to be loaded to the same scale (540p/720p/1080p) and same codec and then use the VideoPipeReader which then works beautifully as intended.

However, I was wondering if we could instead load videos with different resolutions and use the crop operation at run-time to crop everything to, say, the lowest resolution. I am able to use crop when all videos have the same resolution, but if I use videos with different resolutions, this part https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/operators/reader/loader/video_loader.cc#L112 throws an error.

Basically, I am wondering if there is any way to have different video resolutions and at load time we rescale the video. If not, is the solution to first rescale the videos to a uniform scale, and then process them?

Thank you for your patience. Also, thank you for the awesome library.

@TheShadow29 - I misled you. Indeed, the video decoder currently works with only one video resolution at a time. What prevents us from easily extending it is that a video decoder instance has fixed video stream parameters (code responsible for that is here).
To lift that constraint we need to experiment with how to approach this problem efficiently, avoiding huge memory consumption without sacrificing performance.
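
Given the one-resolution-per-decoder constraint described above, one possible workaround is to bucket files by resolution before building any pipeline. The following is a minimal sketch in plain Python, assuming the resolution of each file is already known (for example from probing the files beforehand). The function name and file names are hypothetical, and whether one reader per bucket is practical in DALI is an assumption, not something confirmed in this thread.

```python
from collections import defaultdict


def group_by_resolution(resolutions):
    """Map (width, height) -> sorted list of files with that resolution.

    `resolutions` is a dict of filename -> (width, height).
    """
    groups = defaultdict(list)
    for filename, size in resolutions.items():
        groups[tuple(size)].append(filename)
    return {size: sorted(files) for size, files in groups.items()}


# Hypothetical file names and sizes, for illustration only.
example = {
    "a.mp4": (1280, 720),
    "b.mp4": (720, 1280),
    "c.mp4": (1280, 720),
}
for size, files in group_by_resolution(example).items():
    print(size, files)
```

Each bucket then contains only files that share a resolution, which is the condition the decoder currently requires.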

@JanuszL thank you for clearing that up. I agree that it definitely might become a performance bottleneck.

On a related note, should the constraint on uniform resolution and codec be mentioned in the tutorial for video loading (https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/video/video_reader_simple_example.html), with a reference to this issue if it is tracked here? It might also help to link to transcode_scenes (https://github.com/NVIDIA/DALI/blob/master/docs/examples/video/superres_pytorch/tools/transcode_scenes.py) in the superres examples, since it shows how to convert the videos.

@TheShadow29 transcode_scenes is actually used here https://github.com/NVIDIA/DALI/blob/master/docs/examples/video/superres_pytorch/prepare_data.sh , being itself an example of how to use the script

@TheShadow29 small doc added in #944

@Kh4L yes, that's how I came to know about it. I didn't convey it properly: I meant we should link to it in the VideoPipe tutorial itself. The current tutorial splits a single video into 4 parts, so that complication doesn't arise, but it would still be nice to give a pointer on what to do with multiple videos that have different resolutions, codecs, etc. The transcode_scenes.py script could easily be adapted for that; I created an example gist here: https://gist.github.com/TheShadow29/8809abda7002b57ef2749657716e09fc (a slightly modified version of transcode_scenes.py)
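
As a rough idea of what such a preprocessing step can look like, here is a minimal sketch that builds an ffmpeg command to rescale and re-encode a file to one resolution and codec. It is not taken from transcode_scenes.py or the gist. It assumes ffmpeg is installed with libx264 support, and the function names, file names, and target size are hypothetical.

```python
import subprocess


def build_transcode_cmd(src, dst, width, height, codec="libx264"):
    """Build an ffmpeg command that rescales `src` and re-encodes it to `dst`."""
    return [
        "ffmpeg", "-y",                     # overwrite dst if it exists
        "-i", src,                          # input file
        "-vf", f"scale={width}:{height}",   # resize every frame
        "-c:v", codec,                      # re-encode with one codec
        "-pix_fmt", "yuv420p",              # uniform chroma subsampling
        "-an",                              # drop the audio stream
        dst,
    ]


def transcode(src, dst, width, height):
    """Run the transcode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_transcode_cmd(src, dst, width, height), check=True)


# Only print the command here; call transcode() to actually run it.
print(" ".join(build_transcode_cmd("in.mp4", "out.mp4", 1280, 720)))
```

Note that a plain scale to a fixed size distorts clips whose aspect ratio differs from the target, so portrait videos may need cropping or padding instead.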

@TheShadow29 - when you have videos with different resolutions, I assume the other parameters (like codec type and chroma subsampling) are the same. If so, cuvidReconfigureDecoder could be a solution for such a case.

#1144 should address this. Closing.
