Dali: Operations graph building on Video Pipeline

Created on 12 Jan 2019 · 6 Comments · Source: NVIDIA/DALI

I am trying to use DALI for pre-processing a loaded video, for inference.

This is my code:

class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        self.input = ops.VideoReader(device="gpu", filenames=data, sequence_length=sequence_length,
                shard_id=0, num_shards=1,
                random_shuffle=shuffle, initial_fill=initial_prefetch_size)
        self.caster = ops.Cast(device="gpu", dtype=types.UINT8)
        self.resize = ops.Resize(device="gpu", resize_x=NETW, resize_y=NETH)
        self.permute = ops.NormalizePermute(device="gpu", height=NETH, width=NETW, mean=MEAN, std=STDV, output_dtype=types.FLOAT)

    def define_graph(self):
        img = self.input(name="Reader")
        imgC = self.caster(img)
        #imgResized = self.resize(imgC) # Results in error saying "resize" expects 3-D input
        imgResized = self.resize(imgC.squeeze())
        imgPermuted = self.permute(imgResized)
        return imgPermuted

All sorts of ops are returning an error:
TypeError: Expected inputs of type EdgeReference or list of EdgeReference. Received input type TensorListGPU

Could you recommend how to build a pipeline with the VideoReader? If that is not supported, please suggest a way to use DALI's built-in ops to do the same operations (cast --> resize --> permute) outside of the pipeline.

Labels: Video, enhancement, external contribution welcome

adroit91

👍1

Most helpful comment

Hello there, we are also investigating efficient video data loading and augmentation with DALI. Did you manage to solve this issue? If not, maybe consider implementing "reshape" operation support in DALI?
That is, for image batch loading we have a tensor with shape NHWC (or NCHW after transpose); for video batch loading, the tensor shape is NTHWC or NTCHW. The current issue is that DALI cannot resize or perform other augmentation operations on the 4-D tensor of a sequence. But in practice, we can just reshape this tensor from NTHWC to (N*T)HWC (merge the first 2 axes of the tensor, then reshape back after augmentation).
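
A minimal sketch of this reshape idea, using NumPy as a stand-in rather than DALI. The names `resize_nearest` and `apply_per_frame` are hypothetical helpers introduced only for illustration; `resize_nearest` plays the role of any operator that accepts only 4-D (batch, H, W, C) input.

```python
import numpy as np


def resize_nearest(frames, out_h, out_w):
    """Nearest-neighbour resize for a 4-D (B, H, W, C) array.

    Hypothetical helper: it stands in for any image operator that
    refuses input with more than four dimensions.
    """
    if frames.ndim != 4:
        raise ValueError("expected a 4-D (B, H, W, C) array")
    _, h, w, _ = frames.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return frames[:, rows[:, None], cols[None, :], :]


def apply_per_frame(video_batch, op):
    """Merge the N and T axes, apply a 4-D-only op, then split them again."""
    n, t = video_batch.shape[:2]
    flat = video_batch.reshape((n * t,) + video_batch.shape[2:])  # (N*T, H, W, C)
    out = op(flat)
    return out.reshape((n, t) + out.shape[1:])  # back to (N, T, H', W', C)


# Two videos, eight frames each, 64x48 RGB (made-up sizes).
videos = np.random.randint(0, 256, size=(2, 8, 64, 48, 3), dtype=np.uint8)
resized = apply_per_frame(videos, lambda x: resize_nearest(x, 32, 24))
print(resized.shape)
```

Because frames of one video stay contiguous after the merge, the final reshape restores the original grouping without any bookkeeping.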

Another question: for video data augmentation, we must keep the same augmentation configuration for all frames sampled from the same video, i.e. if we apply a random crop, we must ensure the same crop position across all frames. So I would like more details about how randomness is configured in DALI, to see if this requirement can be satisfied. Using the same random configuration for a whole batch of videos is OK, as long as the configuration varies between batches.

Thanks!

bearsroom on 26 Mar 2019

👍4

All 6 comments

Hi @adroit91 ,

The VideoReader operator produces a batch of GPU Sequences (which are 4-D Tensors), which are not supported by most operators yet. We are currently working on bringing support for Sequence tensors to the most commonly used operators.

Unfortunately, the best you can do for now is to return the decoded sequences from VideoReader directly and process them further outside of DALI.
We will keep you posted here when the Sequence operators are ready.

The error you get comes from the fact that you are trying to use a runtime method, squeeze, in the symbolic graph definition of define_graph.
EdgeReference is the static definition of the operator outputs (outputs being the edges of DALI's internal graph representation), while squeeze produces a TensorListGPU and should be used on the dynamic output returned by the DALI pipeline.
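
The distinction between symbolic edges and runtime data can be illustrated with a toy model. This is not DALI's implementation: the classes below (`EdgeReference`, `ToyOp`, `RuntimeBatch`) are hypothetical stand-ins written only to show why passing runtime data into a graph definition raises a `TypeError` like the one above.

```python
class EdgeReference:
    """Toy stand-in for a symbolic graph edge: it has a name but no data."""

    def __init__(self, name):
        self.name = name


class RuntimeBatch:
    """Toy stand-in for data that exists only after the pipeline has run."""

    def __init__(self, data):
        self.data = data


class ToyOp:
    """Toy operator that, during graph definition, accepts only edges."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *inputs):
        for inp in inputs:
            if not isinstance(inp, EdgeReference):
                raise TypeError(
                    "Expected inputs of type EdgeReference. "
                    "Received input type " + type(inp).__name__
                )
        # Calling an op only records a new edge; nothing is computed here.
        return EdgeReference(self.name + "_out")


reader_out = EdgeReference("reader_out")
resize = ToyOp("resize")

edge = resize(reader_out)  # fine: symbolic edge in, symbolic edge out

try:
    resize(RuntimeBatch([1, 2, 3]))  # runtime data inside the graph definition
except TypeError as err:
    message = str(err)

print(edge.name)
print(message)
```

In this toy model, as in the explanation above, anything that needs actual pixel values has to happen on the pipeline's outputs after it has run, not inside the graph definition.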

Kh4L on 12 Jan 2019

@Kh4L - maybe we should add a note to each operator's documentation indicating whether it supports video, to avoid this kind of question and confusion in the future.

JanuszL on 13 Jan 2019

Hi,
For video we are flattening the sequence of frames and doing as you suggest; you can see the list of operators supporting sequences here (it will be available in 0.8).
It is implemented in
https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/operators/crop/crop.cu#L83. For the problem of randomness, we need to make the operator aware that each k-length sequence of images is one sequence. This is not a difficult thing to do, and it could be a perfect place for an external contribution.
For the 'reshape' you can check docs/examples/video/superres_pytorch/dataloading/dataloaders.py. We have a Transpose operator which does this NTHWC<->NTCHW. It will be available in 0.8 soon (documentation preview available here).
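
The randomness point above (one random draw per k-length sequence, shared by all of its frames) can be sketched in NumPy on the flattened (N*T)HWC layout. This is an illustration only, not DALI's crop operator; `per_sequence_crop` and its parameters are hypothetical names, and the sketch assumes frames of one sequence are stored contiguously.

```python
import numpy as np


def per_sequence_crop(flat_frames, seq_len, crop_h, crop_w, rng):
    """Crop flattened frames (N*T, H, W, C) with one random offset per sequence.

    Assumes the frames of each sequence are contiguous, so frame i belongs
    to sequence i // seq_len.
    """
    total, h, w, c = flat_frames.shape
    if total % seq_len != 0:
        raise ValueError("frame count must be a multiple of seq_len")
    n_seq = total // seq_len

    # Draw one (y, x) offset per sequence, then repeat it for every frame.
    ys = np.repeat(rng.integers(0, h - crop_h + 1, size=n_seq), seq_len)
    xs = np.repeat(rng.integers(0, w - crop_w + 1, size=n_seq), seq_len)

    out = np.empty((total, crop_h, crop_w, c), dtype=flat_frames.dtype)
    for i in range(total):
        out[i] = flat_frames[i, ys[i]:ys[i] + crop_h, xs[i]:xs[i] + crop_w]
    return out, ys, xs


rng = np.random.default_rng(16)
# Three sequences of four frames each, 32x32 RGB (made-up sizes).
frames = rng.integers(0, 256, size=(3 * 4, 32, 32, 3), dtype=np.uint8)
cropped, ys, xs = per_sequence_crop(frames, seq_len=4, crop_h=24, crop_w=24, rng=rng)
print(cropped.shape)
print(ys.reshape(3, 4))  # each row holds one repeated offset
```

Drawing the offsets before repeating them is what keeps the crop window fixed within a video while still letting it vary between videos and between batches.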

JanuszL on 26 Mar 2019

+1 for sequence support in Resize (expects NHWC) and DumpImage (fails this assert), or a "reshape" op to transform NFHWC to NHWC and back as a workaround, to support processing VideoReader output.

The main issue with doing these transforms outside of DALI and relying on the framework integrations is finding framework support for them on the GPU; otherwise the data has to cross the GPU/CPU boundary.

Mmdixon on 1 Jun 2019

Hi,
In the most recent DALI version, operator coverage for video sequences has widened (including the fused VideoReaderResize and the standalone Resize).
I think this resolves your request.

JanuszL on 2 Dec 2020
