I am trying to use DALI for pre-processing a loaded video, for inference.
This is my code:
class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        self.input = ops.VideoReader(device="gpu", filenames=data, sequence_length=sequence_length,
                                     shard_id=0, num_shards=1,
                                     random_shuffle=shuffle, initial_fill=initial_prefetch_size)
        self.caster = ops.Cast(device="gpu", dtype=types.UINT8)
        self.resize = ops.Resize(device="gpu", resize_x=NETW, resize_y=NETH)
        self.permute = ops.NormalizePermute(device="gpu", height=NETH, width=NETW, mean=MEAN, std=STDV, output_dtype=types.FLOAT)

    def define_graph(self):
        img = self.input(name="Reader")
        imgC = self.caster(img)
        #imgResized = self.resize(imgC) # Results in error saying "resize" expects 3-D input
        imgResized = self.resize(imgC.squeeze())
        imgPermuted = self.permute(imgResized)
        return imgPermuted
All sorts of ops are returning an error:
TypeError: Expected inputs of type EdgeReference or list of EdgeReference. Received input type TensorListGPU
Could you recommend how to build a pipeline with the VideoReader? If that is not supported, please suggest a way to use DALI's built-in ops to do the same operations (cast-->resize-->permute) outside of the pipeline.
Hi @adroit91 ,
The VideoReader operator produces a batch of GPU Sequences (which are 4-D Tensors), and these are not yet supported by most operators. We are currently working on bringing Sequence support to the most commonly used operators.
Unfortunately, the best you can do for now is to return the decoded sequences from VideoReader directly and process them further outside of DALI.
We will keep you posted here when the Sequence operators are ready.
The error you get comes from the fact that you are trying to use a runtime method, squeeze, in the symbolic graph definition in define_graph.
The EdgeReference type is the static definition of the operator outputs (outputs being the edges of DALI's internal graph representation), while squeeze produces a TensorListGPU and should be used on the dynamic output returned by the DALI pipeline.
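A minimal sketch of the workaround described above: the pipeline returns the VideoReader output unchanged, and any further processing happens on the runtime output of run(). It reuses only the Pipeline and ops.VideoReader calls from the question. The helper name build_video_pipe, its default arguments, and the file name in the usage comment are assumptions made for illustration, and DALI is imported inside the function only so that the snippet can be loaded on a machine without DALI installed.

```python
def build_video_pipe(filenames, sequence_length, batch_size=1, num_threads=2, device_id=0):
    """Build a pipeline whose only stage is the video reader (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without DALI installed.
    from nvidia.dali.pipeline import Pipeline
    import nvidia.dali.ops as ops

    class VideoPipe(Pipeline):
        def __init__(self):
            super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
            self.input = ops.VideoReader(device="gpu", filenames=filenames,
                                         sequence_length=sequence_length,
                                         shard_id=0, num_shards=1)

        def define_graph(self):
            # Only symbolic operator calls belong here: no squeeze() or other
            # runtime methods on the values flowing between operators.
            return self.input(name="Reader")

    pipe = VideoPipe()
    pipe.build()
    return pipe

# Usage (needs a GPU and DALI; "video.mp4" is a placeholder):
# pipe = build_video_pipe(["video.mp4"], sequence_length=8)
# outputs = pipe.run()
# sequences = outputs[0]  # runtime output, to be processed outside DALI
```

The point of the split is that define_graph only wires operators together, while the object obtained from run() is actual data that runtime methods can be applied to.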
@Kh4L - maybe we should add some info to the operator docs indicating whether each operator supports video, to avoid this kind of question and confusion in the future.
Hello there, we are also investigating the possibility of efficient video data loading and augmentation with DALI. Did you manage to solve this issue? If not, maybe consider implementing "reshape" operation support in DALI?
That is, for image batch loading, we have a tensor with shape NHWC (or NCHW after transpose); for video batch loading, the tensor shape is NTHWC or NTCHW. The current issue is that DALI cannot resize or perform other augmentation operations on the 4-D tensor of each sample. But in practice, we can just reshape this tensor from NTHWC to (N*T)HWC (that is, merge the first 2 axes of the tensor and reshape back after augmentation).
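The merge-and-restore idea above can be sketched in plain Python, with nested lists standing in for the NTHWC tensor. This is an illustration of the shape bookkeeping only, not DALI code: the function names and the flip_horizontal stand-in for a per-image operator are assumptions. With an array library the same step would be a single reshape.

```python
def merge_batch_and_time(batch):
    """Flatten [N][T] frames into one list of N*T frames; also return T."""
    if not batch:
        raise ValueError("batch must contain at least one sequence")
    seq_len = len(batch[0])
    if any(len(seq) != seq_len for seq in batch):
        raise ValueError("all sequences must have the same length")
    flat = [frame for seq in batch for frame in seq]
    return flat, seq_len


def split_batch_and_time(flat, seq_len):
    """Inverse of merge_batch_and_time: regroup N*T frames into [N][T]."""
    if len(flat) % seq_len != 0:
        raise ValueError("frame count is not a multiple of the sequence length")
    return [flat[i:i + seq_len] for i in range(0, len(flat), seq_len)]


def flip_horizontal(frame):
    """Stand-in for any operator that only understands single images (H x W)."""
    return [row[::-1] for row in frame]


# 2 videos x 3 frames, each frame a 2 x 2 image.
batch = [[[[n, t], [t, n]] for t in range(3)] for n in range(2)]

flat, seq_len = merge_batch_and_time(batch)       # 6 independent "images"
processed = [flip_horizontal(f) for f in flat]    # per-image op on each frame
restored = split_batch_and_time(processed, seq_len)
```

Frame i of the flattened list belongs to video i // T, which is what makes the round trip lossless.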
Another question here is, for video data augmentation, we must keep the same configuration of augmentation for all sampled frames from the same video, i.e. if we apply random crop here, we must ensure the same crop position across all frames. So I would like to have more details about configuration of randomness in DALI to see if this requirement can be satisfied. The same configuration of randomness for a whole batch of videos is ok, as long as the configuration varies between batches.
Thanks!
Hi,
For video, we flatten the sequence of frames and do as you suggest; you can see the list of operators supporting sequences here (it will be available in 0.8).
It is implemented in https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/operators/crop/crop.cu#L83. For the problem of randomness, we need to make the operator aware that each k-length sequence of images is one sequence. This is not difficult to do, and it would be a perfect place for an external contribution.
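To make the randomness requirement concrete, here is a small plain-Python sketch of sequence-aware random cropping over a flattened batch: one crop offset is drawn per k-length sequence and reused for every frame in it. It is not DALI's implementation; the function names, parameters, and the seed value are assumptions chosen for illustration.

```python
import random


def crop_offsets_per_frame(num_frames, seq_len, max_x, max_y, rng):
    """Return one (x, y) offset per flattened frame, constant within a sequence."""
    if num_frames % seq_len != 0:
        raise ValueError("frame count is not a multiple of the sequence length")
    # Draw the random parameters once per sequence, not once per frame.
    per_sequence = [(rng.randint(0, max_x), rng.randint(0, max_y))
                    for _ in range(num_frames // seq_len)]
    # Frame i of the flattened batch belongs to sequence i // seq_len.
    return [per_sequence[i // seq_len] for i in range(num_frames)]


def crop(frame, x, y, w, h):
    """Crop a w x h window whose top-left corner is (x, y) from an H x W frame."""
    return [row[x:x + w] for row in frame[y:y + h]]


rng = random.Random(16)
offsets = crop_offsets_per_frame(num_frames=6, seq_len=3, max_x=4, max_y=4, rng=rng)
```

Because the offsets are drawn again on every call, the crop position still varies between sequences and between batches while staying fixed across the frames of one video.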
For the 'reshape', you can check docs/examples/video/superres_pytorch/dataloading/dataloaders.py. We have a Transpose operator which does the NTHWC<->NTCHW conversion. It will be available in 0.8 soon (documentation preview available here).
+1 for sequence support in Resize (expects NHWC) and DumpImage (fails this assert), or for a "reshape" op to transform NFHWC to NHWC and back as a workaround, to support processing VideoReader output.
The main issue with doing these transforms outside of DALI through the framework integrations is finding framework support for them on the GPU; without it, the data has to cross the GPU/CPU boundary.
Hi,
In the most recent DALI version, operator coverage for video sequences has widened (including the fused VideoReaderResize and the standalone Resize).
I think this should resolve your request.