DALI: Can DALI read videos with different frame ratios?

Created on 22 Oct 2020  ·  14 Comments  ·  Source: NVIDIA/DALI

question

All 14 comments

Hi,
It should work. We have a test for that running over this test data.

Thanks for your reply. I have another question: I want to use PyAV to load a video dataset (.mp4). Can I convert the resulting tensor data into DALI data so that I can use DALI operators such as resize and crop? Which operator should I use?

Hi,
I haven't used pyav, but as long as it can return numpy or cupy arrays you can use the ExternalSource operator to load the data from it, like in this example.

Hello, I use ExternalSource to load my video. I want ExternalSource to produce three outputs: the first is the 32 fps video, the second is the 8 fps video, and the third is the label. But I got this error:

RuntimeError: [/opt/dali/dali/pipeline/pipeline.h:175] Cannot find __ExternalSource_0 tensor, it doesn't exists or was pruned as unused one.

How can I fix it?

Hi,
It seems that one of the ExternalSource outputs is neither used by any other operator nor returned as a pipeline output. In that case, DALI prunes that graph edge, but the ExternalSource still tries to feed data into it. Please use all ExternalSource outputs.
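
A minimal sketch of what "use all ExternalSource outputs" can look like, assuming a DALI version that provides `pipeline_def` and `fn.external_source` with `num_outputs`. The `make_batch` helper, the clip shapes, and the batch size are made up for illustration, and random arrays stand in for frames decoded elsewhere (for example with PyAV).

```python
import numpy as np

BATCH_SIZE = 2  # illustrative value, not taken from the thread


def make_batch():
    """Return one batch for each of the three outputs (hypothetical data)."""
    rng = np.random.default_rng(0)
    # Random uint8 clips stand in for decoded video frames (frames, H, W, C).
    fast = [rng.integers(0, 256, (32, 64, 64, 3), dtype=np.uint8)
            for _ in range(BATCH_SIZE)]
    # Keep every 4th frame of each clip to get a shorter second stream.
    slow = [clip[::4] for clip in fast]
    labels = [np.array([i], dtype=np.int32) for i in range(BATCH_SIZE)]
    return fast, slow, labels


def build_pipeline():
    # Imported lazily so make_batch() can be tried without DALI installed.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=BATCH_SIZE, num_threads=2, device_id=0)
    def video_pipeline():
        fast, slow, label = fn.external_source(source=make_batch, num_outputs=3)
        # All three outputs are returned (or consumed by another operator),
        # so none of them is pruned from the graph.
        return fast, slow, label

    return video_pipeline()
```

If one of the three values were dropped from the `return` statement and not passed to any other operator, the graph would no longer contain that edge, which matches the situation described above.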

This is useful, thanks!

I want to use multiple GPUs while loading data with ExternalSource. Is that possible?

When you are using the ExternalSource, it is up to you to do the sharding and make sure that each DALI instance reads from a non-overlapping set of data. What you can do is:

  • preshuffle the whole data set with the same fixed seed in each DALI process instance
  • split it into as many equal parts (shards) as there are DALI pipeline instances
  • make each ExternalSource read from a separate shard
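
The three steps above can be sketched in plain Python as follows. The function name `shard_for_instance`, the seed, and the file names are hypothetical. Strided slicing is one possible way to split the list, and it gives shards whose sizes differ by at most one when the data set does not divide evenly.

```python
import random


def shard_for_instance(samples, shard_id, num_shards, seed=42):
    """Return the samples that pipeline instance `shard_id` should read."""
    shuffled = list(samples)
    # The same seed in every process gives the same order everywhere.
    random.Random(seed).shuffle(shuffled)
    # Strided slicing yields non-overlapping shards.
    return shuffled[shard_id::num_shards]


files = [f"video_{i:03d}.mp4" for i in range(10)]  # hypothetical file list
shards = [shard_for_instance(files, rank, num_shards=4) for rank in range(4)]
```

Each process would then pass only its own shard to the callable or iterator that feeds its ExternalSource.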

Thanks for your advice. I now use DALI successfully in my SlowFast demo, but it isn't faster than before. I use DALI with two PythonFunction operators. Do these two operators slow it down?

Python function operators do not take advantage of multithreading or asynchronous, pipelined execution. They are mostly meant for prototyping and debugging, so only moderate performance should be expected. Please check the relevant notice in the documentation.

I found something interesting. I used DALI's VideoReader to load videos. With DALI, loading and augmenting videos takes 0.8 s per batch, and the forward and backward passes take 0.8 s. When I use PyTorch to load videos on the CPU, loading and augmenting takes 1.4 s per batch, and the forward and backward passes take 0.2 s. So the DALI VideoReader operator does speed up data loading, but it consumes GPU utilization, which slows down the training step. Overall, DALI does not speed up training.

I found the reason for the above: I was using PythonFunction, which led to high GPU utilization. After removing my PythonFunction operators, the forward and backward passes take 0.2 s with DALI.
Finally, thanks for your patient answers. My network is now 1.6x faster.
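
As a rough check, the reported 1.6x is consistent with the per-batch timings quoted earlier in the thread. This sketch assumes that loading and the training step run one after the other, and that DALI loading still took 0.8 s per batch after the Python functions were removed. Neither assumption is stated in the thread.

```python
# Per-batch timings in seconds, as reported in this thread.
pytorch_load, pytorch_step = 1.4, 0.2
dali_load, dali_step = 0.8, 0.2  # assumption: loading stayed at 0.8 s after the fix

# Assumption: loading and the training step do not overlap.
speedup = (pytorch_load + pytorch_step) / (dali_load + dali_step)
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 1.6x"
```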

I'm happy that it works for you.
