Dali: Getting filename and frame number of clips returned by ops.VideoReader

Created on 9 Apr 2019 · 11 Comments · Source: NVIDIA/DALI

Hello,

I need to read both video frames and audio, but I can't figure out how to do this in DALI. It would be sufficient to know the filename and frame number of the sequence returned by ops.VideoReader.

Is there a way to do this?

Thanks.

Video question

All 11 comments

Hi,
Currently this is not possible. Returning the frame number should be easy, but returning strings from DALI would require more work, as that data type is not currently supported. Tracked as DALI-697.

Hi @keunhong ,

This is not possible without modifying the operator. Once we have label handling, you should be able to map the filename to some label. So I would see this as part of the label handling discussed in #666.

Adding the frame number is another story, and we can implement it if we see that it is a wanted feature.

Tracked as DALI-698

Returning frame number, or labels as a function of frame number, would be very useful for my action detection workflow.

(I would also like the frame number as mentioned in #666 )

Hi everyone,

This would indeed be a very useful feature. If I understand correctly, it would make it possible to build arbitrary label-reading functions per frame or per sequence within a given Pipeline.

Has there been any progress or design work on it? I could look into extending the current functionality with this option. I had a look at the C++ operator code and I think it is doable, apart from returning a string, which could perhaps be encoded or given as an index into the input filenames provided.
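The index idea in the comment above can be sketched in plain Python. This is only an illustration of the lookup on the user side, not DALI code: the file list, `build_file_index`, and the returned integer are all hypothetical, and it assumes the reader would report each file by its position in the list it was given.

```python
from typing import Dict, Sequence


def build_file_index(filenames: Sequence[str]) -> Dict[int, str]:
    """Give each input file an integer id equal to its position in the list."""
    return {idx: name for idx, name in enumerate(filenames)}


# Hypothetical file list, in the same order it would be handed to the reader.
filenames = ["videos/a.mp4", "videos/b.mp4", "videos/c.mp4"]
index_to_file = build_file_index(filenames)

# Assumption: the reader returned the integer 1 alongside a batch of frames.
returned_index = 1
source_file = index_to_file[returned_index]
print(source_file)
```

Because only an integer crosses the pipeline boundary, this sidesteps the missing string data type mentioned earlier in the thread.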

@a-sansanwal ?

@JanuszL @pablodecm
I have thought of a few ways of providing more label information.
I'm also concerned about cluttering the output of VideoReader. For example, we can have the following types of labels:

  • folders as labels
  • each file as a label
  • frame number information as a label. WIP change
  • labelling frames x to y in a video as label z, then frames a to b as label c, and so on
  • labels based on scene change

It's easy to produce this information in VideoReader, but I really wish there were a nicer way to return all of it, for example as a Python dictionary. No single way of labelling serves all use cases, but I also want to avoid cluttering the output.
Do you have any suggestions @JanuszL ?

I think we can have:

  • folders as labels as a default
  • each file as a label when the file list is provided, so the user can assign a unique label to each file, or the same label to all files in a given folder
  • frame number information as additional info, next to the label, not replacing it
  • labels based on scene change - I think users can derive these in their own code using just the frame number returned by the reader

How about that @a-sansanwal?
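If the reader returns only a starting frame number, per-frame labels such as "frames x to y are label z" can be looked up on the user side. The following is a minimal sketch of that lookup, not part of DALI: the annotation table and both function names are hypothetical, the ranges are assumed to be sorted and non-overlapping, and consecutive frames in a sequence are assumed to be one frame apart.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical annotations for one video: (first_frame, last_frame, label),
# inclusive on both ends, sorted by first_frame and non-overlapping.
annotations: List[Tuple[int, int, str]] = [
    (0, 49, "walk"),
    (50, 119, "run"),
    (200, 260, "jump"),
]
starts = [first for first, _, _ in annotations]


def label_for_frame(frame_num: int) -> Optional[str]:
    """Return the label whose frame range contains frame_num, or None."""
    # Find the last range that starts at or before frame_num.
    pos = bisect.bisect_right(starts, frame_num) - 1
    if pos < 0:
        return None
    first, last, label = annotations[pos]
    return label if first <= frame_num <= last else None


def labels_for_sequence(start_frame: int, length: int) -> List[Optional[str]]:
    """Label every frame of a sequence that begins at start_frame."""
    return [label_for_frame(start_frame + i) for i in range(length)]


print(labels_for_sequence(48, 4))
```

Frames that fall between annotated ranges come back as `None`, so the caller can decide whether to drop or mask such sequences.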

I am quite new to this library and its design, but what @JanuszL suggested seems quite reasonable, possibly with argument flags to select between turning on and off optional output such as frame number. Probably this is the simplest, fastest way to go about it.

Alternatively, another data structure, such as a dictionary with all the relevant frame and file metadata, is also a strong option, as @a-sansanwal pointed out. I would say this requires a bit more work but could be more general and support other use cases.

Whatever you decide is best, please feel free to ping me. I would be eager to help out where I can, both in implementation and review.

I'll send some PRs soon.

Can you check how https://github.com/NVIDIA/DALI/pull/1500 works for you with the latest nightly build?
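For the original use case of pairing frames with audio, a known starting frame number is enough to locate the matching audio span. The sketch below shows only that arithmetic and makes several assumptions: a constant frame rate, audio and video tracks that start at the same instant, and consecutive frames one frame apart. The function name and the example values are hypothetical, and loading the audio itself is left to a separate library.

```python
from typing import Tuple


def audio_sample_range(start_frame: int, num_frames: int,
                       fps: float, sample_rate: int) -> Tuple[int, int]:
    """Half-open range [begin, end) of audio samples covering the frames."""
    # Multiply before dividing to keep floating-point error small.
    begin = int(round(start_frame * sample_rate / fps))
    end = int(round((start_frame + num_frames) * sample_rate / fps))
    return begin, end


# Example: a 16-frame sequence starting at frame 120 of a 30 fps video
# whose audio track is sampled at 48 kHz.
begin, end = audio_sample_range(start_frame=120, num_frames=16,
                                fps=30.0, sample_rate=48000)
print(begin, end)
```

With variable frame rate video this mapping no longer holds, and per-frame timestamps would be needed instead of frame numbers.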
