Hello,
I need to read both video frames and audio, but I can't figure out how to do this in DALI. It would be sufficient to know the filename and frame number of the sequence returned by ops.VideoReader.
Is there a way to do this?
Thanks.
Hi,
Currently it is not possible. It should be easy to return the frame number, but it will require more work to return strings from DALI, as this data type is not currently supported. Tracked as DALI-697.
Hi @keunhong ,
This is not possible without modifying the operator. Once we have the label handling, you should be able to map the filename to some label. So I would see this as part of the label handling discussed in #666.
Adding the frame number is another story, and we can implement it if we see that it is a wanted feature.
Tracked as DALI-698
Returning frame number, or labels as a function of frame number, would be very useful for my action detection workflow.
(I would also like the frame number as mentioned in #666 )
Hi everyone,
this would indeed be very useful functionality and, if I understand it correctly, would allow building arbitrary label-reading functions per frame or sequence within a given Pipeline.
Has there been any progress or design work on it? I could look into extending the current functionality with this option (I had a look at the C++ operator code and I think it is doable, apart from returning a string, which could perhaps be encoded or given as an index matching the input filenames provided).
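To make the index idea concrete, here is a minimal sketch in plain Python. It assumes (this is not confirmed anywhere in this thread) that the reader could emit one integer per sequence whose value is the position of the source file in the list of filenames passed in. `resolve_filenames` is a hypothetical helper, not a DALI function.

```python
from typing import List, Sequence


def resolve_filenames(filenames: Sequence[str], indices: Sequence[int]) -> List[str]:
    """Map integer file indices back to filenames.

    Assumption: each index is the position of the source file in the same
    `filenames` list that was given to the reader, in the same order.
    """
    # int() lets this also accept NumPy integer scalars taken from a batch.
    return [filenames[int(i)] for i in indices]


# Hypothetical file list and a hypothetical batch of per-sequence indices.
video_files = ["clip_a.mp4", "clip_b.mp4"]
batch_indices = [1, 0, 1]
batch_files = resolve_filenames(video_files, batch_indices)
```

This keeps strings out of the pipeline entirely: only integers cross the boundary, and the string lookup happens on the Python side.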
@a-sansanwal ?
@JanuszL @pablodecm
I have thought of a few ways of providing more label information.
I'm also concerned about cluttering the output from VideoReader. For example, we can have the following types of labels:
Frames x to y in a video as label z, then a to b as c, and so on. It's easy to produce this information in VideoReader, but I really wish there was a nicer way to return all of it, for example as a Python dictionary. No one particular way of labelling serves all use cases, but I also want to avoid cluttering the output.
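As a rough illustration of the "x to y as label z" case, the lookup could be done on the Python side once the frame number is available. This is only a sketch under assumptions: segments are sorted by start frame, do not overlap, and use inclusive end frames. The names `label_for_frame` and `labels_for_sequence` are hypothetical and not part of DALI.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (start_frame, end_frame_inclusive, label); assumed sorted and non-overlapping.
Segment = Tuple[int, int, str]


def label_for_frame(segments: List[Segment], frame: int) -> Optional[str]:
    """Return the label of the segment containing `frame`, or None if unlabelled."""
    starts = [start for start, _, _ in segments]
    # Index of the last segment whose start is <= frame.
    i = bisect_right(starts, frame) - 1
    if i >= 0 and segments[i][0] <= frame <= segments[i][1]:
        return segments[i][2]
    return None


def labels_for_sequence(segments: List[Segment], first_frame: int,
                        length: int, stride: int = 1) -> List[Optional[str]]:
    """Labels for every frame of a sequence starting at `first_frame`."""
    return [label_for_frame(segments, first_frame + k * stride)
            for k in range(length)]


# Hypothetical annotation for one video.
segments = [(0, 9, "walk"), (10, 24, "run")]
seq_labels = labels_for_sequence(segments, first_frame=8, length=4)
```

With only the first frame number of each sequence returned by the reader, a lookup like this covers per-frame and per-sequence labels without adding more outputs to the operator.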
Do you have any suggestions @JanuszL ?
I think we can:
How about that @a-sansanwal?
I am quite new to this library and its design, but what @JanuszL suggested seems quite reasonable, possibly with argument flags to turn optional outputs such as the frame number on and off. This is probably the simplest and fastest way to go about it.
Alternatively, another data structure, such as a dictionary with all the relevant frame and file metadata, is also a strong option, as @a-sansanwal pointed out. I would say this requires a bit more work but could be more general and support other use cases.
Whatever you decide is best, please feel free to ping me; I would be eager to help out where I can, both in implementation and review.
I'll send some PRs soon.
Can you check how https://github.com/NVIDIA/DALI/pull/1500 works for you with the latest nightly build?