Dali: how to deal with more than one tfrecord file?

Created on 17 Sep 2019 · 8 Comments · Source: NVIDIA/DALI

When I look at the tutorial "Using PyTorch DALI plugin: using various readers", I see that the example uses just one tfrecord file and one tfrecord_idx file. In my case there are several tfrecord and tfrecord_idx files to deal with. How should I change the input of "ops.TFRecordReader" to make it work, or is there another way to do it?

question

All 8 comments

Hi, thanks for the question.
For TFRecordReader, path and index_path can be lists of strings. If your dataset is stored in multiple TFRecord files, just pass lists with the paths to all of them.
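A minimal sketch of that answer, under stated assumptions: the hypothetical helper collect_tfrecord_pairs (not part of DALI) pairs every TFRecord file with its index file and returns two parallel lists for path and index_path. It assumes each index file is named after its record file plus an ".idx" suffix; adjust it to your own layout. The pipeline part uses the ops.TFRecordReader operator from the question and the feature keys from the tutorial's example; operator names differ between DALI releases (newer ones expose the reader as fn.readers.tfrecord), so treat it as illustrative.

```python
import os
from typing import List, Tuple


def collect_tfrecord_pairs(record_dir: str, index_dir: str,
                           index_suffix: str = ".idx") -> Tuple[List[str], List[str]]:
    """Hypothetical helper: pair every TFRecord file with its index file.

    Assumes each index file is named <record file name><index_suffix> and
    lives in index_dir (which may be the same directory as record_dir).
    Returns two lists in the same order, ready for path= and index_path=.
    """
    paths, index_paths = [], []
    for name in sorted(os.listdir(record_dir)):
        full = os.path.join(record_dir, name)
        # Skip sub-directories and the index files themselves.
        if not os.path.isfile(full) or name.endswith(index_suffix):
            continue
        idx = os.path.join(index_dir, name + index_suffix)
        if not os.path.isfile(idx):
            # Fail early here rather than inside the DALI pipeline.
            raise FileNotFoundError(f"missing index file for {full}: {idx}")
        paths.append(full)
        index_paths.append(idx)
    return paths, index_paths


def build_pipeline(paths: List[str], index_paths: List[str],
                   batch_size: int = 32, num_threads: int = 2, device_id: int = 0):
    """Build a DALI pipeline that reads all of the given TFRecord files.

    Uses the legacy ops API from the tutorial; DALI is imported lazily so the
    helper above can be used without DALI installed.
    """
    from nvidia.dali.pipeline import Pipeline
    import nvidia.dali.ops as ops
    import nvidia.dali.tfrecord as tfrec
    import nvidia.dali.types as types

    class TFRecordPipeline(Pipeline):
        def __init__(self):
            super().__init__(batch_size, num_threads, device_id)
            # path and index_path take lists; entry i of index_paths must be
            # the index of entry i of paths.
            self.input = ops.TFRecordReader(
                path=paths,
                index_path=index_paths,
                features={
                    "image/encoded": tfrec.FixedLenFeature((), tfrec.string, ""),
                    "image/class/label": tfrec.FixedLenFeature([1], tfrec.int64, -1),
                })
            # "mixed": CPU input, GPU output; decoding is done with nvJPEG.
            self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)

        def define_graph(self):
            inputs = self.input()
            images = self.decode(inputs["image/encoded"])
            return images, inputs["image/class/label"]

    return TFRecordPipeline()
```

Usage would be roughly: paths, idx = collect_tfrecord_pairs("/data/train", "/data/train_idx"), then pipe = build_pipeline(paths, idx), pipe.build() and pipe.run(). The directory names are placeholders. Checking the pairing up front keeps a missing or mismatched index file from surfacing later as a less obvious reader error.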

Thanks for your answer! I tried your suggestion, but some problems came up that I can't solve. I'd really appreciate it if you could check my code and help me find the cause. Here is my code:

1.txt

@hyqyoung Now you are getting an error from nvJPEG. Because you are using the mixed decoder, it uses nvJPEG to decode the images. It looks similar to this.
From your code I see that you don't use many threads per GPU, but it might still be an allocation issue. What platform are you using?

@awolant Thanks for your review! I found that the last error happened because my GPU machine was busy running experiments, so less memory was available. Today I ran the DALI code on 8x Titan Xp GPUs with 12 GB of memory each. It ran successfully on tfrecord0 but did not complete on tfrecord1 (which is why I got the same error when I passed lists with the paths to both files). Changing num_threads didn't help either. Here are the results:
2.txt

This happens when nvJPEG is unable to decode the image (e.g. nvJPEG does not support progressive JPEGs, AFAIK). Maybe you can try the HostDecoder to make sure that everything else is OK?
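If the failing images are suspected to be progressive JPEGs, one quick check is to read the frame header of a few encoded images, for example the image/encoded bytes of some records from tfrecord1. The sketch below is a minimal, hypothetical helper (is_progressive_jpeg is not part of DALI) that walks the JPEG marker segments and reports whether the first SOF (start of frame) marker is a progressive one. Whether a given DALI or nvJPEG release actually rejects progressive files is an assumption taken from the comment above, not something this code verifies.

```python
# Frame-header (SOFn) markers; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) are not SOF.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Progressive variants: SOF2, SOF6, SOF10, SOF14.
PROGRESSIVE_SOF_MARKERS = {0xC2, 0xC6, 0xCA, 0xCE}


def is_progressive_jpeg(data: bytes) -> bool:
    """Return True if the JPEG's frame header marks it as progressive."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:
            # Fill byte before the real marker: skip it.
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            # TEM and RSTn markers carry no length field.
            i += 2
            continue
        if marker in SOF_MARKERS:
            return marker in PROGRESSIVE_SOF_MARKERS
        if marker == 0xDA:
            # Start of scan reached without a frame header.
            break
        if i + 4 > len(data):
            break
        # Segment length is big-endian and includes its own two bytes.
        seg_len = int.from_bytes(data[i + 2:i + 4], "big")
        i += 2 + seg_len
    raise ValueError("no SOF marker found")
```

To follow the HostDecoder suggestion, the decoder can instead be created on the CPU, e.g. ops.ImageDecoder(device="cpu", output_type=types.RGB); DALI releases from that time also had a separate ops.HostDecoder operator. If the pipeline then gets through tfrecord1, the problem is specific to GPU decoding.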

I think it may be a bug in our code anyway. The error comes from https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/operators/decoder/nvjpeg/decoupled_api/nvjpeg_decoder_decoupled_api.h#L320, where we call nvjpegJpegStreamParse. If that call fails, we should fall back to the host decoder instead of killing the pipeline.
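The behaviour proposed here, falling back to the host decoder when stream parsing fails instead of stopping the pipeline, is a general pattern. The sketch below illustrates it in plain Python with hypothetical decoder callables (primary, fallback); it is not DALI's implementation, which lives in the C++ operator linked above, and the default exception types it catches are an assumption.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def decode_with_fallback(
    data: bytes,
    primary: Callable[[bytes], T],
    fallback: Callable[[bytes], T],
    errors: Tuple[Type[BaseException], ...] = (RuntimeError, ValueError),
) -> T:
    """Decode with primary; if it raises one of errors, retry with fallback.

    One bad image costs a slower decode instead of aborting the whole run.
    Exceptions not listed in errors still propagate.
    """
    try:
        return primary(data)
    except errors:
        # e.g. the GPU-side stream parse rejected the image: use the host path.
        return fallback(data)
```

Limiting the caught exception types keeps genuine bugs (anything not in errors) visible rather than silently routing every failure to the slower path.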
@jantonguirao ?

https://github.com/NVIDIA/DALI/pull/1335 may fix your problem. Please check the nightly build once it is merged. In the meantime, please try HostDecoder as @awolant suggested.

If it still doesn't work, please reopen.

