DALI: examples of how to use DALI with multiple GPUs in PyTorch

Created on 23 Jul 2020 · 10 comments · Source: NVIDIA/DALI

Hi, I have seen the sharding in the tutorial, and I know there are separate pipelines. But how do I write a full PyTorch script? Do we need to use the local rank to decide which GPU each pipeline is placed on?

question

All 10 comments

Hi,
Yes, you need the local rank to assign the proper GPU and the global rank to assign the proper shard. Please take a look at this RN50 PyTorch example.
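A minimal sketch of that split, assuming the job is launched with `torchrun`, which exports the `LOCAL_RANK`, `RANK` and `WORLD_SIZE` environment variables. The helper `dali_placement` is hypothetical; its results are the values one would pass as `device_id` to the DALI pipeline and as `shard_id`/`num_shards` to the reader.

```python
import os
from typing import Mapping, NamedTuple


class DaliPlacement(NamedTuple):
    device_id: int   # which local GPU this process (and its DALI pipeline) uses
    shard_id: int    # which slice of the dataset this process reads
    num_shards: int  # total number of slices = total number of processes


def dali_placement(env: Mapping[str, str] = os.environ) -> DaliPlacement:
    """Hypothetical helper: derive DALI placement from launcher env vars.

    Assumes a launcher such as torchrun that exports LOCAL_RANK, RANK and
    WORLD_SIZE; falls back to a single-process run if they are missing.
    """
    local_rank = int(env.get("LOCAL_RANK", 0))  # rank within this node -> GPU
    global_rank = int(env.get("RANK", 0))       # rank across all nodes -> shard
    world_size = int(env.get("WORLD_SIZE", 1))  # number of processes overall
    return DaliPlacement(device_id=local_rank,
                         shard_id=global_rank,
                         num_shards=world_size)


if __name__ == "__main__":
    # Example: 2 nodes x 4 GPUs, process on the third GPU of the second node.
    p = dali_placement({"LOCAL_RANK": "2", "RANK": "6", "WORLD_SIZE": "8"})
    print(p)
    # In a training script these would feed, e.g.:
    #   Pipeline(..., device_id=p.device_id)
    #   ops.FileReader(..., shard_id=p.shard_id, num_shards=p.num_shards)
```

The local rank only matters within a node, while the global rank and world size decide which part of the dataset each process sees, so no two processes read the same shard.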

Hi, the example uses ops.FileReader, which makes sharding very convenient, but how would I implement my own input? I think the logic is the same: just shard the data across the ranks, and the gradients are synchronized by apex.distributed. Do I understand correctly?

Hi,
I recommend using the ExternalSource operator, as in this example.

the logic is the same, just sharding the data to the ranks

Yes, that is what should be done.
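A minimal sketch of such manual sharding for data fed through ExternalSource, assuming the full list of samples (e.g., file paths) is known up front. `shard_bounds` and `ShardedBatchIterator` are hypothetical names, and this sketch uses a contiguous, near-equal split. Recent DALI versions accept an iterable like this as the `source` of `fn.external_source`; older code calls `feed_input` in `iter_setup` instead. Reading the actual sample data is left out.

```python
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


def shard_bounds(num_samples: int, shard_id: int, num_shards: int) -> range:
    """Indices owned by one shard: contiguous chunks whose sizes differ by at most 1."""
    start = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return range(start, end)


class ShardedBatchIterator(Generic[T]):
    """Hypothetical per-rank iterator that yields batches from one shard only."""

    def __init__(self, samples: Sequence[T], batch_size: int,
                 shard_id: int, num_shards: int) -> None:
        bounds = shard_bounds(len(samples), shard_id, num_shards)
        self.shard: List[T] = [samples[i] for i in bounds]
        self.batch_size = batch_size
        self.pos = 0

    def __iter__(self) -> "ShardedBatchIterator[T]":
        self.pos = 0  # restart from the beginning of this shard each epoch
        return self

    def __next__(self) -> List[T]:
        if self.pos >= len(self.shard):
            raise StopIteration
        # The last batch of a shard may be short; a real pipeline may need
        # to pad or drop it depending on the DALI version in use.
        batch = self.shard[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return batch


if __name__ == "__main__":
    files = [f"img_{i}.jpg" for i in range(10)]
    for rank in range(3):
        it = ShardedBatchIterator(files, batch_size=2, shard_id=rank, num_shards=3)
        print(rank, list(it))
```

Each rank constructs the iterator with its own global rank as `shard_id`, so the shards are disjoint and together cover the whole list.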

Hi, it seems that the data loading is unbalanced. I give each GPU a nearly identical share of the data, but the GPU memory usage is not the same.

I think I have used the same sharding, so why is the GPU memory usage unbalanced?

I use the PyTorch distributed training framework.

Are you sure you run the model on each GPU? DALI should not consume the whole GPU memory (16 GB in your case) on its own. Also, in the screenshot you provided, I see an order-of-magnitude difference in file lengths (4700 vs 40000).
Can you run your code on one GPU at a time and go through the GPUs one by one (e.g., run everything on GPU no. 2 only)?

@JanuszL, I'm really looking forward to your answer!

Hi,
I would check the size of the samples you have in each pipeline. Maybe for some reason the first one gets the biggest samples (I see you are sorting them by size, right?).
You may also want to check this function and see whether each operator's output buffer consumes the same amount of memory.
You can also reduce the batch size to 1 and compare the single sample from each pipeline side by side.
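A small diagnostic along those lines, assuming the per-sample sizes (file lengths or decoded element counts) can be listed before training. `shard_size_report` is a hypothetical name and it assumes the same contiguous split as a reader would use; the sizes in the usage block are synthetic.

```python
from typing import Dict, List, Sequence


def shard_size_report(sizes: Sequence[int], num_shards: int) -> List[Dict[str, int]]:
    """Summarize per-shard sample sizes under a contiguous split (an assumption)."""
    n = len(sizes)
    report = []
    for shard_id in range(num_shards):
        chunk = sizes[n * shard_id // num_shards: n * (shard_id + 1) // num_shards]
        report.append({
            "shard_id": shard_id,
            "num_samples": len(chunk),
            "max_size": max(chunk, default=0),
            "total_size": sum(chunk),
        })
    return report


if __name__ == "__main__":
    # Synthetic sizes sorted from largest to smallest, as in a size-sorted list.
    sizes = sorted(range(100, 4100, 100), reverse=True)  # 40 samples
    for row in shard_size_report(sizes, num_shards=4):
        print(row)
```

Every shard holds the same number of samples here, yet the largest sample in shard 0 is four times the largest in shard 3, which is exactly the kind of skew a global sort by size can introduce.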

Thanks for your reply. I noticed that this is caused by the different shapes of the data in each shard: each shard gets padded to a different size. Thanks a lot!
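To see why padding alone can unbalance memory, here is a back-of-the-envelope sketch, assuming each batch is padded to its longest sample and stored as float32 (4 bytes per element). Both the padding rule and the element type are assumptions for illustration, not a description of DALI internals; the function names are hypothetical and the lengths are synthetic.

```python
from typing import List, Sequence


def padded_batch_bytes(lengths: Sequence[int], itemsize: int = 4) -> int:
    """Bytes for one batch padded to its longest sample: batch_size * max_len * itemsize."""
    return len(lengths) * max(lengths) * itemsize


def per_shard_peak_bytes(lengths: Sequence[int], num_shards: int,
                         batch_size: int, itemsize: int = 4) -> List[int]:
    """Largest padded batch per shard, using a contiguous split of `lengths`."""
    n = len(lengths)
    peaks = []
    for shard_id in range(num_shards):
        shard = lengths[n * shard_id // num_shards: n * (shard_id + 1) // num_shards]
        batches = [shard[i:i + batch_size] for i in range(0, len(shard), batch_size)]
        peaks.append(max((padded_batch_bytes(b, itemsize) for b in batches), default=0))
    return peaks


if __name__ == "__main__":
    # Synthetic sequence lengths, sorted descending, split across 2 shards.
    lengths = [8000, 7000, 6000, 5000, 400, 300, 200, 100]
    print(per_shard_peak_bytes(lengths, num_shards=2, batch_size=2))
```

With these synthetic, size-sorted lengths, the first shard's largest padded batch is 20 times the second's, even though both shards hold the same number of samples.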
