pytorch-CycleGAN-and-pix2pix: question about multi-GPU training in the source code

Created on 17 Sep 2018  ·  15 Comments  ·  Source: junyanz/pytorch-CycleGAN-and-pix2pix

While reading the source code, I found this snippet that sets up the GPUs:

str_ids = opt.gpu_ids.split(',')
opt.gpu_ids = []
for str_id in str_ids:
    id = int(str_id)
    if id >= 0:
        opt.gpu_ids.append(id)
if len(opt.gpu_ids) > 0:
    torch.cuda.set_device(opt.gpu_ids[0])

If I pass --gpu_ids 0,1,2, this code ends up calling torch.cuda.set_device(0).
So is the code here wrong?
I would be grateful if someone could clear this up.
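
As a minimal sketch of what the snippet above computes, the parsing can be pulled into a standalone function. The names parse_gpu_ids and default_device are hypothetical, introduced only for illustration; the logic mirrors the quoted loop. It shows that the full list of ids is kept, and only its first entry would be handed to torch.cuda.set_device.

```python
def parse_gpu_ids(arg):
    """Parse a string such as '0,1,2' into a list of non-negative GPU ids."""
    gpu_ids = []
    for str_id in arg.split(','):
        gpu_id = int(str_id)
        if gpu_id >= 0:  # negative ids are dropped, as in the quoted loop
            gpu_ids.append(gpu_id)
    return gpu_ids


gpu_ids = parse_gpu_ids('0,1,2')
# Only the first id would be passed to torch.cuda.set_device;
# the complete list stays available for later use.
default_device = gpu_ids[0] if gpu_ids else None
print(gpu_ids, default_device)  # [0, 1, 2] 0
```

So set_device(0) being called does not by itself mean the other ids were discarded.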

All 15 comments

@junyanz @SsnL @taesung89

No, it's not wrong. What's the problem?

But with this code, it seems that only GPU 0 is used?

It just sets the default CUDA device to gpu_ids[0]; e.g., if you create a
CUDA tensor without specifying a device, it will be placed on that device.
Nothing prevents using other devices later. We use DataParallel with all
devices.
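
The distinction can be sketched as follows. This is an illustration under assumptions, not the repo's code: wrap_for_gpus is a hypothetical helper, and the CPU fallback is added only so the sketch runs on machines without CUDA. set_device only picks the default device, while DataParallel is what spreads each batch over every listed id.

```python
import torch
import torch.nn as nn


def wrap_for_gpus(net, gpu_ids):
    """Hypothetical helper: choose a default device, then parallelize over all ids."""
    if len(gpu_ids) == 0 or not torch.cuda.is_available():
        return net  # CPU fallback: nothing to parallelize
    # Only sets the default device for CUDA tensors created without an explicit device.
    torch.cuda.set_device(gpu_ids[0])
    # DataParallel expects the parameters to live on the first listed device.
    net = net.to(torch.device('cuda', gpu_ids[0]))
    # Each forward pass splits the batch across all ids in gpu_ids.
    return nn.DataParallel(net, device_ids=gpu_ids)


gpu_ids = list(range(torch.cuda.device_count()))  # [] on a CPU-only machine
net = wrap_for_gpus(nn.Linear(4, 2), gpu_ids)
out = net(torch.randn(8, 4))  # batch of 8, split along dim 0 when wrapped
print(out.shape)
```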

Maybe you can run the code with multiple GPUs and see how it works.

OK, I think I get it now. Thank you very much for this wonderful code!

I have run the map and horse experiments on multiple GPUs. While running them I found your code very elegant, so I decided to read it closely, line by line. Thank you for writing such good code.

Glad that you like the repo. :) Have fun with it. I'm going to close this one for now. Feel free to reopen if you have further questions.

Hey, I ran into the same problem. I set gpu ids to 0,1, but nvidia-smi shows that only GPU 0 is being used. How can I train on multiple GPUs?

Have you processed your model with networks.init_net()? Alternatively, you can directly use net = torch.nn.DataParallel(net, gpu_ids) to make your model ready for multiple GPUs.
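
One quick way to check whether a model has actually been wrapped is to inspect it directly. The helper below is a hypothetical sketch (parallel_device_ids is not part of the repo); it relies only on the device_ids attribute that torch.nn.DataParallel stores.

```python
import torch.nn as nn


def parallel_device_ids(net):
    """Return the GPU ids a model is replicated on, or [] if it is not wrapped."""
    if isinstance(net, nn.DataParallel):
        return list(net.device_ids)
    return []


plain = nn.Linear(4, 2)
# A bare module has not been wrapped, so no device ids are reported.
print(parallel_device_ids(plain))
```

If this reports an empty list for a network that should be parallelized, the wrapping step was probably skipped.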

This is the source code:
def init_net(net, init_type='normal', init_gain=0.02, gpu_ids=[]):
    if len(gpu_ids) > 0:
        assert(torch.cuda.is_available())
        net.to(gpu_ids[0])
        net = torch.nn.DataParallel(net, gpu_ids)  # multi-GPUs
    init_weights(net, init_type, init_gain=init_gain)
    return net

But it does not work.

Are you using a custom model or the authors'?
If you're using your own model, do the following:

model = YourModel(params)
# use one of the two lines below
model = torch.nn.DataParallel(model, gpu_ids) # make sure gpu_ids is set properly
model = init_net(model, init_type, init_gain, gpu_ids)

The authors'. I'm a beginner ("xiao bai")……

What's the batch_size? If batch_size is 1, the batch cannot be split across multiple GPUs.
It's better to set batch_size to n*num_gpus, e.g., 2 if you want to use 2 GPUs.
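
A rough sketch of why a batch of 1 leaves the second GPU idle, assuming DataParallel splits each batch along its first dimension the way torch.chunk does. chunk_sizes is a hypothetical helper written for illustration; it is not a function from PyTorch or from this repo.

```python
import math


def chunk_sizes(batch_size, num_gpus):
    """Per-GPU chunk sizes when a batch is split along dim 0 in torch.chunk style."""
    if batch_size <= 0 or num_gpus <= 0:
        return []
    per_chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(chunk_sizes(1, 2))  # [1]    -> only one GPU receives data
print(chunk_sizes(2, 2))  # [1, 1] -> both GPUs receive one sample each
print(chunk_sizes(4, 2))  # [2, 2]
```

With a single sample there is only one chunk to hand out, which matches the nvidia-smi observation of only GPU 0 being busy.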

you are my sunshine^^
