pytorch-CycleGAN-and-pix2pix: a question about multi-GPU training in the source code

Created on 17 Sep 2018 · 15 Comments · Source: junyanz/pytorch-CycleGAN-and-pix2pix

While reading the source code, I found this code that sets up the GPUs:

str_ids = opt.gpu_ids.split(',')
opt.gpu_ids = []
for str_id in str_ids:
    id = int(str_id)
    if id >= 0:
        opt.gpu_ids.append(id)
if len(opt.gpu_ids) > 0:
    torch.cuda.set_device(opt.gpu_ids[0])

If I pass --gpu_ids 0,1,2, then this code ends up calling
torch.cuda.set_device(0). So, is the code here wrong?
I would be grateful if someone could clear this up for me.
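
To make the question concrete, here is a small sketch of what the parsing loop produces for `--gpu_ids 0,1,2`. The helper name `parse_gpu_ids` is made up for illustration and is not part of the repository; its body mirrors the loop quoted above. Every ID stays in the list, and only the first one would be handed to `torch.cuda.set_device`.

```python
def parse_gpu_ids(gpu_ids_arg):
    """Turn a string such as "0,1,2" into a list of non-negative ints.

    Hypothetical helper; it mirrors the parsing loop quoted above.
    """
    gpu_ids = []
    for str_id in gpu_ids_arg.split(','):
        gpu_id = int(str_id)
        if gpu_id >= 0:  # negative IDs are skipped
            gpu_ids.append(gpu_id)
    return gpu_ids


gpu_ids = parse_gpu_ids("0,1,2")
# Only the first entry would become the default device;
# the full list remains available for later use.
default_device = gpu_ids[0] if gpu_ids else None
print(gpu_ids, default_device)
```

So the call `torch.cuda.set_device(0)` does not discard IDs 1 and 2; they are still stored in `opt.gpu_ids`.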

Adherer

All 15 comments

@junyanz @SsnL @taesung89

Adherer on 17 Sep 2018

No, it's not wrong. What's the problem?

SsnL on 17 Sep 2018

> No it's not wrong. What's the problem?

But with this code, it seems that only GPU 0 is used?

Adherer on 17 Sep 2018

It's just setting the default CUDA device to gpu_ids[0]; e.g., if you create a
CUDA tensor without specifying a device, it will be on that device. Nothing
prevents using other devices later. We use DataParallel with all
devices.
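
A minimal sketch of that behavior, assuming PyTorch is installed. The `gpu_ids` value and the `nn.Linear` stand-in model are assumptions for illustration, not the repository's networks. The sketch falls back to the CPU when the listed GPUs are not present, so it runs either way.

```python
import torch
import torch.nn as nn

gpu_ids = [0, 1]  # assumed value; adjust to the GPUs on your machine

net = nn.Linear(4, 2)  # stand-in model, not one of the repo's networks

if torch.cuda.is_available() and torch.cuda.device_count() > max(gpu_ids):
    # Sets only the default CUDA device, i.e. where tensors created
    # without an explicit device index will be placed.
    torch.cuda.set_device(gpu_ids[0])
    net.to(gpu_ids[0])
    # DataParallel replicates the module on every listed GPU and
    # splits each input batch among them along dimension 0.
    net = nn.DataParallel(net, device_ids=gpu_ids)
    x = torch.randn(8, 4, device=f"cuda:{gpu_ids[0]}")
else:
    # CPU fallback so the sketch still runs without the listed GPUs.
    x = torch.randn(8, 4)

out = net(x)
print(out.shape)
```

In the GPU branch, watching `nvidia-smi` during the forward pass should show activity on every device listed in `gpu_ids`, not only on the default one.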

SsnL on 17 Sep 2018

Maybe you can run the code with multiple GPUs and see how it works.

junyanz on 17 Sep 2018

> It's just setting the default CUDA device to gpu_ids[0]; e.g., if you create a CUDA tensor without specifying a device, it will be on that device. Nothing prevents using other devices later. We use DataParallel with all devices.

OK, I think I get it now. Thank you very much for this wonderful code!

Adherer on 19 Sep 2018

👍1

> Maybe you can run the code with multiple GPUs and see how it works.

I have run the map and horse experiments on multiple GPUs. While running them, I felt that your code was very elegant, so I decided to read it closely, line by line. Thank you for writing such good code.

Adherer on 19 Sep 2018

👍1

Glad that you like the repo. :) Have fun with it. I'm going to close this one for now. Feel free to reopen if you have further questions.

SsnL on 19 Sep 2018

> By reading the source code, I found this code to set up gpu:
> str_ids = opt.gpu_ids.split(',')
> opt.gpu_ids = []
> for str_id in str_ids:
>     id = int(str_id)
>     if id >= 0:
>         opt.gpu_ids.append(id)
> if len(opt.gpu_ids) > 0:
>     torch.cuda.set_device(opt.gpu_ids[0])
> If I set the parameters --gpu_ids 0,1,2, then using this code, it might be running:
> torch.cuda.set_device(0). So, is the code here wrong?
> I hope that a friend can answer my doubts, I will be grateful.

Hey, I ran into this problem too. I set gpu ids to 0,1, but nvidia-smi shows that only GPU 0 is being used. How can I train on multiple GPUs?

MichaelSunEngineer on 26 Jan 2019

Have you processed your model with networks.init_net()? Or you can directly use net = torch.nn.DataParallel(net, gpu_ids) to make your model ready for multi-gpus.

CastellanLiu on 26 Jan 2019

> Have you processed your model with networks.init_net()? Or you can directly use net = torch.nn.DataParallel(net, gpu_ids) to make your model ready for multi-gpus.

This is the source code:
def init_net(net, init_type='normal', init_gain=0.02, gpu_ids=[]):
    if len(gpu_ids) > 0:
        assert(torch.cuda.is_available())
        net.to(gpu_ids[0])
        net = torch.nn.DataParallel(net, gpu_ids)  # multi-GPUs
    init_weights(net, init_type, init_gain=init_gain)
    return net

but it does not work.

MichaelSunEngineer on 26 Jan 2019

Are you using your own custom model or the authors'?
If you're using your own model, do the following:

model = YourModel(params)
# use one of the two lines below
model = torch.nn.DataParallel(model, gpu_ids) # make sure gpu_ids is set properly
model = init_net(model, init_type, init_gain, gpu_ids)

CastellanLiu on 26 Jan 2019

> Do you use your custom model or the authors'?
> If you're using your own model, do as following:
> model = YourModel(params)
> # use one of the two lines below
> model = torch.nn.DataParallel(model, gpu_ids) # make sure gpu_ids is set properly
> model = init_net(model, init_type, init_gain, gpu_ids)

The authors'. I'm a beginner...

MichaelSunEngineer on 26 Jan 2019

What's the batch_size? If batch_size is 1, the work cannot be spread across multiple GPUs, because DataParallel splits each batch among the devices and a single sample cannot be split.
It's better to set batch_size to n*num_gpus, e.g., 2 if you want to use 2 GPUs.
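
A rough sketch of why batch_size matters, assuming DataParallel divides a batch along dimension 0 the way `torch.chunk` does (chunk size is the ceiling of batch_size / num_gpus). The function `split_batch` is a hypothetical pure-Python stand-in, not a PyTorch API.

```python
import math


def split_batch(batch_size, num_gpus):
    """Return the per-GPU sample counts for one batch.

    Hypothetical stand-in that imitates torch.chunk-style splitting;
    GPUs that receive no chunk simply do not appear in the result.
    """
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


for bs in (1, 2, 4):
    print(bs, split_batch(bs, 2))
```

With batch_size 1 the result has a single entry, so only one of the two GPUs receives any work. A batch_size that is a multiple of the number of GPUs gives every device an equal share.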

CastellanLiu on 26 Jan 2019

👍2

> What's the batch_size? If the batch_size is 1, then it cannot be deployed on multi-gpus.
> It's better to set batch_size to n*num_gpus, e.g., 2 as you want to use 2 gpus.

you are my sunshine^^

MichaelSunEngineer on 26 Jan 2019
