Ignite: Is there a way to select which GPUs to use during training with ignite.distributed?

Created on 4 Nov 2020 · 4 Comments · Source: pytorch/ignite

❓ Questions/Help/Support

When using idist.Parallel in Ignite, is there a way to select which GPUs to use during training?

All 4 comments

@ryanwongsa currently the most reliable way to select which GPUs to run on is at the script level, with CUDA_VISIBLE_DEVICES="0,1,2,3". Does that work for your use case?

vfdev-5 on 4 Nov 2020

Currently I don't have a use case; it was more of a general question, since I noticed other higher-level PyTorch frameworks have an option for selecting GPUs.

Would the above solution work if I want to run multiple scripts on different GPUs? Say I have 4 GPUs and want to train one model on GPUs 0 and 1 and another model on GPUs 2 and 3 simultaneously, e.g.:

CUDA_VISIBLE_DEVICES="0,1" python train1.py

CUDA_VISIBLE_DEVICES="2,3" python train2.py

Maybe this is a PyTorch question rather than an Ignite question, though.

ryanwongsa on 4 Nov 2020

Yes, it should work as long as the devices do not overlap: 0,1 for train1 and 2,3 for train2. Each script would then behave as if the machine had only 2 GPUs. Some more examples:

CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.device_count())"
> 1
CUDA_VISIBLE_DEVICES=0,1 python -c "import torch; print(torch.cuda.device_count())"
> 2
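The renumbering implied by the counts above can be sketched in plain Python. This is a minimal illustration, not part of Ignite or PyTorch: the helper names `visible_gpu_ids` and `logical_to_physical` are hypothetical, and the sketch assumes CUDA_VISIBLE_DEVICES holds only comma-separated integer indices (it does not handle GPU UUIDs). Inside the process the visible devices are addressed as cuda:0, cuda:1, and so on, in the order listed.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    Hypothetical helper; assumes integer indices only.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # variable unset: no restriction is applied
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def logical_to_physical(env=None):
    """Map the in-process device names (cuda:0, cuda:1, ...) to the listed ids."""
    ids = visible_gpu_ids(env)
    if ids is None:
        return None
    return {f"cuda:{i}": gpu for i, gpu in enumerate(ids)}


if __name__ == "__main__":
    # A script started with CUDA_VISIBLE_DEVICES=2,3 sees two devices,
    # named cuda:0 and cuda:1, backed by the GPUs listed as 2 and 3.
    print(logical_to_physical({"CUDA_VISIBLE_DEVICES": "2,3"}))
```

Note that the ids in CUDA_VISIBLE_DEVICES may not match the numbering shown by other tools unless the device ordering is the same in both.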

# terminal 1
CUDA_VISIBLE_DEVICES=0 python -c "import torch; torch.rand(64, 128, 512, 512, device='cuda'); import time; time.sleep(60);"
# terminal 2
CUDA_VISIBLE_DEVICES=1 python -c "import torch; torch.rand(32, 128, 512, 512, device='cuda'); import time; time.sleep(60);"

> 
|   0  
|  0%   39C    P2    57W / 280W |   8743MiB / 11178MiB |      0%      Default |   
+-------------------------------+----------------------+----------------------+
|   1  
| 24%   47C    P2    60W / 250W |   4647MiB / 11176MiB |      0%      Default |
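The two-terminal setup above could also be driven from a single Python launcher. The following is only a sketch under stated assumptions: `check_disjoint`, `launch`, `run_jobs` and `PROBE` are hypothetical names, and the child command is a placeholder that just prints the variable it received, standing in for something like `python train1.py`.

```python
import os
import subprocess
import sys


def check_disjoint(groups):
    """Raise ValueError if any GPU id appears in more than one group."""
    seen = set()
    for group in groups:
        overlap = seen.intersection(group)
        if overlap:
            raise ValueError(f"GPU ids assigned twice: {sorted(overlap)}")
        seen.update(group)


def launch(cmd, gpu_ids):
    """Start cmd with CUDA_VISIBLE_DEVICES restricted to gpu_ids."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(i) for i in gpu_ids))
    # stdout is captured here only so the sketch can show what each child saw;
    # a real training job would normally write to the terminal or a log file.
    return subprocess.Popen(cmd, env=env, stdout=subprocess.PIPE, text=True)


def run_jobs(jobs):
    """Run all (cmd, gpu_ids) pairs concurrently and return each child's stdout."""
    check_disjoint([gpus for _, gpus in jobs])
    procs = [launch(cmd, gpus) for cmd, gpus in jobs]
    return [p.communicate()[0].strip() for p in procs]


# Placeholder child process: prints the CUDA_VISIBLE_DEVICES value it was given.
PROBE = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]

if __name__ == "__main__":
    print(run_jobs([(PROBE, [0, 1]), (PROBE, [2, 3])]))
```

The disjointness check mirrors the condition stated earlier: the approach relies on the two device lists not overlapping.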

Another thing to keep in mind is CPU (num_workers) and RAM usage, since independent scripts share the same CPU cores and host memory.
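One rough way to account for the shared CPUs is to split the available cores between the concurrent scripts when choosing num_workers. The helper below is a hypothetical sketch, not an Ignite or PyTorch API; the policy of reserving one core per script for its main process is an assumption, and the right split depends on the workload.

```python
import os


def workers_per_job(n_jobs, total_cpus=None, reserve=1):
    """Suggest a num_workers value for each of n_jobs concurrent scripts.

    Hypothetical helper: reserves `reserve` cores per script for its main
    process and divides the remaining cores evenly.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    total = total_cpus if total_cpus is not None else (os.cpu_count() or 1)
    usable = max(total - reserve * n_jobs, 0)
    return usable // n_jobs


if __name__ == "__main__":
    # Two training scripts sharing a 16-core machine: 7 workers each.
    print(workers_per_job(2, total_cpus=16))
```

A result of 0 means there are no spare cores, in which case data loading would run in each script's main process.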

vfdev-5 on 4 Nov 2020


Great, thanks. That looks good enough for future use cases.

ryanwongsa on 4 Nov 2020
