In Ignite's idist.Parallel, is there a way to select which GPUs to use during training?
Related to the question in Issue #1118
@ryanwongsa Currently, the most reliable way to select which GPUs to run on is per script, with CUDA_VISIBLE_DEVICES="0,1,2,3". Does that work for your use case?
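If you would rather pick the GPUs inside the script than on the command line, the same variable can be set from Python. This is a minimal sketch: the helper name `select_gpus` is hypothetical, and it assumes the call happens before anything initializes CUDA, so the safest place is before `import torch`.

```python
import os


def select_gpus(gpu_ids):
    """Restrict this process to the given physical GPU ids (hypothetical helper).

    CUDA reads CUDA_VISIBLE_DEVICES when it initializes, so call this
    before importing torch / ignite.distributed.
    """
    ids = [int(i) for i in gpu_ids]
    if len(set(ids)) != len(ids) or any(i < 0 for i in ids):
        raise ValueError(f"invalid GPU ids: {gpu_ids!r}")
    value = ",".join(str(i) for i in ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    select_gpus([0, 1])
    print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0,1
    # Import torch and ignite.distributed only after this point; on a machine
    # with at least 2 GPUs, torch.cuda.device_count() would then report 2.
```

With the variable set this way before launching, something like `idist.Parallel(backend="nccl", nproc_per_node=2)` should only see the two selected GPUs, since the worker processes it spawns inherit the parent's environment.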
Currently I don't have a use case; it was more of a general question, since I noticed other higher-level PyTorch frameworks have an option for selecting GPUs.
Would the above solution work if I want to run multiple scripts on different GPUs? Say I have 4 GPUs and want to train one model on GPUs 0 and 1 and another model on GPUs 2 and 3 simultaneously, e.g.:
# terminal 1
CUDA_VISIBLE_DEVICES="0,1" python train1.py
# terminal 2
CUDA_VISIBLE_DEVICES="2,3" python train2.py
Maybe this is a PyTorch question rather than an Ignite question, though.
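A minimal sketch of launching both scripts at once from a single Python process, assuming `train1.py` and `train2.py` exist in the working directory. The helper names (`build_env`, `check_disjoint`, `physical_gpu`, `launch`) are hypothetical; the only mechanism used is passing a modified environment to `subprocess.Popen`.

```python
import os
import subprocess
import sys

# Physical GPUs assigned to each script (must not overlap).
JOBS = {"train1.py": [0, 1], "train2.py": [2, 3]}


def build_env(gpu_ids, base=None):
    """Copy of the environment with CUDA_VISIBLE_DEVICES set for one job."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def check_disjoint(jobs):
    """Raise if two jobs were given the same physical GPU."""
    seen = {}
    for script, ids in jobs.items():
        for i in ids:
            if i in seen:
                raise ValueError(f"GPU {i} assigned to both {seen[i]} and {script}")
            seen[i] = script


def physical_gpu(logical_index, gpu_ids):
    """Inside a job, cuda:<logical_index> maps to physical GPU gpu_ids[logical_index]."""
    return gpu_ids[logical_index]


def launch(jobs):
    """Start every job concurrently, then wait for all of them to exit."""
    check_disjoint(jobs)
    procs = [
        subprocess.Popen([sys.executable, script], env=build_env(ids))
        for script, ids in jobs.items()
    ]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    check_disjoint(JOBS)
    missing = [s for s in JOBS if not os.path.exists(s)]
    if missing:
        print("missing scripts:", missing)
    else:
        print("exit codes:", launch(JOBS))
```

Inside `train2.py`, `cuda:0` is then physical GPU 2. One caveat worth checking on your machine: CUDA's default device order ("fastest first") can differ from the PCI bus order that nvidia-smi uses; setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes the two numberings agree.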
Yes, it should work as long as the device sets do not overlap: 0,1 for train1 and 2,3 for train2. Inside each script it would look as if the machine had only 2 GPUs. Some other examples:
CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.device_count())"
> 1
CUDA_VISIBLE_DEVICES=0,1 python -c "import torch; print(torch.cuda.device_count())"
> 2
Or, with two processes running at the same time:
# terminal 1
CUDA_VISIBLE_DEVICES=0 python -c "import torch; torch.rand(64, 128, 512, 512, device='cuda'); import time; time.sleep(60);"
# terminal 2
CUDA_VISIBLE_DEVICES=1 python -c "import torch; torch.rand(32, 128, 512, 512, device='cuda'); import time; time.sleep(60);"
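> nvidia-smi while both processes are running (excerpt): GPU 0 holds the larger tensor and GPU 1 the smaller one, so each process landed on its own card.
| GPU | Fan | Temp | Perf | Pwr:Usage/Cap | Memory-Usage | GPU-Util | Compute M. |
| 0 | 0% | 39C | P2 | 57W / 280W | 8743MiB / 11178MiB | 0% | Default |
| 1 | 24% | 47C | P2 | 60W / 250W | 4647MiB / 11176MiB | 0% | Default |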
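The memory figures can be checked by hand. `torch.rand` produces float32 by default (4 bytes per element), so a (64, 128, 512, 512) tensor takes exactly 8 GiB and the (32, 128, 512, 512) one 4 GiB. The sketch below does that arithmetic with a hypothetical helper `tensor_mib`; the ~551 MiB left over on each card is presumably the CUDA context and allocator overhead, which is an assumption rather than something measured here.

```python
from math import prod


def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor; 4 bytes/element assumes float32."""
    return prod(shape) * bytes_per_element / 2**20


observed_mib = {0: 8743, 1: 4647}  # memory usage reported by nvidia-smi
expected = {0: tensor_mib((64, 128, 512, 512)), 1: tensor_mib((32, 128, 512, 512))}

for gpu, mib in expected.items():
    print(f"GPU {gpu}: tensor {mib:.0f} MiB, overhead {observed_mib[gpu] - mib:.0f} MiB")
# GPU 0: tensor 8192 MiB, overhead 551 MiB
# GPU 1: tensor 4096 MiB, overhead 551 MiB
```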
Another thing to keep in mind is CPU (num_workers) and RAM usage: independent scripts share the same CPU cores and memory.
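One way to keep two concurrent runs from competing for the same cores is to give each script a disjoint set of CPU ids and size `num_workers` from it. This is a sketch under assumptions: `split_cpus`, `num_workers_for` and the one-core reserve for the main process are hypothetical choices, not an Ignite or PyTorch feature, and pinning with `os.sched_setaffinity` is Linux-only.

```python
import os


def split_cpus(cpu_ids, n_jobs):
    """Partition CPU ids into n_jobs disjoint, contiguous, near-equal chunks."""
    cpu_ids = sorted(cpu_ids)
    size, extra = divmod(len(cpu_ids), n_jobs)
    chunks, start = [], 0
    for j in range(n_jobs):
        end = start + size + (1 if j < extra else 0)
        chunks.append(cpu_ids[start:end])
        start = end
    return chunks


def num_workers_for(chunk, reserve=1):
    """DataLoader workers for one job: its cores minus one for the main process."""
    return max(0, len(chunk) - reserve)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = list(range(os.cpu_count() or 1))
    chunks = split_cpus(cpus, 2)
    for job, chunk in enumerate(chunks):
        print(f"job {job}: cpus {chunk}, num_workers={num_workers_for(chunk)}")
    # In train1.py, for example (Linux only):
    #   os.sched_setaffinity(0, chunks[0])
    #   DataLoader(..., num_workers=num_workers_for(chunks[0]))
```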
Great, thanks. That looks good enough for future use cases.