Hi!
I would like to try out the run_squad.py script (with AWS SageMaker in a PyTorch container).
I will use 8 x V100 16 GB GPUs for the training.
How should I set the local_rank parameter in this case?
(I tried to understand it from the code, but I couldn't really figure it out.)
Thank you for the help!
The easiest way is to use the torch launch script. It will automatically set the local rank correctly. It would look something like this (can't test, am on phone):
python -m torch.distributed.launch --nproc_per_node 8 run_squad.py <your arguments>
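For reference, this is roughly what run_squad.py does with the flag (a paraphrased sketch, not the exact code): you don't pick the value yourself; the launcher passes a different --local_rank to each of the 8 processes, and the default of -1 simply means "not distributed".

import argparse
import torch

parser = argparse.ArgumentParser()
# --local_rank defaults to -1; torch.distributed.launch overrides it per process
parser.add_argument("--local_rank", type=int, default=-1,
                    help="local_rank for distributed training on gpus")
args = parser.parse_args()

if args.local_rank == -1:
    # single-process mode: one process drives all visible GPUs (DataParallel)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
else:
    # distributed mode: one process per GPU, each pinned to its own device
    torch.cuda.set_device(args.local_rank)
    device = torch.device("cuda", args.local_rank)
    torch.distributed.init_process_group(backend="nccl")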
Hi,
Thanks for the fast answer!
Yes, I saw this solution in the examples, but I am interested in the case where I am using the PyTorch container and I have to set up an entry point for the training (= run_squad.py) and its parameters. In that case, how should I set it? Or should I just leave it at -1?
(Or do you recommend creating a bash file as the entry point, where I start this torch launch?)
Thanks again!
If you want to run it manually, you'll have to run the script once for each GPU, and set the local rank to the GPU ID for each process. It might help to look at the contents of the launch script that I mentioned before. It shows you how to set the local rank automatically for multiple processes, which I think is what you want.
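To make the "one process per GPU" idea concrete, here is a stripped-down sketch of what torch.distributed.launch does on a single node (simplified and untested; the real launcher handles more cases): it spawns one process per GPU, sets the rendezvous environment variables, and passes the GPU index as --local_rank.

import os
import subprocess
import sys

NPROC_PER_NODE = 8  # one worker process per GPU

procs = []
for local_rank in range(NPROC_PER_NODE):
    env = os.environ.copy()
    # rendezvous info read by torch.distributed.init_process_group(backend="nccl")
    env["MASTER_ADDR"] = "127.0.0.1"
    env["MASTER_PORT"] = "29500"
    env["WORLD_SIZE"] = str(NPROC_PER_NODE)
    env["RANK"] = str(local_rank)  # single node, so global rank == local rank
    # plus your other run_squad.py arguments
    cmd = [sys.executable, "run_squad.py", "--local_rank", str(local_rank)]
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()

So inside a SageMaker entry point you can either do something like this yourself, or just use a small wrapper that calls python -m torch.distributed.launch ... as shown above.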
Ok, thanks for the response! I will try that!
If your problem is fixed, please do close this issue.
@tothniki Did you have to modify the script very much to run with SM? Attempting to do so now, as well.
@petulla No, in the end I didn't modify anything regarding the multi-GPU setup (of course I had to modify the read-in and the saving to an S3 bucket). I tried it with SageMaker as it was, and it seemed to me that the distribution between GPUs worked.
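The kind of I/O change I mean looks roughly like this (an illustrative sketch, not my exact code; it assumes the standard SageMaker PyTorch container environment variables, and the channel name "train" is just an example):

import os

# SageMaker mounts the S3 input channel here before training starts
data_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
# whatever is written here is uploaded back to S3 when the job finishes
output_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")

# then point run_squad.py at these, e.g. via --train_file and --output_dir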
Hi @ugent
what about run_language_modeling.py?
Does passing local_rank = 0 to it mean it will automatically run the task on, for example, the 4 GPUs we have available, and that training will be 4 times faster (via distributed training)?
Or do we have to run the script with python -m torch.distributed.launch ...?
@mahdirezaey
Please use the correct tag when tagging...
No, it will not do this automatically, you have to use the launch utility.
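Something like this should do it (untested; adjust --nproc_per_node to your number of GPUs and add your usual arguments):
python -m torch.distributed.launch --nproc_per_node 4 run_language_modeling.py <your arguments>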