Mmdetection: Can you explain more about how to run distributed training on two machines, each with 4 cores?

Created on 15 Nov 2018 · 1 Comment · Source: open-mmlab/mmdetection

Thank you so much! Training on one machine works fine for me, but I am not sure about the settings in the config or how to train on two machines.

question

Most helpful comment

We just use torch.distributed.launch to start the training processes; see tools/dist_train.sh for details. If you want to train on two machines, follow the "Multi-Node multi-process distributed training" part of the torch.distributed.launch documentation.

Just a reminder: if the nodes are not connected by high-speed hardware, training will be slow.
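
As a rough illustration of the multi-node launch described above, here is a minimal sketch that prints the command to run on each of two machines with 4 processes apiece. The `--nnodes`, `--node_rank`, `--nproc_per_node`, `--master_addr`, and `--master_port` flags belong to torch.distributed.launch. The IP address, port, and config path are placeholders, and the `tools/train.py ... --launcher pytorch` invocation is an assumption based on what tools/dist_train.sh does, so check it against your checkout.

```shell
#!/usr/bin/env bash
# Sketch of a two-node launch (2 machines x 4 processes each).
# Assumptions, not from the thread: MASTER_ADDR, MASTER_PORT, and CONFIG
# are placeholders that you must replace with your own values.
NNODES=2                          # total number of machines
NPROC_PER_NODE=4                  # processes to start on each machine
MASTER_ADDR="10.0.0.1"            # hypothetical IP of the node_rank=0 machine
MASTER_PORT=29500                 # any free port reachable from both machines
CONFIG="configs/your_config.py"   # hypothetical config path

# Print the command to run on the machine with the given node rank.
# The train.py arguments are assumed to mirror tools/dist_train.sh.
launch_cmd() {
  local node_rank="$1"
  echo "python -m torch.distributed.launch" \
    "--nnodes=${NNODES}" \
    "--node_rank=${node_rank}" \
    "--nproc_per_node=${NPROC_PER_NODE}" \
    "--master_addr=${MASTER_ADDR}" \
    "--master_port=${MASTER_PORT}" \
    "tools/train.py ${CONFIG} --launcher pytorch"
}

launch_cmd 0   # run the printed command on the first machine
launch_cmd 1   # run the printed command on the second machine
```

The two commands differ only in `--node_rank`. Both machines must use the same master address and port, and both commands need to be started so the processes can rendezvous.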

