Detectron2: Train on multiple GPUs

Created on 22 Nov 2019 · 4 Comments · Source: facebookresearch/detectron2

How To Reproduce the Issue

I am trying to run launch.py with a custom dataset on multiple GPUs. When I run it on only one GPU, there is no exception, but with 2 or more GPUs I get: Exception: process 1 terminated with exit code 1

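from detectron2.engine import default_argument_parser, launch

args = default_argument_parser().parse_args()
launch(
    main,  # training entry point, defined elsewhere in the script
    args.num_gpus,  # 4 GPUs
    num_machines=args.num_machines,  # 1
    machine_rank=args.machine_rank,
    dist_url=args.dist_url,
    args=(args,),
)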
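A rough way to picture what launch() does with these arguments: it starts one worker per GPU, so world_size = num_machines × num_gpus_per_machine, each worker's global rank is machine_rank × num_gpus_per_machine + local_rank, and each worker calls main(*args). As far as I can tell, the "process 1" in the error is the worker that torch.multiprocessing.spawn started with index 1, i.e. local rank 1. The sketch below is a hypothetical, sequential simulation in plain Python (simulate_launch and the toy main are made-up names); the real launch() spawns separate processes and initializes torch.distributed instead of looping.

```python
# Hypothetical sketch: a sequential, single-process simulation of how
# detectron2's launch() hands work to its workers. The real launch() spawns
# one process per GPU and sets up torch.distributed; this only loops.
from typing import Any, Callable, List, Tuple


def simulate_launch(
    main_func: Callable[..., Any],
    num_gpus_per_machine: int,
    num_machines: int = 1,
    machine_rank: int = 0,
    args: Tuple[Any, ...] = (),
) -> List[Tuple[int, int, int, Any]]:
    """Return (local_rank, global_rank, world_size, result) for each worker."""
    world_size = num_machines * num_gpus_per_machine
    results = []
    for local_rank in range(num_gpus_per_machine):
        # A spawn error such as "process 1 terminated" refers to this index.
        global_rank = machine_rank * num_gpus_per_machine + local_rank
        # args is unpacked, so args=(cfg,) results in the call main_func(cfg).
        results.append((local_rank, global_rank, world_size, main_func(*args)))
    return results


def main(cfg):
    # Toy stand-in for a real training function: returns the config keys.
    return sorted(cfg)


if __name__ == "__main__":
    for row in simulate_launch(main, 4, args=({"lr": 0.01},)):
        print(row)
```

The trailing comma matters: args=({},) is a one-element tuple, so main receives an empty dict, whereas args=({}) is just an empty dict, which unpacks to no arguments and makes main() fail with a TypeError.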


Environment

sys.platform linux
Python 3.6.8 (default, Oct 7 2019, 12:59:55) [GCC 8.3.0]
Numpy 1.17.2
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.0
DETECTRON2_ENV_MODULE
PyTorch 1.3.0
PyTorch Debug Build False
torchvision 0.4.1
CUDA available True
GPU 0,1,2,3 GeForce RTX 2080 Ti
CUDA_HOME /usr
NVCC Cuda compilation tools, release 10.0, V10.0.130
Pillow 6.1.0
cv2 4.1.1

Maybe I missed something, but if you have any idea how to solve this problem, it would be really appreciated.


All 4 comments

Please include details about the problem following the issue template. Please also try the builtin examples and include full logs if it fails.

@Ormagardskvaedi sorry to bother you. Have you solved the problem?

@changgeshi Have you solved this problem?

Hello there, I had a similar issue.

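from detectron2.engine import launch

if __name__ == '__main__':
    # With more than one GPU, launch() spawns worker processes that re-import
    # this module, so the call must stay under the __main__ guard.
    launch(
        main,                    # training function defined earlier in the script
        num_gpus_per_machine=4,
        num_machines=1,
        machine_rank=0,
        dist_url="auto",         # pick a free local port (single machine only)
        args=({},)               # one-element tuple: main is called as main({})
    )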

This is what helped in my case.

Related issue: https://github.com/facebookresearch/detectron2/issues/2209
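On dist_url="auto": in detectron2 versions that support it, launch() picks a free local port, builds a tcp://127.0.0.1:<port> URL from it, and only allows this for single-machine jobs. The following is a minimal standard-library sketch of that idea; find_free_port and resolve_dist_url are hypothetical names, so check launch.py in your installed detectron2 for the actual behavior.

```python
# Hypothetical sketch of what dist_url="auto" amounts to: pick a free local
# TCP port and build a tcp:// URL for torch.distributed. The helper names are
# illustrative and are not part of detectron2's API.
import socket


def find_free_port() -> int:
    # Binding to port 0 asks the OS for any currently unused port.
    # Caveat: the port is released on return, so another process could
    # take it before training binds to it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def resolve_dist_url(dist_url: str, num_machines: int) -> str:
    if dist_url != "auto":
        return dist_url  # an explicit URL is used unchanged
    # A locally chosen port only makes sense when all workers share one machine.
    if num_machines != 1:
        raise ValueError('dist_url="auto" only works with num_machines=1')
    return f"tcp://127.0.0.1:{find_free_port()}"


if __name__ == "__main__":
    print(resolve_dist_url("auto", num_machines=1))
```

For multi-machine training, an explicit dist_url pointing at the rank-0 machine is still needed, since a port chosen locally on one host is not known to the others.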
