Detectron2: Support for mixed precision training

Created on 15 Oct 2019 · 8 Comments · Source: facebookresearch/detectron2

❓

Is there any plan to support mixed precision training soon? It was already supported in the maskrcnn-benchmark code.

enhancement

sadanand-singh

👍11

All 8 comments

@ppwwyyxx I would like to send a PR for apex training. https://github.com/lbin/detectron2/blob/apex/projects/ApexTrainer/README.md

lbin on 19 Jan 2020

👍3

@lbin Hi, thanks for the great contribution.

I have also been working on FP16 training recently. After checking your code, it seems that only the amp.initialize and amp.scale_loss APIs are used. In my experience with image classification, DistributedDataParallel should also be replaced via from apex.parallel import DistributedDataParallel as DDP. However, I noticed that you commented out line 75 in apex_trainer.py. Is there a reason for not using the Apex DDP?

BTW, I'm also confused by the data on the page https://github.com/lbin/detectron2/blob/apex/projects/ApexTrainer/README.md
The time and memory costs are higher for the ApexTrainer case, whereas I expected them to be lower than for the FP32 version.

I would appreciate it if you could clarify. Thanks.

blueardour on 29 Jan 2020

👍1
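
As background on the amp.scale_loss call mentioned in the comment above: loss scaling exists because small gradient values can underflow to zero in FP16. The sketch below illustrates the effect using only the Python standard library; it is not Apex code, and the gradient value and scale factor are hypothetical numbers chosen for illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

grad = 1e-8        # hypothetical small gradient value
scale = 2.0 ** 16  # example loss scale, chosen for illustration

unscaled = to_fp16(grad)        # below the smallest FP16 subnormal, so it becomes 0.0
scaled = to_fp16(grad * scale)  # representable in FP16 after scaling
recovered = scaled / scale      # unscale in higher precision

print(unscaled, recovered)
```

Multiplying the loss by a constant before the backward pass scales every gradient by the same factor, which is why the gradients can be divided by that factor afterwards without changing the update.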

@blueardour I used batch size 32, not 16, so the time and memory are higher in my case.

For DDP:
https://github.com/NVIDIA/apex/tree/master/examples/imagenet#distributed-training

lbin on 31 Jan 2020

@lbin Thanks for the tips.

blueardour on 2 Feb 2020

Given that PyTorch now natively supports mixed precision training, is there a plan to integrate it?

meetshah1995 on 22 Jun 2020

👍3

At least for RetinaNet, it was quite straightforward to use mixed precision training with PyTorch 1.6's native support. You can see how I did it: https://github.com/indigoviolet/detectron2/commit/d175f4a823cfdcce417e5ea38acbabc5215b9294

I see some memory savings from this, but I'm not sure how best to check whether half-precision operations are happening everywhere possible or whether there is more work to do. If someone has pointers on that, it would be useful.
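
For readers who have not used the native API mentioned in the comment above, here is a minimal sketch of one training step with torch.cuda.amp.autocast and torch.cuda.amp.GradScaler, the interfaces introduced in PyTorch 1.6. It is not the code from the linked commit: the model, data, and train_step function are hypothetical stand-ins for a detectron2 model and data loader, and the sketch assumes AMP should simply be disabled when no CUDA device is present. Inspecting the dtype of an intermediate output is one simple way to spot-check that a layer actually ran in half precision.

```python
import torch
from torch import nn

# Assumption: fall back to plain FP32 when CUDA is unavailable.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

# Hypothetical toy model and batch, standing in for a real detector.
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 4, device=device)

def train_step():
    optimizer.zero_grad()
    # Eligible ops inside this context run in FP16 when AMP is enabled.
    with torch.cuda.amp.autocast(enabled=use_amp):
        outputs = model(inputs)
        loss = nn.functional.mse_loss(outputs, targets)
    # Scale the loss so small gradients do not underflow in FP16.
    scaler.scale(loss).backward()
    # Unscales gradients and skips the step if they contain inf/NaN.
    scaler.step(optimizer)
    scaler.update()
    return outputs.dtype, loss.item()

out_dtype, loss_value = train_step()
print(out_dtype, loss_value)
```

With AMP enabled on a GPU, the linear layer's output dtype should be torch.float16; on a CPU-only machine the sketch runs in FP32 and the scaler calls become pass-throughs.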