Detectron2: [baseline] [reproducibility] Problem reproducing the baselines in Model Zoo

Created on 7 Jan 2020 · 5 Comments · Source: facebookresearch/detectron2

Hello,
I have tried to reproduce the Faster-RCNN baseline using R50-FPN_1x. However, there's a drop of around 4-5 points for the box AP, compared to the score 37.9 in Model Zoo. I would really appreciate it if anyone could give me some insights about what might have gone wrong ^ ^
My result:

COCO Evaluation results for bbox:
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 33.551 | 53.341 | 35.969 | 18.661 | 36.469 | 43.063 |

Instructions To Reproduce the Issue:

what changes I made (git diff)
The code version I used is e74a00c of Dec 26, 2019.
No changes have been made except for minor modifications to run the code on AzureML.
what exact command I run:

python tools/train_net.py --num-gpus 4 \
    --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml

what I observed (including the full logs):
Final result after 90,000 iterations:

[01/05 18:28:18 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 33.551 | 53.341 | 35.969 | 18.661 | 36.469 | 43.063 |

I also compared the training loss of my experiment on AzureML (using 4 K80 GPUs) with the official metrics.
[Plots: total_loss, loss_cls, loss_reg]

The full log is here: loss-4gpu.log
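A comparison like the one above can also be done numerically rather than by eye. The following is a minimal sketch, assuming each run's metrics are available as a `metrics.json`-style file with one JSON object per line containing `iteration` and `total_loss` keys. The helper names `load_losses` and `loss_gap` are hypothetical and not part of detectron2.

```python
import json


def load_losses(lines, key="total_loss"):
    """Map iteration -> loss from metrics.json-style lines (one JSON object per line)."""
    losses = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Records without an iteration or without the requested key
        # (for example evaluation-only entries) are skipped.
        if "iteration" in record and key in record:
            losses[record["iteration"]] = record[key]
    return losses


def loss_gap(mine, reference):
    """Return {iteration: mine - reference} for iterations logged in both runs."""
    shared = sorted(set(mine) & set(reference))
    return {it: mine[it] - reference[it] for it in shared}
```

Both functions accept any iterable of lines, so an open file handle for each metrics file can be passed directly. A consistently positive gap at matching iterations would suggest the run is training worse than the reference from early on.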

What I tried to understand why

Investigation on influence of GPU numbers
At first I thought it was because the batch size had changed when changing num-gpus from 8 to 4. However, the full config in the log indicates the same batch size (IMS_PER_BATCH: 16).
Secondly, since the only difference is the number of GPUs (maybe I am wrong), I re-ran the same experiment with different numbers of GPUs for 10k iterations and compared the results with the official metrics.

The loss curve shows that my problem is independent of GPU numbers.
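The batch-size reasoning above can be made concrete with a small sketch. It assumes that IMS_PER_BATCH is the total batch size across all GPUs and that it is split evenly among them, so changing the GPU count changes only the per-GPU share. The function `images_per_gpu` is a hypothetical helper written for illustration.

```python
def images_per_gpu(ims_per_batch, num_gpus):
    """Per-GPU share of a global batch, assuming an even split across GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the number of GPUs")
    return ims_per_batch // num_gpus


# Global batch of 16, as reported in the config dump.
for gpus in (4, 8):
    print(f"{gpus} GPUs -> {images_per_gpu(16, gpus)} images per GPU")
```

Under this assumption the optimizer sees 16 images per step in both setups, which is consistent with the loss curves not depending on the GPU count.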
Investigation on other baselines
Last but not least, I tried another baseline: mask_rcnn_R_50_FPN_1x from the COCO Instance Segmentation Baselines with Mask R-CNN. A similar performance drop happened again, around 4 AP points below the reference (38.6 box AP and 35.2 mask AP).

My result:

COCO Evaluation results for bbox:
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 34.483 | 54.024 | 37.521 | 19.586 | 36.835 | 44.503 |

COCO Evaluation results for segm:
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 31.698 | 51.398 | 33.640 | 14.372 | 33.611 | 45.871 |

My log: loss-4gpu-mrcnn.txt
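For reference, the gaps between the Model Zoo scores and the results reported in this issue can be tabulated with a few lines of arithmetic. This is only a sketch over the numbers quoted above; the dictionary keys are labels chosen for illustration.

```python
# Reference AP values from the Model Zoo, as quoted in this issue.
REFERENCE = {
    "faster_rcnn_R_50_FPN_1x bbox": 37.9,
    "mask_rcnn_R_50_FPN_1x bbox": 38.6,
    "mask_rcnn_R_50_FPN_1x segm": 35.2,
}

# AP values from the runs reported in this issue.
OBSERVED = {
    "faster_rcnn_R_50_FPN_1x bbox": 33.551,
    "mask_rcnn_R_50_FPN_1x bbox": 34.483,
    "mask_rcnn_R_50_FPN_1x segm": 31.698,
}


def ap_gaps(reference, observed):
    """Return reference minus observed AP for each entry, rounded to 3 decimals."""
    return {name: round(reference[name] - observed[name], 3) for name in reference}


for name, gap in ap_gaps(REFERENCE, OBSERVED).items():
    print(f"{name}: -{gap:.3f} AP")
```

The gaps come out at roughly 4.3, 4.1, and 3.5 AP points, which suggests a common cause rather than a problem specific to one config.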

Environment:

(py36) root@e07abda472cc:/ai-detectron2# python -m detectron2.utils.collect_env
------------------------  -------------------------------------------------------------------
sys.platform              linux
Python                    3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Numpy                     1.15.0
Detectron2 Compiler       GCC 5.4
Detectron2 CUDA Compiler  10.1
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.1
PyTorch Debug Build       False
torchvision               0.4.2
CUDA available            True
GPU 0,1                   Tesla K80
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.1, V10.1.243
Pillow                    5.2.0
cv2                       4.1.0
------------------------  -------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

bug


ZekunZh

👍4

All 5 comments

A bug that affects accuracy was introduced on Dec 19. It was fixed in fd14855ad6c36b2881d6199cad59831473cb1a33 on the same day, but after your commit.

I reran all the R50-FPN-1x baselines on Dec 31 and they were reproduced.
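One way to check whether a given checkout already includes such a fix is to ask git whether the fix commit is an ancestor of the checked-out commit. The sketch below wraps `git merge-base --is-ancestor`, which exits with 0 when the first commit is reachable from the second and 1 when it is not. The function name `contains_commit` and its `run` parameter are hypothetical, introduced here only so the call can be swapped out when git is not available.

```python
import subprocess


def contains_commit(fix, head, repo=".", run=subprocess.run):
    """Return True if commit `fix` is reachable from commit `head` in `repo`."""
    result = run(
        ["git", "merge-base", "--is-ancestor", fix, head],
        cwd=repo,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit code means git itself failed (e.g. unknown commit).
    raise RuntimeError(f"git merge-base failed with exit code {result.returncode}")
```

Inside a detectron2 clone, calling `contains_commit("fd14855ad6c36b2881d6199cad59831473cb1a33", "e74a00c")` would report whether the commit used in this issue includes the fix.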

ppwwyyxx on 7 Jan 2020

👍1

We retrain our models regularly, but certainly not as frequently as every commit, so bugs can sometimes happen but will eventually be found. Let us know if you still have trouble reproducing the results using the latest code.

ppwwyyxx on 7 Jan 2020

Sorry for the confusion and I'll add a section in docs to keep track of historical bugs.

ppwwyyxx on 7 Jan 2020

Great! Thanks for your quick response, @ppwwyyxx!

ZekunZh on 7 Jan 2020

Yes, using the latest code (commit 5e2a6f) works for me! Thanks again for your help, @ppwwyyxx.

ZekunZh on 8 Jan 2020

👍1
