MMDetection: Unstoppable GPU memory usage

Created on 30 Sep 2020  ·  6 comments  ·  Source: open-mmlab/mmdetection

Describe the bug

While training WIDER_Face (in COCO label format) with retinanet_r50_fpn_1x and pisa_r50_fpn_1x, GPU memory usage increases on every iteration, and the GPU runs out of memory within the first epoch.

Training with fsaf_r50_fpn_1x runs but produces 0.000 scores. Training with atss_r50_fpn_1x works normally and scores as expected, so the dataset itself is fine.

Reproduction

  1. What command or script did you run?
CUDA_VISIBLE_DEVICES=1 /home/wit/anaconda3/envs/mmlab/bin/python \
    ./tools/train.py \
    /home/wit/wjx/mmdetection-20200921/configs/widerface/retina_res_widerface.py \
    --work-dir /home/wit/wjx/WiderFace/training \
    --no-validate
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
    bbox_head=dict(class=1), data=dict(samples_per_gpu=1, workers_per_gpu=1) (see the sketch after this list)

  3. What dataset did you use?
    WIDER_Face
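
For reference, these overrides presumably correspond to something like the snippet below in the actual config; note that in MMDetection 2.x the head field is named num_classes (so "class=1" above is likely shorthand). This is only a sketch of the reported settings, not the exact file.

    model = dict(
        bbox_head=dict(
            num_classes=1))      # WIDER_Face has a single "face" class
    data = dict(
        samples_per_gpu=1,       # images per GPU
        workers_per_gpu=1)       # dataloader worker processes per GPU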

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1: GeForce RTX 2080
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.1
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMDetection: 2.4.0+73ebc52

  2. You may add additional information that may be helpful for locating the problem, such as

    • How you installed PyTorch [e.g., pip, conda, source]
      conda install pytorch==1.6.0 torchvision -c pytorch

    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
      $PATH, bash: /home/wit/anaconda3/bin:/home/wit/anaconda3/condabin:/usr/local/cuda/bin:/home/wit/bin:/home/wit/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin: No such file or directory
      $LD_LIBRARY_PATH, bash: /usr/local/cuda/lib64: Is a directory
      $PYTHONPATH, [EMPTY]

Error traceback
If applicable, paste the error traceback here.

2020-09-30 15:35:48,266 - mmdet - INFO - Epoch [1][11050/12876] lr: 1.000e-02, eta: 4:14:35, time: 0.106, data_time: 0.002, memory: 5602, loss_cls: 0.8433, loss_bbox: 0.8669, loss: 1.7102, grad_norm: 0.7107
Traceback (most recent call last):
  File "./tools/train.py", line 179, in <module>
    main()
  File "./tools/train.py", line 175, in main
    meta=meta)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/apis/train.py", line 143, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in train
    **kwargs)
  File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/wit/wjx/mmdetection-20200921/mmdet/models/detectors/base.py", line 234, in train_step
    losses = self(**data)
  File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/core/fp16/decorators.py", line 51, in new_func
    return old_func(*args, **kwargs)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/models/detectors/base.py", line 168, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/models/detectors/single_stage.py", line 117, in forward_train
    gt_labels, gt_bboxes_ignore)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/core/fp16/decorators.py", line 131, in new_func
    return old_func(*args, **kwargs)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/anchor_head.py", line 470, in loss
    label_channels=label_channels)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/anchor_head.py", line 355, in get_targets
    unmap_outputs=unmap_outputs)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/core/utils/misc.py", line 54, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/anchor_head.py", line 229, in _get_targets_single
    None if self.sampling else gt_labels)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/core/bbox/assigners/max_iou_assigner.py", line 105, in assign
    overlaps = self.iou_calculator(gt_bboxes, bboxes)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/core/bbox/iou_calculators/iou2d_calculator.py", line 31, in __call__
    return bbox_overlaps(bboxes1, bboxes2, mode, is_aligned)
  File "/home/wit/wjx/mmdetection-20200921/mmdet/core/bbox/iou_calculators/iou2d_calculator.py", line 114, in bbox_overlaps
    wh = (rb - lt).clamp(min=0)  # [rows, cols, 2]
RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 7.77 GiB total capacity; 6.11 GiB already allocated; 797.50 MiB free; 6.20 GiB reserved in total by PyTorch)

The interesting thing is that the GPU always runs out of memory at this same line: wh = (rb - lt).clamp(min=0)  # [rows, cols, 2]
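
For context on why the failure always lands on that exact line: in bbox_overlaps() the gt boxes are broadcast against every anchor, so lt, rb, and wh are each tensors of shape [num_gt, num_anchors, 2]. Crowded WIDER_Face images can contain hundreds of gt faces, which makes these intermediates huge. A rough back-of-the-envelope sketch (the anchor and gt counts below are illustrative assumptions, not measured values):

    def iou_matrix_bytes(num_gt, num_anchors, dtype_bytes=4):
        # lt, rb and wh in bbox_overlaps each have shape [num_gt, num_anchors, 2]
        return num_gt * num_anchors * 2 * dtype_bytes

    num_anchors = 120_000  # rough order of magnitude for RetinaNet on a ~800x1333 input
    for num_gt in (10, 100, 500):
        gib = iou_matrix_bytes(num_gt, num_anchors) / 1024 ** 3
        print(f'{num_gt:4d} gt boxes -> ~{gib:.2f} GiB per intermediate tensor')

Several such tensors are alive at once during the IoU computation, so images with many faces can spike the allocation well beyond the per-tensor figure.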

Bug fix
-

Most helpful comment

This is because in some images the number of gt boxes is very large, which causes the OOM error. To avoid that, you can try setting gpu_assign_thr in your assigner, as indicated here.

All 6 comments

set

    anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=8,
            scales_per_octave=1,
            ratios=[1.0],
            strides=[8, 16, 32, 64, 128]),

can slow the increase and reduce the peak. After the first epoch, ~~there will be no new peak~~ memory still grows, but only a little more, and training can run to the end with valid validation scores.
But why does this happen with WIDER_Face, and only with RetinaNet and PISA? With other methods or datasets memory only increases a little, whereas here it grows by 2x or more until the GPU runs out of memory.
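
To illustrate why the reduced anchor settings help (the "default" values below are the standard retinanet_r50_fpn_1x anchor settings; the arithmetic is only a sketch): the number of anchors per feature-map location is scales_per_octave * len(ratios), so the reduced config generates about 9x fewer anchors, and the gt-vs-anchor IoU matrix built in the assigner shrinks by roughly the same factor.

    def anchors_per_location(scales_per_octave, ratios):
        return scales_per_octave * len(ratios)

    default = anchors_per_location(scales_per_octave=3, ratios=[0.5, 1.0, 2.0])  # 9 (retinanet_r50_fpn_1x)
    reduced = anchors_per_location(scales_per_octave=1, ratios=[1.0])            # 1 (config above)
    print(f'gt-vs-anchor IoU matrix is ~{default // reduced}x smaller with the reduced config')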

Interesting. I am facing the same issue.

I am training a DetectoRS model with a ResNeXt-101 backbone on an 11 GB GPU. With only one worker and one image per GPU, I run out of memory on exactly the same line you are pointing out, that is, wh = (rb - lt).clamp(min=0)  # [B, rows, cols, 2].

The full error report:

RuntimeError                              Traceback (most recent call last)
<ipython-input-7-c4ae1e89a8af> in <module>
      9 # Create work_dir
     10 # mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
---> 11 train_detector(model, datasets, cfg, distributed=False, validate=True)

/workspace/mmdetection/mmdet/apis/train.py in train_detector(model, dataset, cfg, distributed, validate, timestamp, meta)
    148     elif cfg.load_from:
    149         runner.load_checkpoint(cfg.load_from)
--> 150     runner.run(data_loaders, cfg.workflow, cfg.total_epochs)

/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in run(self, data_loaders, workflow, max_epochs, **kwargs)
    123                     if mode == 'train' and self.epoch >= self._max_epochs:
    124                         break
--> 125                     epoch_runner(data_loaders[i], **kwargs)
    126 
    127         time.sleep(1)  # wait for some hooks like loggers to finish

/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in train(self, data_loader, **kwargs)
     48             self._inner_iter = i
     49             self.call_hook('before_train_iter')
---> 50             self.run_iter(data_batch, train_mode=True)
     51             self.call_hook('after_train_iter')
     52             self._iter += 1

/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in run_iter(self, data_batch, train_mode, **kwargs)
     28         elif train_mode:
     29             outputs = self.model.train_step(data_batch, self.optimizer,
---> 30                                             **kwargs)
     31         else:
     32             outputs = self.model.val_step(data_batch, self.optimizer, **kwargs)

/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py in train_step(self, *inputs, **kwargs)
     65 
     66         inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
---> 67         return self.module.train_step(*inputs[0], **kwargs[0])
     68 
     69     def val_step(self, *inputs, **kwargs):

/workspace/mmdetection/mmdet/models/detectors/base.py in train_step(self, data, optimizer)
    232                 averaging the logs.
    233         """
--> 234         losses = self(**data)
    235         loss, log_vars = self._parse_losses(losses)
    236 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py in new_func(*args, **kwargs)
     82                                 'method of nn.Module')
     83             if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
---> 84                 return old_func(*args, **kwargs)
     85             # get the arg spec of the decorated method
     86             args_info = getfullargspec(old_func)

/workspace/mmdetection/mmdet/models/detectors/base.py in forward(self, img, img_metas, return_loss, **kwargs)
    166         """
    167         if return_loss:
--> 168             return self.forward_train(img, img_metas, **kwargs)
    169         else:
    170             return self.forward_test(img, img_metas, **kwargs)

/workspace/mmdetection/mmdet/models/detectors/two_stage.py in forward_train(self, img, img_metas, gt_bboxes, gt_labels, gt_bboxes_ignore, gt_masks, proposals, **kwargs)
    154                 gt_labels=None,
    155                 gt_bboxes_ignore=gt_bboxes_ignore,
--> 156                 proposal_cfg=proposal_cfg)
    157             losses.update(rpn_losses)
    158         else:

/workspace/mmdetection/mmdet/models/dense_heads/base_dense_head.py in forward_train(self, x, img_metas, gt_bboxes, gt_labels, gt_bboxes_ignore, proposal_cfg, **kwargs)
     52         else:
     53             loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
---> 54         losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
     55         if proposal_cfg is None:
     56             return losses

/workspace/mmdetection/mmdet/models/dense_heads/rpn_head.py in loss(self, cls_scores, bbox_preds, gt_bboxes, img_metas, gt_bboxes_ignore)
     72             None,
     73             img_metas,
---> 74             gt_bboxes_ignore=gt_bboxes_ignore)
     75         return dict(
     76             loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])

/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py in new_func(*args, **kwargs)
    162                                 'method of nn.Module')
    163             if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
--> 164                 return old_func(*args, **kwargs)
    165             # get the arg spec of the decorated method
    166             args_info = getfullargspec(old_func)

/workspace/mmdetection/mmdet/models/dense_heads/anchor_head.py in loss(self, cls_scores, bbox_preds, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore)
    458             gt_bboxes_ignore_list=gt_bboxes_ignore,
    459             gt_labels_list=gt_labels,
--> 460             label_channels=label_channels)
    461         if cls_reg_targets is None:
    462             return None

/workspace/mmdetection/mmdet/models/dense_heads/anchor_head.py in get_targets(self, anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list, gt_labels_list, label_channels, unmap_outputs, return_sampling_results)
    343             img_metas,
    344             label_channels=label_channels,
--> 345             unmap_outputs=unmap_outputs)
    346         (all_labels, all_label_weights, all_bbox_targets, all_bbox_weights,
    347          pos_inds_list, neg_inds_list, sampling_results_list) = results[:7]

/workspace/mmdetection/mmdet/core/utils/misc.py in multi_apply(func, *args, **kwargs)
     52     pfunc = partial(func, **kwargs) if kwargs else func
     53     map_results = map(pfunc, *args)
---> 54     return tuple(map(list, zip(*map_results)))
     55 
     56 

/workspace/mmdetection/mmdet/models/dense_heads/anchor_head.py in _get_targets_single(self, flat_anchors, valid_flags, gt_bboxes, gt_bboxes_ignore, gt_labels, img_meta, label_channels, unmap_outputs)
    218         assign_result = self.assigner.assign(
    219             anchors, gt_bboxes, gt_bboxes_ignore,
--> 220             None if self.sampling else gt_labels)
    221         sampling_result = self.sampler.sample(assign_result, anchors,
    222                                               gt_bboxes)

/workspace/mmdetection/mmdet/core/bbox/assigners/max_iou_assigner.py in assign(self, bboxes, gt_bboxes, gt_bboxes_ignore, gt_labels)
    103                 gt_labels = gt_labels.cpu()
    104 
--> 105         overlaps = self.iou_calculator(gt_bboxes, bboxes)
    106 
    107         if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None

/workspace/mmdetection/mmdet/core/bbox/iou_calculators/iou2d_calculator.py in __call__(self, bboxes1, bboxes2, mode, is_aligned)
     33         if bboxes1.size(-1) == 5:
     34             bboxes1 = bboxes1[..., :4]
---> 35         return bbox_overlaps(bboxes1, bboxes2, mode, is_aligned)
     36 
     37     def __repr__(self):

/workspace/mmdetection/mmdet/core/bbox/iou_calculators/iou2d_calculator.py in bbox_overlaps(bboxes1, bboxes2, mode, is_aligned, eps)
    139                        bboxes2[..., None, :, 2:])  # [B, rows, cols, 2]
    140 
--> 141         wh = (rb - lt).clamp(min=0)  # [B, rows, cols, 2]
    142         overlap = wh[..., 0] * wh[..., 1]
    143 

RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 10.73 GiB total capacity; 9.62 GiB already allocated; 30.62 MiB free; 9.86 GiB reserved in total by PyTorch)

Here is my config file: config.txt

This is because in some images the number of gt boxes is very large, which causes the OOM error. To avoid that, you can try setting gpu_assign_thr in your assigner, as indicated here.
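
For reference, a minimal sketch of where that option goes in a RetinaNet-style train_cfg; the surrounding fields are the usual defaults, and the threshold of 100 is only an example, not a recommendation:

    train_cfg = dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1,
            # with more than this many gt boxes in an image, the IoU assignment
            # is computed on CPU instead of GPU; -1 (the default) keeps it on GPU
            gpu_assign_thr=100),
        allowed_border=-1,
        pos_weight=-1,
        debug=False)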

@ZwwWayne It works! Thank you!

Thanks a lot for the info! @ZwwWayne @aimhabo Could you give me a tip about what values I should choose so as not to hurt accuracy too much?

@borgarpa No... Although the only difference in the code is the device, I didn't find any change to the calculation itself.
And when I doubled the batch size (using 2 GPUs), the score was unexpectedly lower.
