Describe the bug
While training on WIDER_Face (in COCO label format) with retinanet_r50_fpn_1x and pisa_r50_fpn_1x, GPU memory usage grows at every iteration, causing a GPU out-of-memory error during the first epoch.
Training runs on fsaf_r50_fpn_1x but produces 0.000 scores. It works normally with atss_r50_fpn_1x and reaches normal performance, so the dataset itself is fine.
Reproduction
CUDA_VISIBLE_DEVICES=1 /home/wit/anaconda3/envs/mmlab/bin/python \
./tools/train.py \
/home/wit/wjx/mmdetection-20200921/configs/widerface/retina_res_widerface.py \
--work-dir /home/wit/wjx/WiderFace/training \
--no-validate
Did you make any modifications to the code or config? Do you understand what you modified?
bbox_head=dict(num_classes=1), data=dict(samples_per_gpu=1, workers_per_gpu=1)
What dataset did you use?
WIDER_Face
Environment
python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1: GeForce RTX 2080
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.1
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMDetection: 2.4.0+73ebc52
You may add additional information that may be helpful for locating the problem, such as
How you installed PyTorch [e.g., pip, conda, source]
conda install pytorch==1.6.0 torchvision -c pytorch
Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
$PATH: /home/wit/anaconda3/bin:/home/wit/anaconda3/condabin:/usr/local/cuda/bin:/home/wit/bin:/home/wit/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
$LD_LIBRARY_PATH: /usr/local/cuda/lib64
$PYTHONPATH, [EMPTY]
Error traceback
If applicable, paste the error traceback here.
2020-09-30 15:35:48,266 - mmdet - INFO - Epoch [1][11050/12876] lr: 1.000e-02, eta: 4:14:35, time: 0.106, data_time: 0.002, memory: 5602, loss_cls: 0.8433, loss_bbox: 0.8669, loss: 1.7102, grad_norm: 0.7107
Traceback (most recent call last):
File "./tools/train.py", line 179, in <module>
main()
File "./tools/train.py", line 175, in main
meta=meta)
File "/home/wit/wjx/mmdetection-20200921/mmdet/apis/train.py", line 143, in train_detector
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in train
**kwargs)
File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/wit/wjx/mmdetection-20200921/mmdet/models/detectors/base.py", line 234, in train_step
losses = self(**data)
File "/home/wit/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wit/wjx/mmdetection-20200921/mmdet/core/fp16/decorators.py", line 51, in new_func
return old_func(*args, **kwargs)
File "/home/wit/wjx/mmdetection-20200921/mmdet/models/detectors/base.py", line 168, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/wit/wjx/mmdetection-20200921/mmdet/models/detectors/single_stage.py", line 117, in forward_train
gt_labels, gt_bboxes_ignore)
File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/home/wit/wjx/mmdetection-20200921/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/anchor_head.py", line 470, in loss
label_channels=label_channels)
File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/anchor_head.py", line 355, in get_targets
unmap_outputs=unmap_outputs)
File "/home/wit/wjx/mmdetection-20200921/mmdet/core/utils/misc.py", line 54, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/wit/wjx/mmdetection-20200921/mmdet/models/dense_heads/anchor_head.py", line 229, in _get_targets_single
None if self.sampling else gt_labels)
File "/home/wit/wjx/mmdetection-20200921/mmdet/core/bbox/assigners/max_iou_assigner.py", line 105, in assign
overlaps = self.iou_calculator(gt_bboxes, bboxes)
File "/home/wit/wjx/mmdetection-20200921/mmdet/core/bbox/iou_calculators/iou2d_calculator.py", line 31, in __call__
return bbox_overlaps(bboxes1, bboxes2, mode, is_aligned)
File "/home/wit/wjx/mmdetection-20200921/mmdet/core/bbox/iou_calculators/iou2d_calculator.py", line 114, in bbox_overlaps
wh = (rb - lt).clamp(min=0) # [rows, cols, 2]
RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 7.77 GiB total capacity; 6.11 GiB already allocated; 797.50 MiB free; 6.20 GiB reserved in total by PyTorch)
The interesting thing is that the GPU runs out of memory at the same line every time: wh = (rb - lt).clamp(min=0)  # [rows, cols, 2]. A rough size estimate is sketched below.
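For context, here is a back-of-the-envelope estimate of why that line is where the allocation fails. bbox_overlaps materializes temporaries of shape [num_gt, num_anchors, 2], so the cost scales with the number of GT boxes; the feature-map sizes and the 1000-face count below are assumptions for a typical RetinaNet input, not measured values:

```python
# Rough size of one temporary tensor in bbox_overlaps.
# Assumed: ~800x1344 input, strides [8, 16, 32, 64, 128],
# 9 anchors per location, and a crowded WIDER_Face image with ~1000 faces.
num_gt = 1000
locations = sum(h * w for h, w in
                [(100, 168), (50, 84), (25, 42), (13, 21), (7, 11)])
num_anchors = 9 * locations                    # ~201k anchors
lt_bytes = num_gt * num_anchors * 2 * 4        # [rows, cols, 2], float32
print(f'{lt_bytes / 1024**3:.2f} GiB')         # ~1.50 GiB per temporary
```

That is the same order of magnitude as the 1.69 GiB allocation in the traceback above, which suggests the crowd density of WIDER_Face, rather than a leak in the usual sense, is what drives the memory growth.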
Bug fix
Setting
anchor_generator=dict(
type='AnchorGenerator',
octave_base_scale=8,
scales_per_octave=1,
ratios=[1.0],
strides=[8, 16, 32, 64, 128]),
can slow the growth and reduce the peak. After the first epoch the memory still rises, but only a little more, and training runs to the end with valid validation scores.
But why does this happen with WIDER_Face, and only with RetinaNet and PISA? With other methods or datasets the memory grows only a little, but here it more than doubles until the GPU runs out.
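If I read the defaults correctly, this helps because it reduces anchors per location. Assuming retinanet_r50_fpn_1x uses the usual octave_base_scale=4, scales_per_octave=3 and ratios=[0.5, 1.0, 2.0], that is 9 anchors per location versus 1 with the setting above, so the [num_gt, num_anchors] IoU matrix shrinks roughly ninefold:

```python
# Assumed RetinaNet defaults vs. the modified anchor_generator above.
default_per_loc = 3 * len([0.5, 1.0, 2.0])   # scales_per_octave * ratios = 9
modified_per_loc = 1 * len([1.0])            # = 1
print(f'{default_per_loc / modified_per_loc:.0f}x fewer anchors')  # 9x
```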
Interesting. I am facing the same issue.
I am training a DetectoRS model with a ResNeXt-101 backbone on an 11 GB GPU. With only one worker and one image per GPU, I run out of memory at exactly the line you point out, that is, wh = (rb - lt).clamp(min=0)  # [B, rows, cols, 2].
The full error report:
RuntimeError Traceback (most recent call last)
<ipython-input-7-c4ae1e89a8af> in <module>
9 # Create work_dir
10 # mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
---> 11 train_detector(model, datasets, cfg, distributed=False, validate=True)
/workspace/mmdetection/mmdet/apis/train.py in train_detector(model, dataset, cfg, distributed, validate, timestamp, meta)
148 elif cfg.load_from:
149 runner.load_checkpoint(cfg.load_from)
--> 150 runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in run(self, data_loaders, workflow, max_epochs, **kwargs)
123 if mode == 'train' and self.epoch >= self._max_epochs:
124 break
--> 125 epoch_runner(data_loaders[i], **kwargs)
126
127 time.sleep(1) # wait for some hooks like loggers to finish
/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in train(self, data_loader, **kwargs)
48 self._inner_iter = i
49 self.call_hook('before_train_iter')
---> 50 self.run_iter(data_batch, train_mode=True)
51 self.call_hook('after_train_iter')
52 self._iter += 1
/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in run_iter(self, data_batch, train_mode, **kwargs)
28 elif train_mode:
29 outputs = self.model.train_step(data_batch, self.optimizer,
---> 30 **kwargs)
31 else:
32 outputs = self.model.val_step(data_batch, self.optimizer, **kwargs)
/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py in train_step(self, *inputs, **kwargs)
65
66 inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
---> 67 return self.module.train_step(*inputs[0], **kwargs[0])
68
69 def val_step(self, *inputs, **kwargs):
/workspace/mmdetection/mmdet/models/detectors/base.py in train_step(self, data, optimizer)
232 averaging the logs.
233 """
--> 234 losses = self(**data)
235 loss, log_vars = self._parse_losses(losses)
236
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py in new_func(*args, **kwargs)
82 'method of nn.Module')
83 if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
---> 84 return old_func(*args, **kwargs)
85 # get the arg spec of the decorated method
86 args_info = getfullargspec(old_func)
/workspace/mmdetection/mmdet/models/detectors/base.py in forward(self, img, img_metas, return_loss, **kwargs)
166 """
167 if return_loss:
--> 168 return self.forward_train(img, img_metas, **kwargs)
169 else:
170 return self.forward_test(img, img_metas, **kwargs)
/workspace/mmdetection/mmdet/models/detectors/two_stage.py in forward_train(self, img, img_metas, gt_bboxes, gt_labels, gt_bboxes_ignore, gt_masks, proposals, **kwargs)
154 gt_labels=None,
155 gt_bboxes_ignore=gt_bboxes_ignore,
--> 156 proposal_cfg=proposal_cfg)
157 losses.update(rpn_losses)
158 else:
/workspace/mmdetection/mmdet/models/dense_heads/base_dense_head.py in forward_train(self, x, img_metas, gt_bboxes, gt_labels, gt_bboxes_ignore, proposal_cfg, **kwargs)
52 else:
53 loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
---> 54 losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
55 if proposal_cfg is None:
56 return losses
/workspace/mmdetection/mmdet/models/dense_heads/rpn_head.py in loss(self, cls_scores, bbox_preds, gt_bboxes, img_metas, gt_bboxes_ignore)
72 None,
73 img_metas,
---> 74 gt_bboxes_ignore=gt_bboxes_ignore)
75 return dict(
76 loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])
/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py in new_func(*args, **kwargs)
162 'method of nn.Module')
163 if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
--> 164 return old_func(*args, **kwargs)
165 # get the arg spec of the decorated method
166 args_info = getfullargspec(old_func)
/workspace/mmdetection/mmdet/models/dense_heads/anchor_head.py in loss(self, cls_scores, bbox_preds, gt_bboxes, gt_labels, img_metas, gt_bboxes_ignore)
458 gt_bboxes_ignore_list=gt_bboxes_ignore,
459 gt_labels_list=gt_labels,
--> 460 label_channels=label_channels)
461 if cls_reg_targets is None:
462 return None
/workspace/mmdetection/mmdet/models/dense_heads/anchor_head.py in get_targets(self, anchor_list, valid_flag_list, gt_bboxes_list, img_metas, gt_bboxes_ignore_list, gt_labels_list, label_channels, unmap_outputs, return_sampling_results)
343 img_metas,
344 label_channels=label_channels,
--> 345 unmap_outputs=unmap_outputs)
346 (all_labels, all_label_weights, all_bbox_targets, all_bbox_weights,
347 pos_inds_list, neg_inds_list, sampling_results_list) = results[:7]
/workspace/mmdetection/mmdet/core/utils/misc.py in multi_apply(func, *args, **kwargs)
52 pfunc = partial(func, **kwargs) if kwargs else func
53 map_results = map(pfunc, *args)
---> 54 return tuple(map(list, zip(*map_results)))
55
56
/workspace/mmdetection/mmdet/models/dense_heads/anchor_head.py in _get_targets_single(self, flat_anchors, valid_flags, gt_bboxes, gt_bboxes_ignore, gt_labels, img_meta, label_channels, unmap_outputs)
218 assign_result = self.assigner.assign(
219 anchors, gt_bboxes, gt_bboxes_ignore,
--> 220 None if self.sampling else gt_labels)
221 sampling_result = self.sampler.sample(assign_result, anchors,
222 gt_bboxes)
/workspace/mmdetection/mmdet/core/bbox/assigners/max_iou_assigner.py in assign(self, bboxes, gt_bboxes, gt_bboxes_ignore, gt_labels)
103 gt_labels = gt_labels.cpu()
104
--> 105 overlaps = self.iou_calculator(gt_bboxes, bboxes)
106
107 if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None
/workspace/mmdetection/mmdet/core/bbox/iou_calculators/iou2d_calculator.py in __call__(self, bboxes1, bboxes2, mode, is_aligned)
33 if bboxes1.size(-1) == 5:
34 bboxes1 = bboxes1[..., :4]
---> 35 return bbox_overlaps(bboxes1, bboxes2, mode, is_aligned)
36
37 def __repr__(self):
/workspace/mmdetection/mmdet/core/bbox/iou_calculators/iou2d_calculator.py in bbox_overlaps(bboxes1, bboxes2, mode, is_aligned, eps)
139 bboxes2[..., None, :, 2:]) # [B, rows, cols, 2]
140
--> 141 wh = (rb - lt).clamp(min=0) # [B, rows, cols, 2]
142 overlap = wh[..., 0] * wh[..., 1]
143
RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 10.73 GiB total capacity; 9.62 GiB already allocated; 30.62 MiB free; 9.86 GiB reserved in total by PyTorch)
Here is my config file: config.txt
This is because in some situations the number of GT boxes is very large, which causes the OOM error. To avoid it, you can try setting gpu_assign_thr in your assigner, as indicated here. A sketch of the setting follows.
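For reference, a minimal sketch of the setting (the IoU thresholds mirror common RetinaNet defaults, and gpu_assign_thr=100 is only an illustrative value, not a tuned recommendation). When an image contains more GT boxes than gpu_assign_thr, the assignment is computed on the CPU, trading speed for GPU memory:

```python
# Sketch: enable CPU-side assignment for crowded images.
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.5,
        neg_iou_thr=0.4,
        min_pos_iou=0,
        ignore_iof_thr=-1,
        gpu_assign_thr=100),  # >100 GT boxes per image -> assign on CPU
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
```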
@ZwwWayne It works! Thank you!
Thanks a lot for the info! @ZwwWayne @aimhabo Could you give me a tip on what value I should choose so as not to hurt accuracy too much?
@borgarpa No, it shouldn't hurt accuracy: the only difference in the code path is the device. I didn't find any change to the calculation itself (see the sketch after this comment).
And when I doubled the batch size (using 2 GPUs), the score was unexpectedly lower.
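To illustrate why only the device differs, here is a paraphrase (a simplified sketch, not a verbatim copy) of the relevant logic in MaxIoUAssigner.assign in mmdet 2.x: the IoU math is identical, tensors are just moved to the CPU for the computation and the result moved back.

```python
# Paraphrased from MaxIoUAssigner.assign (mmdet 2.x); names follow the
# original code, but this is a simplified sketch.
assign_on_cpu = (self.gpu_assign_thr > 0 and
                 gt_bboxes.shape[0] > self.gpu_assign_thr)
if assign_on_cpu:
    device = bboxes.device
    bboxes = bboxes.cpu()
    gt_bboxes = gt_bboxes.cpu()

overlaps = self.iou_calculator(gt_bboxes, bboxes)  # same math, CPU tensors
assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
if assign_on_cpu:
    assign_result.gt_inds = assign_result.gt_inds.to(device)  # move back
```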