When training on my own dataset with a ResNet-101 backbone, it always encounters this problem after 27k iterations:
File "maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
losses.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: out of memory
BTW, the input size is set to (800, 1333).
It's difficult to say where the problem comes from.
If your dataset might contain a large number of boxes in the same image, then I'd say that your issue might be related to https://github.com/facebookresearch/maskrcnn-benchmark/issues/18, where we propose a few workaround solutions.
Apart from that, without further information it's difficult to say what else could be causing the OOM.
thanks @fmassa
hi @fmassa
When I reduce IMS_PER_BATCH to 8 for 8 GPUs and use ResNet-50 as the backbone to train my own dataset, it encounters the problem below:
File "maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 84, in boxlist_iou
wh = (rb - lt + TO_REMOVE).clamp(min=0) # [N,M,2]
RuntimeError: CUDA error: out of memory
Do you have any suggestions to solve this problem?
Thanks!
Do you have a large number of boxes per image in your dataset?
If that's the case, then your problem might be related to #18, and a possible workaround is to move the IoU computation to the CPU until we add custom kernels for box IoU.
hi @fmassa
The maximum number of gt boxes in my dataset is 60. I have no idea how to deal with it.
This is the maximum number of boxes in a single image?
Can you try making the box iou computation run on the CPU, as I explained just before, and see if you run out of memory?
hi @fmassa
Yes, it's in a single image. I have tried what you suggested, but met other problems. I will report my results in CPU mode after I fix them.
hi @fmassa
I added the following code after this line:
USE_CPU_MODE = True
if USE_CPU_MODE and N >= 20:
    device = box1.device
    box1 = box1.cpu()  # ground truths
    box2 = box2.cpu()  # predictions
    lt = torch.max(box1[:, None, :2], box2[:, :2]).cpu()  # [N,M,2]
    rb = torch.min(box1[:, None, 2:], box2[:, 2:]).cpu()  # [N,M,2]
    TO_REMOVE = 1
    wh = (rb - lt + TO_REMOVE).clamp(min=0).cpu()  # [N,M,2]
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]
    iou = inter.cpu() / (area1[:, None].cpu() + area2.cpu() - inter.cpu())
    iou = iou.to(device)
    return iou
If the number of gt boxes is larger than or equal to 20, the CPU is used to compute IoU; otherwise GPU mode is used. Besides, I use multi-scale training (scales = (700, 800, 900)), set MAX_SIZE_TRAIN to 1440, and use a single image per GPU. It finally works, but the speed slows a lot (about 16% more time than GPU mode, and the GPU memory of one or two of the GPUs reaches 9489MiB).
thanks for your help @fmassa
Here is a simplified implementation:
device = box1.device
if USE_CPU_MODE and N >= 20:
    box1 = box1.cpu()
    box2 = box2.cpu()
...
# as before, no need to cast
# to .cpu() all the time
return iou.to(device)
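Fleshed out as a standalone function (a sketch: the BoxList wrapper from maskrcnn-benchmark is omitted, and `box_iou` operating on raw [N, 4] / [M, 4] tensors with a `cpu_threshold` parameter is illustrative, not part of the library):

```python
import torch

def box_iou(box1, box2, cpu_threshold=20):
    """IoU between two sets of boxes in (x1, y1, x2, y2) format.

    box1: [N, 4] tensor (e.g. ground truths), box2: [M, 4] tensor.
    If N >= cpu_threshold, the broadcasted [N, M, 2] intermediates are
    computed on the CPU to avoid GPU OOM; the result is moved back to
    the original device at the end.
    """
    device = box1.device
    N = box1.shape[0]
    if N >= cpu_threshold:
        box1 = box1.cpu()
        box2 = box2.cpu()
    TO_REMOVE = 1  # boxes use closed-interval pixel coordinates
    area1 = (box1[:, 2] - box1[:, 0] + TO_REMOVE) * (box1[:, 3] - box1[:, 1] + TO_REMOVE)
    area2 = (box2[:, 2] - box2[:, 0] + TO_REMOVE) * (box2[:, 3] - box2[:, 1] + TO_REMOVE)
    lt = torch.max(box1[:, None, :2], box2[:, :2])  # [N, M, 2]
    rb = torch.min(box1[:, None, 2:], box2[:, 2:])  # [N, M, 2]
    wh = (rb - lt + TO_REMOVE).clamp(min=0)         # [N, M, 2]
    inter = wh[:, :, 0] * wh[:, :, 1]               # [N, M]
    iou = inter / (area1[:, None] + area2 - inter)
    return iou.to(device)  # move back to the original device
```

The only device transfers are the two inputs at the top and the result at the end; once the inputs are on the CPU, every broadcasted intermediate stays there automatically, which is why the extra `.cpu()` calls in the earlier version were unnecessary.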
So, just to see if I understand it properly, now your OOM error is gone, is that right?
This issue will be better fixed once we add a box IoU implementation that runs entirely in CUDA. I think this will save a lot of memory.
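As a rough back-of-the-envelope sketch of the memory involved (the helper below and the ~240k anchor count for an FPN at 800px input are illustrative assumptions; this counts only the obvious temporaries, not autograd state or allocator caching):

```python
def iou_workspace_bytes(n_boxes, m_boxes, itemsize=4):
    """Rough bound on temporaries materialized by the broadcasted IoU:
    three [N, M, 2] tensors (lt, rb, wh) plus two [N, M] tensors
    (inter, iou), all float32 by default."""
    nm = n_boxes * m_boxes
    return (3 * 2 * nm + 2 * nm) * itemsize

# 60 gt boxes against ~240k anchors already needs ~0.43 GiB of workspace,
# and the cost grows linearly with the number of ground-truth boxes.
print(iou_workspace_bytes(60, 240_000) / 2**30)
```

A fused CUDA kernel would only need to allocate the final [N, M] output, which is where the memory saving comes from.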
hi @fmassa
It still goes OOM, as below:
Traceback (most recent call last):
File "tools/train_net.py", line 170, in <module>
main()
File "tools/train_net.py", line 163, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
loss_dict = model(images, targets)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/deprecated/distributed.py", line 222, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 100, in forward
return self._forward_train(anchors, objectness, rpn_box_regression, targets)
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 119, in _forward_train
anchors, objectness, rpn_box_regression, targets
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 91, in __call__
labels, regression_targets = self.prepare_targets(anchors, targets)
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 55, in prepare_targets
anchors_per_image, targets_per_image
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/loss.py", line 38, in match_targets_to_anchors
matched_idxs = self.proposal_matcher(match_quality_matrix)
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/matcher.py", line 85, in __call__
self.set_low_quality_matches_(matches, all_matches, match_quality_matrix)
File "maskrcnn-benchmark/maskrcnn_benchmark/modeling/matcher.py", line 101, in set_low_quality_matches_
match_quality_matrix == highest_quality_foreach_gt[:, None]
RuntimeError: CUDA error: out of memory
hi @fmassa
If I reduce the input size, it solves the OOM. But another problem is that if I use a GTX Titan instead of a 1080 Ti, the training procedure hangs and gets stuck. It's weird.
About the OOM, it might be due to many reasons, and I might need more information on the particularities of your dataset to be able to help you more.
About the hang, are you still using the same machine or different machines?
If you are using different machines, maybe your nvidia drivers are not up-to-date and you are facing deadlocks similarly to https://github.com/facebookresearch/maskrcnn-benchmark/issues/58 ?
hi @fmassa
My own dataset has 17 categories, and the maximum number of gt boxes in one image is 26. The images are not too large; their max size is less than 1200. BTW, my driver version is as below:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.130 Wed Mar 21 03:37:26 PDT 2018
thanks
hi @fmassa
I updated the driver from 384 to 390, but the training procedure still hangs. I use CUDA 8.0.61 and a GTX Titan (12 GB) card. By the way, when I use the CPU to compute the IoU, the memory usage looks a little strange, as below:
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 13630 C /usr/bin/python3.6 5411MiB |
| 1 13631 C /usr/bin/python3.6 5325MiB |
| 2 13632 C /usr/bin/python3.6 5009MiB |
| 3 13633 C /usr/bin/python3.6 4339MiB |
| 4 13634 C /usr/bin/python3.6 5097MiB |
| 5 13635 C /usr/bin/python3.6 4873MiB |
| 6 13637 C /usr/bin/python3.6 11099MiB |
| 7 13638 C /usr/bin/python3.6 4231MiB |
+-----------------------------------------------------------------------------+
OOM as below:
Tried to allocate 7.09 GiB (GPU 6; 10.92 GiB total capacity; 3.73 GiB already allocated; 6.09 GiB free; 50.97 MiB cached)
I think there might be some incompatibilities with your driver and your CUDA version.
So, by checking your previous driver version (384.130), you can see from here that it was released before the bugfix, hence the hang.
Can you update to CUDA 9.2 and install driver >= 396.26? This will definitely fix your problems.
thanks @fmassa .
After updating Ubuntu 14.04 to 16.04, I will try what you suggest and then report my results here.
thanks again.
hi @fmassa,
The OOM problem has been solved: it was because I accidentally duplicated the ground truths several times, bringing the number of gt bboxes up to 2k (very sorry for that). BTW, if you use the CPU to compute the IoUs between predictions and gt, you not only need to modify these lines but also need to pay attention to a few other lines, so that it can handle a large number of gt bboxes, at the cost of slower training (training time is maybe doubled).
About the hanging: since I upgraded Ubuntu 14.04 to 16.04 and installed CUDA 9.0 (or CUDA 9.2) with different NVIDIA drivers (390, 396, 410), it still sometimes happens. As @chengyangfu said, with NVIDIA driver 410 the frequency is much lower.
thanks!
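For reference, a quick sanity check over the annotations can catch accidentally duplicated ground truths like the ones described above (a sketch; the per-image annotation format of plain [x1, y1, x2, y2] lists is an assumption, not maskrcnn-benchmark's dataset API):

```python
from collections import Counter

def find_duplicate_boxes(boxes):
    """Return each box that appears more than once in one image,
    mapped to its occurrence count.

    `boxes` is a list of [x1, y1, x2, y2] coordinate lists.
    """
    counts = Counter(tuple(b) for b in boxes)
    return {box: n for box, n in counts.items() if n > 1}
```

Running this over every image before training would have flagged the 2k duplicated boxes immediately.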
Cool, great that it's working now.
About the modifications, I'd say that you could move the data back to the GPU at the end of boxlist_iou if you have enough memory to hold it.
Let us know if you have further questions.
I also have the same problem. I noticed that it shows this error when N > 200 (maybe smaller than 200). I didn't change the calculation to CPU; I just call torch.cuda.empty_cache() for each batch, which seems okay for my situation.
Hi @zimenglan-sysu-512, I tried making the box IoU computation run on the CPU as you did:
USE_CPU_MODE = True
if USE_CPU_MODE and N >= 20:
    device = box1.device
    ...
    iou = iou.to(device)
    return iou
but I meet the same errors as above. How did you solve that? Is it necessary to modify something in maskrcnn_benchmark/modeling/matcher.py?
Hi @yuchenrao-bg, could you please tell me where you added torch.cuda.empty_cache()? In which file? I met the same problems.
Sorry for the late reply. I don't remember it clearly, but I think you can add it in the training code.
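As a sketch of where the call can go (the loop structure loosely mirrors maskrcnn-benchmark's do_train in engine/trainer.py, but `train_loop` and its arguments here are illustrative, not the library's actual API):

```python
import torch

def train_loop(model, optimizer, data_loader, use_cuda=torch.cuda.is_available()):
    """Illustrative training loop that releases cached GPU blocks each batch."""
    for images, targets in data_loader:
        loss_dict = model(images, targets)
        losses = sum(loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
        if use_cuda:
            # Return cached, unused blocks to the driver so that large
            # transient allocations (e.g. the IoU matrix) from one batch
            # don't fragment memory for the next one.
            torch.cuda.empty_cache()
```

Note that empty_cache() only releases blocks the allocator has cached but is not using, so it trades a little speed for headroom; it cannot shrink tensors that are still alive.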