Faster-rcnn.pytorch: Training error. bg_num_rois = 0 and fg_num_rois = 0, this should not happen!

Created on 4 Jul 2019 · 3 Comments · Source: jwyang/faster-rcnn.pytorch

Hi, I ran into some problems during training.
The error message is as follows:

ValueError: bg_num_rois = 0 and fg_num_rois = 0, this should not happen!

I also noticed that the loss had already turned to nan before the error. I tried some suggested fixes, such as clipping gradients and reducing the learning rate, but none of them worked.

[session 1][epoch  1][iter  300/2164] loss: nan, lr: 1.00e-04
            fg/bg=(128/0), time cost: 29.000118

I checked my annotation files, and some xmin values are 0. I don't know whether that is the problem, because adding 1 to xmin did not help.
I also printed gt_boxes and found an xmin of about 64041 (6.4041e+04), which is clearly wrong:

gt_boxes is tensor([[[6.4041e+04, 1.7687e+02, 2.2182e+02, 4.3876e+02, 2.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]],
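
One plausible reading of the tensor above, consistent with the fixes suggested in the comments below: as far as I can tell, pascal_voc.py stores boxes in a NumPy uint16 array and subtracts 1 from every coordinate, so an xmin of 0 becomes -1, which an unsigned type cannot hold and which wraps to 65535. Multiplying by an image-resize factor slightly below 1 then gives a value in the 64,000 range like the one printed. The sketch below reproduces the wraparound with integer subtraction, which NumPy defines to wrap modulo 2^16; the sample box and the resize factor 0.977 are assumed values for illustration. The repository assigns a negative float instead, and that float-to-uint16 conversion is not guaranteed to wrap the same way on every platform.

```python
import numpy as np

# A box whose xmin is 0 in the XML annotation (the other values are made up).
xml_box = np.array([0, 177, 222, 439], dtype=np.uint16)

# Shift to 0-based indexes the way the loader does, but in uint16 arithmetic:
# 0 - 1 cannot be represented, so NumPy wraps it around to 65535.
shifted = xml_box - 1

# Assumed resize factor, for illustration only.
im_scale = 0.977
scaled = shifted.astype(np.float32) * im_scale

print(shifted)     # the first coordinate has wrapped to 65535
print(scaled[0])   # a huge xmin after rescaling, far outside the image
```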

So I think something is wrong in how your code computes gt_boxes, but it is hard to track down. Could you give me a clue about how to fix it?
Thanks for your kind reply!

All 3 comments

Check #349

Hi, I hit the same bug while building my own dataset with images from Open Images for the Kaggle competition.

Check lines 234-237 of pascal_voc.py (https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/pascal_voc.py#L234-L237), or the same lines in your newly generated dataset file, e.g. openimage.py (I recommend copying pascal_voc.py and working from there), and delete the -1.

In addition, change lines 121-122 of imdb.py (https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/imdb.py#L121-L122) and delete the -1 there as well.

Some objects have bbox coordinates such as 0,1,0, which makes the code either crash with an assertion error or produce a nan loss. If your dataset has bbox annotations with coordinates equal to 0 or to the image width, apply both changes.

hope it helps! :)
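
To see why the -1 in the flip step matters, here is a small sketch that mirrors the horizontal-flip logic of append_flipped_images in imdb.py. The function name flip_boxes and the sample box are hypothetical. With the -1, a box whose xmax equals the image width gets a flipped x1 of width - width - 1, which wraps to 65535 in uint16 and fails the x2 >= x1 check that (as far as I can tell) imdb.py asserts after flipping; without it, the flipped box stays inside the image.

```python
import numpy as np


def flip_boxes(boxes: np.ndarray, width: int, subtract_one: bool) -> np.ndarray:
    """Mirror [x1, y1, x2, y2] boxes horizontally, following imdb.py's flip logic."""
    flipped = boxes.copy()
    oldx1 = boxes[:, 0].copy()
    oldx2 = boxes[:, 2].copy()
    offset = 1 if subtract_one else 0
    # The new left edge comes from the old right edge, and vice versa.
    flipped[:, 0] = width - oldx2 - offset
    flipped[:, 2] = width - oldx1 - offset
    return flipped


# Hypothetical box touching the right edge of a 500-pixel-wide image.
boxes = np.array([[0, 10, 500, 200]], dtype=np.uint16)

with_minus_one = flip_boxes(boxes, 500, subtract_one=True)
without_minus_one = flip_boxes(boxes, 500, subtract_one=False)

# Check x2 >= x1 after flipping: the -1 version fails it.
print((with_minus_one[:, 2] >= with_minus_one[:, 0]).all())        # False
print((without_minus_one[:, 2] >= without_minus_one[:, 0]).all())  # True
```

Because boxes is uint16, the negative intermediate never shows up as -1; it silently becomes a very large x1, which is why the failure surfaces as an assertion error or as absurd gt_boxes rather than as an obvious negative value.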

Make sure x2 and y2 do not exceed the image width and height, because training flips both the images and their annotations:

        wh = tree.find('size')
        w, h = int(wh.find('width').text), int(wh.find('height').text)
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Read the corners as written in the XML (the -1 shift is removed)
            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text)
            y2 = float(bbox.find('ymax').text)
            # Clip the box to the image bounds so flipping stays in range
            x1 = max(x1, 0)
            y1 = max(y1, 0)
            x2 = min(x2, w)
            y2 = min(y2, h)
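
The clamping above can also be written as a small standalone check that runs before a box is stored. The helper clip_box below is hypothetical: it clips the corners to the image the same way the snippet does, and additionally returns None for boxes that collapse to zero width or height after clipping. That extra rejection is an assumption, not part of the original comment, aimed at degenerate annotations like the 0,1,0 example mentioned earlier.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]


def clip_box(x1: float, y1: float, x2: float, y2: float,
             width: int, height: int) -> Optional[Box]:
    """Clip a box to [0, width] x [0, height]; return None if it has no area."""
    # Same clamping as the snippet above: no negative corners, nothing past the edge.
    x1 = max(float(x1), 0.0)
    y1 = max(float(y1), 0.0)
    x2 = min(float(x2), float(width))
    y2 = min(float(y2), float(height))
    # Extra check (assumption): drop boxes with zero width or height,
    # which are more likely annotation errors than real objects.
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


print(clip_box(-3, 5, 520, 300, width=500, height=375))  # (0.0, 5.0, 500.0, 300.0)
print(clip_box(0, 0, 0, 10, width=500, height=375))      # None
```

Clipping x2 to the full width assumes the -1 in imdb.py's flip step has also been removed. If you keep that -1, clip x2 to width - 1 instead, so the flipped x1 (width - x2 - 1) cannot go negative.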