Faster-rcnn.pytorch: Training error. bg_num_rois = 0 and fg_num_rois = 0, this should not happen!

Created on 4 Jul 2019 · 3 Comments · Source: jwyang/faster-rcnn.pytorch

Hi, I ran into a problem during training.
The error message is as follows:

ValueError: bg_num_rois = 0 and fg_num_rois = 0, this should not happen!

I noticed that the loss had already turned to nan before the error appeared. I tried some suggestions, such as clipping gradients and reducing the learning rate, but none of them worked.

[session 1][epoch  1][iter  300/2164] loss: nan, lr: 1.00e-04
            fg/bg=(128/0), time cost: 29.000118

I checked my annotation files and some xmin values are 0. I don't know whether that is the problem, because changing those xmin values to 1 did not help.
I also printed gt_boxes and found an xmin of 6.4041e+04 (about 64041), which is apparently not right.

gt_boxes is tensor([[[6.4041e+04, 1.7687e+02, 2.2182e+02, 4.3876e+02, 2.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]],

So I think something is wrong with how gt_boxes is computed in your code, but it is hard to track down. Could you give me a clue about how to fix it?
Thanks for your kind reply!

herrickli

👍1

Most helpful comment

Hi, I found the same bug while trying to create my own data with the images from OpenImage for the Kaggle competition.

Check lines 234-237 of pascal_voc.py, or the equivalent lines in your own newly generated dataset .py file (e.g. openimage.py):
https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/pascal_voc.py#L234-L237
I recommend copying pascal_voc.py and working from there. Delete the -1 on those lines.

Also make the same change here:
https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/imdb.py#L121-L122
Delete the -1 there too.

Some objects have bbox coordinates such as 0, 1, 0, which either crash the code with an assertion error or make the loss become nan. If your dataset has bbox annotations that are either 0 or equal to the image width, apply both changes.
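
One likely mechanism can be sketched as follows. This assumes the dataset loader stores boxes in an unsigned 16-bit NumPy array (I believe pascal_voc.py does, but check your own copy) and that the horizontal flip computes new_x1 = width - old_x2 - 1. The variable names are illustrative and not taken from the repository.

```python
import numpy as np

# Assumption: boxes are kept in a uint16 array, so values cannot go negative.
one = np.array([1], dtype=np.uint16)

# Case 1: an annotation with xmin = 0 that then has 1 subtracted from it.
xmin = np.array([0], dtype=np.uint16)
wrapped_xmin = xmin - one  # wraps around instead of giving -1

# Case 2: horizontal flip of a box whose x2 equals the image width.
# The flip is sketched here as new_x1 = width - old_x2 - 1.
width = np.array([500], dtype=np.uint16)
old_x2 = np.array([500], dtype=np.uint16)
flipped_x1 = width - old_x2 - one  # 0 - 1 wraps around as well

print(wrapped_xmin, flipped_x1)
```

In both cases the coordinate becomes 65535 rather than -1, so x1 ends up far larger than x2. That is the same order of magnitude as the 6.4041e+04 in the gt_boxes printed above.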

hope it helps! :)

marcunzueta on 4 Aug 2019

👍6 ❤1

All 3 comments

Check #349

DebasmitaGhose on 29 Jul 2019


Make sure x2 < width and y2 < height, because the loader flips both the image and its annotations:

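        # Fragment from the annotation loader; `tree` and `objs` come from the enclosing function
        wh = tree.find('size')
        w, h = int(wh.find('width').text), int(wh.find('height').text)
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Read the coordinates as-is (no -1 offset)
            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text)
            y2 = float(bbox.find('ymax').text)
            # Clamp the box so it stays inside the image
            x1 = max(x1, 0)
            y1 = max(y1, 0)
            # Keep x2 and y2 strictly below the image size, as noted above
            x2 = min(x2, w - 1)
            y2 = min(y2, h - 1)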
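
Before retraining, it may help to scan the annotation files for boxes that would trigger this. The following is a minimal sketch, assuming VOC-style XML with `size`, `object`, `name`, and `bndbox` elements. The function name `find_bad_boxes` and the sample XML are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample annotation: one clean box and two suspicious ones.
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>375</height></size>
  <object><name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>100</xmax><ymax>200</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>0</xmin><ymin>5</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
  <object><name>bird</name>
    <bndbox><xmin>400</xmin><ymin>100</ymin><xmax>500</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

def find_bad_boxes(xml_text):
    """Return (name, x1, y1, x2, y2, reason) for each suspicious box."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    bad = []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        x1, y1, x2, y2 = (float(bbox.find(tag).text)
                          for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        reasons = []
        if x1 <= 0 or y1 <= 0:
            reasons.append('touches top/left edge')
        if x2 >= w or y2 >= h:
            reasons.append('touches bottom/right edge')
        if x2 <= x1 or y2 <= y1:
            reasons.append('zero or negative size')
        if reasons:
            bad.append((obj.find('name').text, x1, y1, x2, y2, '; '.join(reasons)))
    return bad

bad = find_bad_boxes(SAMPLE_XML)
for entry in bad:
    print(entry)
```

To check a whole dataset, read each XML file and pass its contents to the function. Any box it reports is a candidate for clamping or removal.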