Hi, I ran into some problems during training.
The error message is as follows:
ValueError: bg_num_rois = 0 and fg_num_rois = 0, this should not happen!
I also noticed that the loss had already turned to nan before the error appeared. I tried some suggestions, such as clipping gradients and reducing the learning rate, but none of them worked.
[session 1][epoch 1][iter 300/2164] loss: nan, lr: 1.00e-04
fg/bg=(128/0), time cost: 29.000118
I checked my annotation files and some xmin values are 0. I don't know if that is the problem, because adding 1 to xmin did not help.
I also printed gt_boxes and found an xmin of roughly 64041, which is apparently wrong.
gt_boxes is tensor([[[6.4041e+04, 1.7687e+02, 2.2182e+02, 4.3876e+02, 2.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]],
So I think something goes wrong where gt_boxes is computed in your code, but it is hard to track down. Could you give me a clue about how to fix it?
Thanks in advance for your reply!
Check #349
Hi, I found the same bug while trying to create my own data with the images from OpenImage for the Kaggle competition.
In your newly generated dataset .py file (e.g. openimage.py; I recommend copying pascal_voc.py and working from there), check the lines corresponding to https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/pascal_voc.py#L234-L237 and delete the -1.
Also make the same change in:
https://github.com/jwyang/faster-rcnn.pytorch/blob/358cecacf876717ff13988dc6396de10e265279c/lib/datasets/imdb.py#L121-L122
delete the -1.
Some objects have bbox coordinates such as 0, 1, 0, which either crash the code with an assertion error or make the loss become nan. If your dataset has bbox annotations that are equal to 0 or to the image width, apply these changes.
hope it helps! :)
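To illustrate why a box touching the image border can end up with a huge coordinate like the one printed above, here is a minimal sketch. It assumes the boxes are stored as unsigned 16-bit integers (which I believe is what pascal_voc.py does) and mimics that wraparound with a modulo in plain Python. `flip_x` and `UINT16_MOD` are hypothetical names for this example, not functions from the repository.

```python
UINT16_MOD = 1 << 16  # assumption: boxes are stored as unsigned 16-bit integers


def flip_x(x1, x2, width, subtract_one=True):
    """Mirror a box horizontally, mimicking uint16 wraparound on negative values."""
    offset = 1 if subtract_one else 0
    new_x1 = (width - x2 - offset) % UINT16_MOD
    new_x2 = (width - x1 - offset) % UINT16_MOD
    return new_x1, new_x2


# A box whose xmax equals the width of a 500-pixel-wide image.
print(flip_x(100, 500, 500))                      # (65535, 399)
print(flip_x(100, 500, 500, subtract_one=False))  # (0, 400)
```

With the -1 in place, the flipped xmin becomes -1 and wraps around to 65535, so xmin ends up larger than xmax. Without it, the same box flips to a valid range.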
Make sure x2 < width and y2 < height, because the code flips the images together with their annotations.
wh = tree.find('size')
w, h = int(wh.find('width').text), int(wh.find('height').text)
for ix, obj in enumerate(objs):
    bbox = obj.find('bndbox')
    # Read the coordinates as-is (no -1 offset)
    x1 = float(bbox.find('xmin').text)
    y1 = float(bbox.find('ymin').text)
    x2 = float(bbox.find('xmax').text)
    y2 = float(bbox.find('ymax').text)
    # Clamp to the image so that x2 < width and y2 < height
    x1 = max(x1, 0)
    y1 = max(y1, 0)
    x2 = min(x2, w - 1)
    y2 = min(y2, h - 1)
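Before retraining, it may also help to scan the annotation files for boxes that would trigger this. The following is a rough sketch using only the standard library. It assumes VOC-style XML with `size`, `object`, `name` and `bndbox` elements; `find_bad_boxes` and `SAMPLE` are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (name, x1, y1, x2, y2) for boxes that are degenerate or out of bounds."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    bad = []
    for obj in root.findall('object'):
        bb = obj.find('bndbox')
        x1, y1, x2, y2 = (float(bb.find(tag).text)
                          for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        # Flag boxes outside the image or with zero/negative width or height.
        if x1 < 0 or y1 < 0 or x2 >= w or y2 >= h or x2 <= x1 or y2 <= y1:
            bad.append((obj.find('name').text, x1, y1, x2, y2))
    return bad


# Made-up annotation: the second box has xmax equal to the image width.
SAMPLE = """<annotation>
  <size><width>500</width><height>375</height></size>
  <object><name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>200</xmax><ymax>300</ymax></bndbox>
  </object>
  <object><name>cat</name>
    <bndbox><xmin>0</xmin><ymin>5</ymin><xmax>500</xmax><ymax>100</ymax></bndbox>
  </object>
</annotation>"""

print(find_bad_boxes(SAMPLE))  # [('cat', 0.0, 5.0, 500.0, 100.0)]
```

For real files you would read each XML from disk and pass its text to the function. Any box it reports is a candidate for clamping or removal.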