After thousands of iterations, Faster R-CNN throws a "Floating point exception" error in ./experiments/scripts/faster_rcnn_end2end.sh. Searching for the error suggests it is caused by an integer division or modulo by zero (i/0 or i%0). Has anyone encountered this?
I encountered a similar problem.
Solving...
I0428 15:05:27.513572 6443 solver.cpp:242] Iteration 0, loss = 4.65389
I0428 15:05:27.513619 6443 solver.cpp:258] Train net output #0: loss_bbox = 0.190101 (* 1 = 0.190101 loss)
I0428 15:05:27.513628 6443 solver.cpp:258] Train net output #1: loss_cls = 3.44897 (* 1 = 3.44897 loss)
I0428 15:05:27.513635 6443 solver.cpp:258] Train net output #2: rpn_cls_loss = 0.900724 (* 1 = 0.900724 loss)
I0428 15:05:27.513643 6443 solver.cpp:258] Train net output #3: rpn_loss_bbox = 0.119607 (* 1 = 0.119607 loss)
I0428 15:05:27.513656 6443 solver.cpp:571] Iteration 0, lr = 0.001
./experiments/scripts/faster_rcnn_end2end.sh: line 57: 6443 Floating point exception(core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt --weights data/imagenet_models/${NET}.v2.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml ${EXTRA_ARGS}
I got the same error, and it turned out that I was feeding in an empty boxes array. Filtering the roidb properly fixed my problem.
What does "filtering out roidb properly" mean? Would you please give us more details?
I've got the same error. When I change the default RNG_SEED value, the error occurs at a different iteration. Have you found a solution yet? @weichengkuo, I would be thankful if you could elaborate a little more. Where should I filter the empty boxes? Thanks!
Take a look at #65.
It's possible that some layer of your Faster R-CNN receives no boxes at some iteration. I ran into this error multiple times, and it's often due to empty boxes. Filtering the roidb means removing the roidb entries that could cause this problem.
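As a rough illustration of what filtering the roidb could look like, here is a minimal sketch that drops entries whose boxes array is empty. It assumes each roidb entry is a dict with a 'boxes' key holding an (N, 4) NumPy array, as in py-faster-rcnn. The helper name drop_empty_entries is hypothetical and is not part of the repository.

```python
import numpy as np


def drop_empty_entries(roidb):
    """Hypothetical helper: keep only roidb entries with at least one box."""
    kept = [entry for entry in roidb if len(entry['boxes']) > 0]
    print('Filtered {} roidb entries: {} -> {}'.format(
        len(roidb) - len(kept), len(roidb), len(kept)))
    return kept


# Toy roidb: the second entry has no boxes and is dropped.
toy_roidb = [
    {'boxes': np.array([[10, 20, 50, 80]], dtype=np.uint16)},
    {'boxes': np.zeros((0, 4), dtype=np.uint16)},
]
filtered = drop_empty_entries(toy_roidb)
```

A filter like this would have to run before the roidb is handed to the solver, so that no minibatch is built from an image without boxes.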
How can this be solved, please?
Zero-padding the original image to a reasonable aspect ratio (600*1000) will solve this problem.
@morusu So where do we need to modify the code to zero-pad the original image?
How do we change the code to zero-pad the original image, or do we still need to preprocess the images first?
Can you give us an example? Thanks
@buaaliyi @LiberiFatali Preprocess the images first: padding zeros on the right side or the bottom of each image until it reaches a reasonable aspect ratio will be fine.
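A minimal sketch of that preprocessing step, assuming images are loaded as NumPy arrays of shape (H, W) or (H, W, C). The function name pad_to_aspect and its parameters are hypothetical, and reading "reasonable aspect ratio" as "no more extreme than 600:1000" is an assumption about what was meant above.

```python
import numpy as np


def pad_to_aspect(img, short_side=600, long_side=1000):
    """Hypothetical helper: zero-pad the bottom or the right side so that
    the longer side is at most long_side/short_side times the shorter one."""
    h, w = img.shape[:2]
    new_h, new_w = h, w
    if w * short_side > h * long_side:
        # Too wide: add zero rows at the bottom (integer ceiling division).
        new_h = -(-w * short_side // long_side)
    elif h * short_side > w * long_side:
        # Too tall: add zero columns on the right.
        new_w = -(-h * short_side // long_side)
    padded = np.zeros((new_h, new_w) + img.shape[2:], dtype=img.dtype)
    padded[:h, :w] = img  # original pixels stay in the top-left corner
    return padded


wide = np.ones((100, 1000, 3), dtype=np.uint8)
padded_wide = pad_to_aspect(wide)
print(padded_wide.shape)
```

Because the padding is added only on the right and at the bottom, the existing box coordinates stay valid and the annotations do not need to be shifted.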
I got this error while using old code. The problem was solved for me by applying
def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""
from https://github.com/rbgirshick/py-faster-rcnn/blob/d66cc2bff142ca07f521db06ca3e9e10dbc8df20/lib/fast_rcnn/train.py
@LiberiFatali Thanks, your solution solved my problem!
@vra Where did you apply the filter_roidb function? It is already called in the train_net() function (fast_rcnn/train.py). I am facing the same problem that @morusu described. Suddenly my loss goes to nan (overflow encountered in exp). I am using the PASCAL VOC dataset and have no clue what the problem is. Has anyone solved this issue? Thank you!
Hi @fernandorovai,
Sorry, I should have been clearer. I am using RstarCNN, which includes rbg's fast_rcnn repo. In fast-rcnn, there is no filter_roidb function. When I added this function to it, my problem was solved.
Did you try lowering your learning rate? As far as I know, the nan problem is usually related to a learning rate that is too large.
@vra Hello, does training go well once you add filter_roidb to train.py? In my case the filter_roidb function is already there, but I still get the 'floating point exception'. I tried changing the learning rate and the RNG_SEED, but neither helped.
@HyunJun-Jo Hello, I have the same problem. I also tried changing the learning rate and the RNG_SEED, and that did not help either. Have you solved the problem? Thanks.
@morusu @wait1988 @weichengkuo @smasoudn @smichalowski
Hi, when I train FPN on my own dataset, I get this error:
I0312 16:25:25.883342 2983 sgd_solver.cpp:106] Iteration 0, lr = 0.0005
/home/zq/FPN/tools/../lib/rpn/proposal_layer.py:175: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Floating point exception (core dumped)
I tried changing the learning rate from 0.001 to 0.0001, but it didn't work. I also changed RNG_SEED, and that didn't work either.
I don't know how to solve it. Please help me, thanks so much!
Has anyone solved the problem? I get the same error at iteration 5800 with a learning rate of 0.001 and at iteration 18800 with 0.0001. If someone has solved it, please help me.
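The "invalid value encountered in greater_equal" warning generally means that ws or hs already contain NaN when they are compared, so the proposals were non-finite before the crash. The following is a minimal sketch of a size filter that also rejects non-finite boxes. The function name keep_valid_boxes is hypothetical, the boxes are assumed to be an (N, 4) float array of [x1, y1, x2, y2], and this only masks the symptom: it does not explain why the network produced NaN in the first place.

```python
import numpy as np


def keep_valid_boxes(boxes, min_size):
    """Hypothetical variant of a min-size filter that also drops boxes
    containing NaN or inf coordinates."""
    finite = np.isfinite(boxes).all(axis=1)
    # Silence the 'invalid value' warnings; NaN comparisons evaluate to False.
    with np.errstate(invalid='ignore'):
        ws = boxes[:, 2] - boxes[:, 0] + 1
        hs = boxes[:, 3] - boxes[:, 1] + 1
        big_enough = (ws >= min_size) & (hs >= min_size)
    return np.where(finite & big_enough)[0]


proposals = np.array([
    [0.0, 0.0, 15.0, 15.0],     # 16 x 16, kept
    [np.nan, 0.0, 10.0, 10.0],  # non-finite, dropped
    [0.0, 0.0, 2.0, 2.0],       # 3 x 3, too small
])
keep = keep_valid_boxes(proposals, min_size=16)
```

If every proposal is dropped, keep is empty, and the downstream layers would still need to handle that case.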
I solved my 'Floating point exception (core dumped)' problem by modifying the nested function 'is_valid' inside the function 'filter_roidb' in the file da-faster-rcnn-master/lib/fast_rcnn/train.py:
def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""

    def is_valid(entry):
        # Valid images have:
        #   (1) At least one foreground RoI OR
        #   (2) At least one background RoI
        overlaps = entry['max_overlaps']
        # added to handle empty boxes, see https://github.com/rbgirshick/py-faster-rcnn/issues/159
        not_empty = np.zeros(len(entry['max_overlaps']), dtype=bool)
        cur_boxes = entry['boxes']
        for i in range(len(not_empty)):
            if (cur_boxes[i][2] - cur_boxes[i][0] > 1 and
                    cur_boxes[i][3] - cur_boxes[i][1] > 1):
                not_empty[i] = True
        # find boxes with sufficient overlap
        fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
        # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
        bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                           (overlaps >= cfg.TRAIN.BG_THRESH_LO) & not_empty)[0]
        # image is only valid if such boxes exist
        valid = len(fg_inds) > 0 or len(bg_inds) > 0
        return valid
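The per-box loop above can also be written as a single vectorized check. This is a sketch only: the name non_degenerate_mask is hypothetical, and it deliberately casts the coordinates to a signed integer type first. That cast is a behavior change for unsigned box arrays, where a box with x2 < x1 would otherwise wrap around to a large positive width and pass the test.

```python
import numpy as np


def non_degenerate_mask(boxes):
    """Hypothetical helper: True for boxes whose width and height
    (x2 - x1, y2 - y1) are both greater than 1."""
    # Signed copy, so inverted boxes give negative sizes instead of wrapping.
    b = np.asarray(boxes).astype(np.int64)
    widths = b[:, 2] - b[:, 0]
    heights = b[:, 3] - b[:, 1]
    return (widths > 1) & (heights > 1)


example_boxes = np.array([
    [10, 10, 50, 50],  # 40 x 40, fine
    [10, 10, 10, 50],  # zero width
    [30, 30, 31, 40],  # width of 1 is not > 1
    [50, 10, 10, 50],  # inverted x coordinates
], dtype=np.uint16)
not_empty = non_degenerate_mask(example_boxes)
```

The resulting boolean array can be combined with the overlap conditions in the same way as the not_empty array in the function above.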