Py-faster-rcnn: Floating point exception

Created on 26 Apr 2016 · 20 Comments · Source: rbgirshick/py-faster-rcnn

After several thousand iterations, Faster R-CNN throws the error "Floating point exception" when running ./experiments/scripts/faster_rcnn_end2end.sh. From what I found by searching, this error is usually caused by an integer division or modulo by zero (i/0 or i%0). Has anyone encountered this?

morusu

All 20 comments

I encountered a similar problem.

Solving...
I0428 15:05:27.513572 6443 solver.cpp:242] Iteration 0, loss = 4.65389
I0428 15:05:27.513619 6443 solver.cpp:258] Train net output #0: loss_bbox = 0.190101 (* 1 = 0.190101 loss)
I0428 15:05:27.513628 6443 solver.cpp:258] Train net output #1: loss_cls = 3.44897 (* 1 = 3.44897 loss)
I0428 15:05:27.513635 6443 solver.cpp:258] Train net output #2: rpn_cls_loss = 0.900724 (* 1 = 0.900724 loss)
I0428 15:05:27.513643 6443 solver.cpp:258] Train net output #3: rpn_loss_bbox = 0.119607 (* 1 = 0.119607 loss)
I0428 15:05:27.513656 6443 solver.cpp:571] Iteration 0, lr = 0.001
./experiments/scripts/faster_rcnn_end2end.sh: line 57: 6443 Floating point exception(core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt --weights data/imagenet_models/${NET}.v2.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml ${EXTRA_ARGS}

wait1988 on 28 Apr 2016

I got the same error, and it turned out that I was feeding in an empty boxes array. Filtering out roidb properly fixed my problem.

weichengkuo on 4 May 2016

What does "filtering out roidb properly" mean? Would you please give us more details?

wait1988 on 5 May 2016

I've got the same error. When I change the default value of RNG_SEED, the error occurs at a different iteration. Have you found a solution yet? @weichengkuo, I would be thankful if you could elaborate a little more. Where should I filter out the empty boxes? Thanks!

smasoudn on 9 May 2016

Take a look at #65.

smichalowski on 9 May 2016

👍3

It's possible that some layer of your Faster R-CNN receives no boxes at some iteration. I ran into this error multiple times, and it was often due to empty boxes. Filtering roidb means removing the roidb entries that could cause this problem.

weichengkuo on 11 May 2016
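
As a rough illustration of the filtering described above, the sketch below drops roidb entries whose boxes array is empty. It assumes each entry is a dict with a 'boxes' key holding an (N, 4) NumPy array; the helper name drop_empty_entries is hypothetical and not part of the repository.

```python
import numpy as np


def drop_empty_entries(roidb):
    """Return only the roidb entries that contain at least one box.

    Hypothetical helper: assumes each entry is a dict whose 'boxes'
    value is an (N, 4) array of [x1, y1, x2, y2] coordinates.
    """
    return [entry for entry in roidb if len(entry['boxes']) > 0]


# Toy roidb: the second entry has no boxes and is removed.
roidb = [
    {'boxes': np.array([[10, 20, 50, 80]])},
    {'boxes': np.zeros((0, 4))},
]
filtered = drop_empty_entries(roidb)
```

This only covers entries with no boxes at all; boxes that exist but have zero width or height need a separate check.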

How can this be solved, please?

daf11865 on 12 May 2016

Padding the original image with zeros to a reasonable aspect ratio (600*1000) will solve this problem.

morusu on 2 Jun 2016

@morusu So where do we need to modify the code to 'pad 0s the original image'?

LiberiFatali on 9 Jun 2016

How do we change the code to 'pad 0 the original image'? Or do we still need to preprocess the images first?
Can you give us an example? Thanks.

buaaliyi on 20 Jun 2016

@buaaliyi @LiberiFatali Preprocess the images first. Padding zeros onto the right side or the bottom of each image until it has a reasonable aspect ratio will be fine.

morusu on 22 Jun 2016
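
A minimal sketch of the preprocessing described above, under stated assumptions: "reasonable aspect ratio (600*1000)" is read here as "long side / short side no greater than 1000/600", and the function name pad_to_aspect_ratio and its parameters are hypothetical. Because zeros are added only on the right or bottom, the image origin does not move, so existing box coordinates stay valid.

```python
import numpy as np


def pad_to_aspect_ratio(image, short_side=600, long_side=1000):
    """Zero-pad an image on the right or bottom until its aspect ratio
    is no more extreme than long_side:short_side.

    Hypothetical helper; `image` is an H x W or H x W x C NumPy array.
    """
    height, width = image.shape[:2]
    new_height, new_width = height, width
    if width * short_side > height * long_side:
        # Too wide: add rows at the bottom (integer ceiling division).
        new_height = -(-width * short_side // long_side)
    elif height * short_side > width * long_side:
        # Too tall: add columns on the right.
        new_width = -(-height * short_side // long_side)
    padded = np.zeros((new_height, new_width) + image.shape[2:],
                      dtype=image.dtype)
    padded[:height, :width] = image  # original pixels keep their position
    return padded


# A very wide 100 x 1000 image is padded at the bottom.
wide = np.ones((100, 1000, 3), dtype=np.uint8)
padded_wide = pad_to_aspect_ratio(wide)
```

Images that are already within the ratio are returned as an unchanged copy.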

I got this error while using old code. For me, the problem was solved by applying

def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""

in https://github.com/rbgirshick/py-faster-rcnn/blob/d66cc2bff142ca07f521db06ca3e9e10dbc8df20/lib/fast_rcnn/train.py

LiberiFatali on 25 Jul 2016

😕3 👍1

@LiberiFatali Thanks, your solution solved my problem!

vra on 18 Nov 2016

@vra Where did you apply the filter_roidb function? It is already called in the train_net() function (fast_rcnn/train.py). I am facing the same problem as @morusu described. My loss suddenly goes to nan (overflow encountered in exp). I am using the Pascal VOC dataset and have no clue what causes the problem. Has anyone solved this issue? Thank you!

fernandorovai on 14 Dec 2016

Hi @fernandorovai,
Sorry, I should have been clearer. I am using RstarCNN, which includes rbg's fast-rcnn repo. In fast-rcnn there is no filter_roidb function. When I added this function to it, my problem was solved.
Did you try lowering your learning rate? As far as I know, the nan problem is always related to a large learning rate.

vra on 15 Dec 2016

@vra Hello, did training work once you added filter_roidb to train.py? In my case the filter_roidb function is already there, but I still get the 'floating point exception'. I tried changing the learning rate and RNG_SEED, but that did not help.

June-Jo on 11 Jan 2017

@HyunJun-Jo Hello, I have the same problem. I also tried changing the learning rate and RNG_SEED, and that did not help either. Have you solved the problem? Thanks.

hanjf12 on 11 Apr 2017

@morusu @wait1988 @weichengkuo @smasoudn @smichalowski
Hi, when I train FPN on my own dataset, I get this error:

I0312 16:25:25.883342 2983 sgd_solver.cpp:106] Iteration 0, lr = 0.0005
/home/zq/FPN/tools/../lib/rpn/proposal_layer.py:175: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Floating point exception (core dumped)

I tried changing lr from 0.001 to 0.0001, but it didn't work. I also changed RNG_SEED, and that didn't work either.
I don't know how to solve it. Please help me, thanks so much!

zqdeepbluesky on 12 Mar 2018
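
The RuntimeWarning in the log above ("invalid value encountered in greater_equal") usually means the arrays being compared contain NaN. The sketch below is a diagnostic only, not a fix: it raises a readable Python error when a box array holds NaN or infinite values, so the failure surfaces before the process crashes. The helper name check_finite_boxes is hypothetical, and where to call it (for example on the proposals before the size filter) is an assumption.

```python
import numpy as np


def check_finite_boxes(boxes, name='proposals'):
    """Raise ValueError if any box coordinate is NaN or infinite.

    Hypothetical diagnostic helper; `boxes` is an (N, 4) array-like.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    # Rows where at least one coordinate is not a finite number.
    bad_rows = np.where(~np.isfinite(boxes).all(axis=1))[0]
    if bad_rows.size > 0:
        raise ValueError('{} contain non-finite values in rows {}'.format(
            name, bad_rows.tolist()))
    return boxes


# Finite boxes pass through unchanged.
ok = check_finite_boxes([[0, 0, 10, 10], [5, 5, 20, 30]])
```

If the check fires, the non-finite values were produced earlier in the network, so the cause still has to be traced from there.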

Has anyone solved the problem? I get the same error at iteration 5800 with a learning rate of 0.001 and at iteration 18800 with 0.0001. If someone has solved it, please help me.

amlandas78 on 10 May 2018

I have solved my 'Floating point exception (core dumped)' problem by modifying the function 'is_valid' in function 'filter_roidb' in file da-faster-rcnn-master/lib/fast_rcnn/train.py:

def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""

    def is_valid(entry):
        # Valid images have:
        #   (1) At least one foreground RoI OR
        #   (2) At least one background RoI
        overlaps = entry['max_overlaps']
        # added to handle empty boxes, see https://github.com/rbgirshick/py-faster-rcnn/issues/159
        # a box counts as non-empty only if it is more than 1 pixel wide and tall
        not_empty = np.zeros(len(entry['max_overlaps']), dtype=bool)
        cur_boxes = entry['boxes']
        for i in range(len(not_empty)):
            if (cur_boxes[i][2] - cur_boxes[i][0] > 1 and cur_boxes[i][3] - cur_boxes[i][1] > 1):
                not_empty[i] = True

        # find boxes with sufficient overlap
        fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
        # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
        # that are also non-empty
        bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                           (overlaps >= cfg.TRAIN.BG_THRESH_LO) & not_empty)[0]

        # image is only valid if such boxes exist
        valid = len(fg_inds) > 0 or len(bg_inds) > 0

        return valid

    # (the rest of filter_roidb is unchanged)
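
The per-box loop above can also be written as a single vectorized mask. This is a sketch under assumptions, not the code from the post: the name non_degenerate_mask is hypothetical, and the cast to float64 is added so that the subtraction cannot wrap around if the boxes are stored in an unsigned integer dtype.

```python
import numpy as np


def non_degenerate_mask(boxes):
    """Return a boolean mask that is True for boxes wider and taller
    than 1 pixel, mirroring the `not_empty` loop above.

    Hypothetical helper; `boxes` is an (N, 4) array of [x1, y1, x2, y2].
    """
    # Cast to float so x2 - x1 cannot wrap around for unsigned dtypes.
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    return (widths > 1) & (heights > 1)


boxes = np.array([
    [0, 0, 10, 10],   # normal box
    [5, 5, 5, 20],    # zero width
    [3, 3, 4, 4],     # width and height of exactly 1
], dtype=np.uint16)
mask = non_degenerate_mask(boxes)
```

Inside is_valid, such a mask could replace the loop that fills not_empty; the foreground and background selection would stay as written.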