When I train the model, I get a 'nan' loss in the first epoch. Does anyone know what the problem is? Thanks a lot!
[session 1][epoch 1][iter 600/ 967] loss: nan, lr: 1.00e-03
fg/bg=(512/0), time cost: 40.919098
rpn_cls: nan, rpn_box: nan, rcnn_cls: 2.5435, rcnn_box 0.0000
There are several issues that describe ways to address this.
It can depend on a few things, e.g. bad dataset labels or exploding gradients.
What worked for me was to clip the gradients of the model during training:
clip_gradient(fasterRCNN, 10.)
In the standard train_val.py file, this is already applied when using a VGG16 backend.
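To illustrate what norm-based gradient clipping does, here is a minimal, framework-free sketch. The function name clip_by_global_norm and the list-of-lists gradient layout are assumptions made for this example; this is not the repository's clip_gradient helper, whose exact behaviour may differ. In PyTorch, torch.nn.utils.clip_grad_norm_ performs a comparable rescaling on a model's parameters.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their combined L2 norm is at most max_norm.

    grads is a list of per-parameter gradient lists (hypothetical layout).
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Gradients that are already small enough are returned unchanged.
    if total_norm <= max_norm or total_norm == 0.0:
        return [list(grad) for grad in grads]
    scale = max_norm / total_norm
    return [[g * scale for g in grad] for grad in grads]


# Two parameters whose combined gradient norm is 50.
grads = [[30.0, 40.0], [0.0]]
clipped = clip_by_global_norm(grads, 10.0)
print(clipped)
```

Clipping only limits how large an update can get. If the loss is already nan because of invalid labels, the gradients are nan as well, and rescaling them will not help.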
I solved it by changing the code in pascal_voc.py:
x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text)
y2 = float(bbox.find('ymax').text)
The '- 1' that the original code subtracted from each coordinate caused this problem.
Thanks to AlexanderHustinx for the patience!
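A small sketch of why subtracting 1 can break things when the annotations are already 0-based. The XML fragment, read_boxes and as_uint16 are hypothetical names made up for this example. It also assumes the loader stores boxes in an unsigned 16-bit array, which is worth checking against your copy of pascal_voc.py.

```python
import ctypes
import xml.etree.ElementTree as ET

# Hypothetical annotation whose coordinates already start at 0.
SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name>
    <bndbox><xmin>0</xmin><ymin>12</ymin><xmax>99</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text, offset):
    """Return [x1, y1, x2, y2] per object after subtracting offset."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bbox = obj.find("bndbox")
        boxes.append([float(bbox.find(tag).text) - offset
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes


def as_uint16(value):
    """Mimic storing a value in an unsigned 16-bit slot, which wraps around."""
    return ctypes.c_uint16(int(value)).value


with_minus_one = read_boxes(SAMPLE_XML, offset=1)[0]
without_offset = read_boxes(SAMPLE_XML, offset=0)[0]

print(with_minus_one)                # x1 becomes negative
print(as_uint16(with_minus_one[0]))  # the negative x1 wraps to a huge value
print(without_offset)                # coordinates stay valid
```

Under that assumption, the wrapped x1 ends up larger than x2, so the box has a negative width, which is one plausible way for the box regression targets and the RPN losses to turn into nan.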
Did you use labelImg to produce your dataset? If so, the coordinates start from 0, not 1.
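Since bad labels are one of the candidate causes, it can help to check the parsed boxes before training. A minimal sketch, where find_invalid_boxes is a hypothetical helper and the boxes and image size are made-up sample values:

```python
def find_invalid_boxes(boxes, width, height):
    """Return indices of boxes that are degenerate or fall outside the image."""
    bad = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        inside = 0 <= x1 < width and 0 <= y1 < height and x2 <= width and y2 <= height
        # A box is also invalid if it has zero or negative width or height.
        if not inside or x2 <= x1 or y2 <= y1:
            bad.append(i)
    return bad


sample_boxes = [
    [0, 12, 99, 80],   # valid, 0-based
    [-1, 11, 98, 79],  # negative x1, e.g. after an unnecessary "- 1"
    [50, 50, 50, 60],  # zero width
]
bad_indices = find_invalid_boxes(sample_boxes, width=100, height=100)
print(bad_indices)
```

Running a check like this over every annotation file shows whether any box would go out of range after the loader's offset is applied.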