Faster-rcnn.pytorch: nan loss in the first epoch

Created on 14 May 2019 · 4 Comments · Source: jwyang/faster-rcnn.pytorch

When I train the model, I get a 'nan' loss in the first epoch. Does anyone know what the problem is? Thanks a lot!

All 4 comments

[session 1][epoch 1][iter 600/ 967] loss: nan, lr: 1.00e-03
fg/bg=(512/0), time cost: 40.919098
rpn_cls: nan, rpn_box: nan, rcnn_cls: 2.5435, rcnn_box 0.0000

Several other issues in this repository describe ways to address this.
The cause can depend on a few things, e.g. dataset labels or exploding gradients.

What worked for me was to clip the gradients of the model during training:
clip_gradient(fasterRCNN, 10.)

In the standard training script (trainval_net.py), this is already applied when using a VGG16 backbone.
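
For anyone unfamiliar with gradient clipping, here is a minimal pure-Python sketch of clipping by global norm, which is what I understand clip_gradient to do. It is not the repository's code: `clip_by_global_norm` is a hypothetical name, and the gradients are plain lists of floats rather than tensors. In PyTorch, torch.nn.utils.clip_grad_norm_ offers comparable behaviour.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    # Scale is 1.0 when the norm is already small enough, below 1.0 otherwise.
    scale = max_norm / max(total_norm, max_norm)
    return [[g * scale for g in layer] for layer in grads], total_norm

# Two hypothetical "layers" whose gradients have exploded.
grads = [[300.0, -400.0], [0.0, 1200.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=10.0)
norm_after = math.sqrt(sum(g * g for layer in clipped for g in layer))

print(norm_before)   # 1300.0
print(norm_after)    # approximately 10.0
```

Note that clipping only limits large but finite gradients. It cannot repair gradients that are already nan, so it will not help if the cause is in the labels.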

I solved it by changing the code in pascal_voc.py to:
x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text)
y2 = float(bbox.find('ymax').text)
The '- 1' that the original code applied to each coordinate caused this problem.
Thanks to AlexanderHustinx for the patience!
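
To illustrate why the subtraction matters, here is a minimal sketch, not the repository's actual loader. It uses a made-up annotation whose box touches the top-left image corner with 0-based coordinates. It also assumes the boxes end up in an unsigned 16-bit array (as I believe pascal_voc.py does), which is emulated here with a modulo. `parse_box`, its `one_based` flag, and `as_uint16` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with 0-based coordinates touching the top-left corner.
ANNOTATION = """
<annotation>
  <object>
    <name>car</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>120</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>
"""

def parse_box(obj, one_based):
    """Return (x1, y1, x2, y2); subtract 1 only when the labels are 1-based."""
    bbox = obj.find('bndbox')
    offset = 1 if one_based else 0
    return tuple(float(bbox.find(tag).text) - offset
                 for tag in ('xmin', 'ymin', 'xmax', 'ymax'))

def as_uint16(value):
    """Emulate storing a value in an unsigned 16-bit slot (an assumption about the loader)."""
    return int(value) % 2 ** 16

obj = ET.fromstring(ANNOTATION).find('object')

wrong = parse_box(obj, one_based=True)    # treats 0-based labels as 1-based
right = parse_box(obj, one_based=False)   # the fix described above

print(wrong)                              # x1 and y1 become -1.0
print([as_uint16(v) for v in wrong])      # -1 wraps around to 65535
print(right)
```

With the wrapped value, x1 ends up far larger than x2, so the box is degenerate. That is one plausible route to the nan losses in rpn_cls and rpn_box.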

Did you use labelImg to produce your dataset? If so, the pixel coordinates start from 0, not 1.
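
One rough way to check which convention a dataset uses is to look for the smallest xmin/ymin across the annotations. The sketch below assumes Pascal VOC style XML. `min_box_coordinate` is a hypothetical helper, and the two sample annotations are made up for illustration.

```python
import xml.etree.ElementTree as ET

def min_box_coordinate(xml_texts):
    """Return the smallest xmin/ymin found across Pascal VOC style annotations."""
    smallest = None
    for text in xml_texts:
        root = ET.fromstring(text)
        for bbox in root.iter('bndbox'):
            for tag in ('xmin', 'ymin'):
                value = float(bbox.find(tag).text)
                if smallest is None or value < smallest:
                    smallest = value
    return smallest

# Hypothetical annotations; in practice these would be read from the dataset's XML files.
samples = [
    "<annotation><object><bndbox><xmin>12</xmin><ymin>7</ymin>"
    "<xmax>90</xmax><ymax>60</ymax></bndbox></object></annotation>",
    "<annotation><object><bndbox><xmin>0</xmin><ymin>33</ymin>"
    "<xmax>45</xmax><ymax>70</ymax></bndbox></object></annotation>",
]

smallest = min_box_coordinate(samples)
# A minimum below 1 means at least one box would go negative after subtracting 1.
needs_no_offset = smallest is not None and smallest < 1
print(smallest, needs_no_offset)
```

A minimum of 1 or more does not prove the labels are 1-based, since no box may happen to touch the image edge. A minimum of 0 is a strong hint that the '- 1' should be removed.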
