I used the datasets provided by the author with the pretrained model vgg16_caffe.pth. When I try to train the model, it shows loss: nan, rcnn_cls: nan, rcnn_box nan. I tried changing the annotation loading code to:
x1 = float(bbox.find('xmin').text) #- 1
y1 = float(bbox.find('ymin').text) #- 1
x2 = float(bbox.find('xmax').text) #- 1
y2 = float(bbox.find('ymax').text) #- 1
However, it doesn't work.
before filtering, there are 10022 images...
after filtering, there are 10022 images...
10022 roidb entries
Loading pretrained weights from data/pretrained_model/vgg16_caffe.pth
[session 1][epoch 1][iter 0/1670] loss: nan, lr: 1.00e-06
fg/bg=(157/1379), time cost: 1.932488
rpn_cls: 0.7493, rpn_box: 0.2117, rcnn_cls: nan, rcnn_box nan
[session 1][epoch 1][iter 100/1670] loss: nan, lr: 1.00e-06
fg/bg=(1536/0), time cost: 193.700977
rpn_cls: nan, rpn_box: nan, rcnn_cls: nan, rcnn_box nan
^CTraceback (most recent call last):
File "trainval_net.py", line 330, in
The VOC dataset stores bounding box coordinates as 1-indexed, so you need to keep the -1 that you have commented out. Leaving it out might be what causes the incorrect loss computation that results in nan.
I met the same problem and still don't know how to solve it.
I used the res101 net instead and it works.
I just used res101 and got the same problem again.
Which branch are you using?
Try master branch
I used the pytorch1.0 branch. I think maybe the only difference is the CUDA build and not the code itself? What I want to know is what caused this 'nan' loss: the dataset or something else?
Most likely it's the dataset that caused the 'nan' loss (I have run into this kind of problem myself). You have to make sure that no bounding box has its top-left corner at (0,0). That is, in the xml files, no (xmin, ymin) pair should be (0,0). If any (xmin, ymin) pair in the xml files is (0,0), you have to change it to at least (1,1).
BTW, once you have made sure there is no (xmin, ymin) = (0,0) in the xml files, you need to delete the files under the "cache" directory before training again. Hope this helps.
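To find the offending annotations before editing anything, something like the sketch below might help. It assumes the usual VOC xml layout (an `object` element containing `name` and `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`). The function names and the directory you pass in are made up for illustration, and the extra check for empty or inverted boxes is my own addition, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_suspect_boxes(xml_text):
    """Return (name, xmin, ymin, xmax, ymax, reason) for each suspicious box."""
    root = ET.fromstring(xml_text)
    suspects = []
    for obj in root.iter("object"):
        bbox = obj.find("bndbox")
        if bbox is None:
            continue
        name = obj.findtext("name", default="?")
        x1, y1, x2, y2 = (
            float(bbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        reasons = []
        # Coordinates are expected to be 1-indexed, so anything below 1 is suspect.
        if x1 < 1 or y1 < 1:
            reasons.append("xmin/ymin below 1")
        # Extra sanity check (an assumption, not from the thread).
        if x2 <= x1 or y2 <= y1:
            reasons.append("empty or inverted box")
        if reasons:
            suspects.append((name, x1, y1, x2, y2, "; ".join(reasons)))
    return suspects


def scan_annotations(annotation_dir):
    """Print every suspicious box found in the xml files of a directory."""
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        for suspect in find_suspect_boxes(xml_path.read_text(encoding="utf-8")):
            print(xml_path.name, suspect)
```

Call `scan_annotations(...)` on your own Annotations directory; any file it prints is worth checking by hand before you retrain.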
I hit the same error too. Have you solved it? Thanks.
As far as I remember it was a data problem: check whether any bounding box in your dataset touches 0 at the image boundary. There also seems to be something hard-coded that differs between backbones, so you should check the output of each one.
thanks
Have you tried deleting the cache directory in the data directory?
If you modify the pascal_voc.py file, you will need to delete the cache.
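If you want to script the cleanup, here is a minimal sketch. It assumes the cached files sit directly inside one directory (`data/cache` going by the comments in this thread, but check your own checkout). `clear_cache` is just a name I picked, not something from the repo.

```python
from pathlib import Path


def clear_cache(cache_dir):
    """Delete the files directly under cache_dir and return the names removed."""
    removed = []
    cache = Path(cache_dir)
    if not cache.is_dir():
        # Nothing to clean if the directory does not exist yet.
        return removed
    for entry in sorted(cache.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

For example, `clear_cache("data/cache")` before restarting training, so the annotations are parsed again instead of being loaded from the stale cache.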
Maybe you should make sure that your coordinates are not negative:
    if x1 < 0:
        x1 = x1 + 1
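A slightly more general version of the same idea is to keep the -1 shift but clamp the result so it can never go below 0. This is only a sketch: `to_zero_based` and its `upper` parameter are hypothetical names, not part of pascal_voc.py. If the loader stores boxes in an unsigned integer array (an assumption about the loader that nobody confirmed in this thread), a -1 would wrap around to a very large value, which is one plausible way to end up with nan.

```python
def to_zero_based(value, upper=None):
    """Shift a 1-indexed VOC coordinate to 0-indexed and clamp it into range."""
    shifted = float(value) - 1
    # A coordinate of 0 in the xml would become -1 here, so clamp it to 0.
    shifted = max(shifted, 0.0)
    # Optionally clamp to the last valid pixel index (e.g. width - 1).
    if upper is not None:
        shifted = min(shifted, float(upper))
    return shifted
```

In the loading code this would look like `x1 = to_zero_based(bbox.find('xmin').text)`, and the same for the other three coordinates. Remember to delete the cache afterwards.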