No pre-trained model was used.
2020-04-14 10:51:30,850 - mmdet - INFO - Epoch [1][5/1564] lr: 0.00068, eta: 1 day, 12:22:47, time: 3.490, data_time: 0.105, memory: 12023, loss_cls: 403423626349.0533, loss_bbox: 22096496443.6824, loss: 425520111323.9357
File "/home/fei.qi/object_detection/mmdetection/mmdet/models/anchor_heads/ssd_head.py", line 187, in loss
'classification scores become infinite or NaN!'
AssertionError: classification scores become infinite or NaN!
Sometimes training runs fine, and sometimes it fails with this error. Is it because no pre-trained model is used?
SSD easily diverges at the beginning of training. You may reduce the warmup_ratio, for example from 1.0 / 3 to 1.0 / 10. We are considering adding gradient clipping for SSD in v2.0.
You can also refer to some previous issues: #1203.
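For reference, here is a minimal sketch of the two suggested changes in an MMDetection 2.x config. The field names follow the standard config schema; the concrete values (warmup_iters, step, max_norm) are illustrative defaults, not prescriptions:

```python
# Sketch of the relevant MMDetection 2.x config fields (illustrative values).

# A smaller warmup_ratio starts warmup from a lower learning rate,
# which helps SSD survive the first few hundred iterations.
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 10,  # reduced from the common default of 1.0 / 3
    step=[16, 22])

# Gradient clipping caps the global gradient norm so a single bad batch
# cannot blow the weights up to inf/NaN.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```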
@yhcao6 I ran into the same problem, but reducing the warmup_ratio did not solve it, and grad_clip did not help either.
The details of the error I encountered are:
······
2020-09-28 10:09:01,485 - mmdet - INFO - Epoch [1][60/12825] lr: 1.189e-07, eta: 4:29:06, time: 0.066, data_time: 0.008, memory: 4996, loss_cls: 17.0982, loss_bbox: 1.5800, loss: 18.6782
2020-09-28 10:09:01,561 - mmdet - INFO - Epoch [1][61/12825] lr: 1.209e-07, eta: 4:26:45, time: 0.074, data_time: 0.019, memory: 4996, loss_cls: 16.9742, loss_bbox: inf, loss: inf
······
File "~/mmdetection/mmdet/models/dense_heads/ssd_head.py", line 258, in loss
'classification scores become infinite or NaN!'
AssertionError: classification scores become infinite or NaN!
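For context, the assertion that raises this error sits near the top of SSDHead.loss and looks roughly like the following (paraphrased from mmdet/models/dense_heads/ssd_head.py; exact code may vary between versions):

```python
# Paraphrased from mmdet/models/dense_heads/ssd_head.py (version-dependent):
# the loss first checks that every classification score is finite, so any
# inf/NaN activation aborts training with this AssertionError rather than
# silently propagating through the loss computation.
assert torch.isfinite(all_cls_scores).all().item(), \
    'classification scores become infinite or NaN!'
```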
Here is my environment:
sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: :/usr/local/cuda
GPU 0: GeForce RTX 2080 Ti
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.4.0
MMCV: 1.1.2
MMDetection: 2.4.0+b4e2155
MMDetection Compiler: GCC 7.3
MMDetection CUDA Compiler: 10.1
This is the first time I have asked a question on GitHub. If I missed anything, please tell me. Thanks!