Mmdetection: Segmentation fault

Created on 13 Oct 2018  ·  18 Comments  ·  Source: open-mmlab/mmdetection

When I run the script "python tools/train.py configs/faster_rcnn_r50_fpn_1x.py --gpus 1 --work_dir logs --validate", I hit the following problem:
2018-10-13 23:54:59,086 - INFO - workflow: [('train', 1)], max: 12 epochs
Segmentation fault

All 18 comments

Such problems are hard to identify remotely. It may be related to the environment or hardware, and we cannot reproduce it. You may need to debug it on your own by printing more information.

(Just for reference, we encountered a "segmentation fault" problem in a special environment before, and solved it by compiling PyTorch from source.)

Update: installing PyTorch with conda and compiling the CUDA extensions with gcc 4.8 causes the segmentation fault. Either compiling PyTorch from source or using gcc 5 to build the ops solves the problem.

@yanxp You may check this and see if it works.
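
A quick way to see which side of that line a machine is on is to check the gcc version before building the ops. The sketch below is only an illustration: the helper names `gcc_major`, `gcc_looks_too_old` and `installed_gcc_version` are hypothetical, and the threshold of 5 is taken from the comment above, not from any official requirement.

```python
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Return the major version from text such as '5.4.0' or '4.8.5', or None."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_text)
    return int(match.group(1)) if match else None


def gcc_looks_too_old(version_text, minimum_major=5):
    """True when the parsed major version is below the threshold from the thread."""
    major = gcc_major(version_text)
    return major is not None and major < minimum_major


def installed_gcc_version():
    """Ask the gcc on PATH for its version; return None if gcc is not found."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "-dumpversion"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc not found on PATH")
    elif gcc_looks_too_old(version):
        print("gcc %s: older than gcc 5, the ops build may segfault" % version)
    else:
        print("gcc %s: gcc 5 or newer" % version)
```

Note that this only inspects the gcc found first on PATH, which may differ from the compiler the build actually picks up.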

@hellock Thanks. It works now.

Hi! @hellock
I installed gcc 5.5.0 from source in my own folder (Ubuntu 14.04) and reinstalled mmcv and mmdet. Running "python tools/test.py" still gives "[ ] 0/5000, elapsed: 0s, ETA:bash: line 1: 6150 Segmentation fault (core dumped)". How can I solve this problem? Must I reinstall PyTorch from source?

@hellock When I try

./tools/dist_train.sh configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py 8

there is no error and nothing runs; nothing happens at all. What is the problem?

@AresGao You may need to provide some environment info (system, gcc, cuda, cudnn, pytorch installation).

@hellock Thanks for your reply.
ubuntu 16.04
gcc 5.4
cuda 9.0
cudnn 7.0.3
pytorch 0.4.1 (installed with conda from the official channel)
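
The details requested above can be collected with a short script, so they can be pasted into a report in one go. This is a minimal sketch, not an official mmdetection tool: `first_line_of` and `collect_env` are hypothetical helper names, and the script cannot tell how PyTorch was installed (conda or source), so that still has to be added by hand.

```python
import platform
import shutil
import subprocess


def first_line_of(command):
    """Run a command and return the first line of its output, or 'not found'."""
    if shutil.which(command[0]) is None:
        return "not found"
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else "no output"


def collect_env():
    """Gather the details asked for in the thread into a dict."""
    info = {
        "system": platform.platform(),
        "python": platform.python_version(),
        "gcc": first_line_of(["gcc", "--version"]),
    }
    try:
        import torch
    except ImportError:
        info["pytorch"] = "not installed"
        return info
    info["pytorch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda (PyTorch build)"] = torch.version.cuda
    # cuDNN version as seen by PyTorch (None if cuDNN is unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print("%s: %s" % (key, value))
```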

Does non-distributed training or single-gpu training work well?

@hellock
If I try

python tools/train.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py

I get

Segmentation fault (core dumped)

You can check whether the segmentation fault is caused by the custom operators (RoIAlign, etc.). Such problems are usually related to the compiling environment, so you may try compiling PyTorch from source. Another option is switching to PyTorch 1.0.
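
One way to narrow this down is to import each suspect module in a separate Python process with the standard `faulthandler` module enabled, so a crash kills only the child and leaves a traceback behind. The sketch below assumes a POSIX system, where a process killed by a signal reports a negative return code. `probe_import` is a hypothetical helper, and the default module names (in particular `mmdet.ops`) are assumptions that may not match your installation. An import that succeeds does not prove the operator is safe, since the crash may only appear when the op is actually called.

```python
import signal
import subprocess
import sys


def probe_import(module_name):
    """Import a module in a child interpreter and report how the child exited."""
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", "import " + module_name],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
    )
    if result.returncode == -signal.SIGSEGV:
        status = "segfault"  # child was killed by SIGSEGV
    elif result.returncode == 0:
        status = "ok"
    else:
        status = "error"  # e.g. the module is missing or raised on import
    return status, result.stderr


if __name__ == "__main__":
    # Module names are assumptions; pass the ones that exist in your install.
    names = sys.argv[1:] or ["torch", "mmcv", "mmdet.ops"]
    for name in names:
        status, stderr = probe_import(name)
        print("%s: %s" % (name, status))
        if status != "ok":
            print(stderr.strip())
```

Running each import in its own process keeps a crash in one module from hiding the result for the others.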

@hellock Thanks, I will try it right away.

I removed all my conda environments and solved the problem by using the scripts provided by the author. Strange.

Thanks for this issue!

Replying to the gcc 5.5.0 question above: did you run "python setup.py install" again?

I have the same segfault issue

I installed PyTorch 1.0 and, with gcc 4.8.5, still get a segmentation fault. My environment is a conda env; using "conda install -c serge-sans-paille gcc_49" to change the gcc version did not work either, and running 'python setup.py develop' fails with "error: command 'gcc' failed with exit status 1". Please tell me what to do next.

Thanks for your advice. I solved the segmentation fault by updating gcc 4.8.5 to gcc 5.3.0.

Hello, I hit the same error. One question: after upgrading gcc to 5.3.0, do I need to rebuild mmdet? Thanks.
