It crashes with the message 'Segmentation fault (core dumped)'.
I believe I followed the installation guide correctly, but when I run maskrcnn-benchmark/demo/webcam.py, it crashes with 'Segmentation fault (core dumped)'.
The call chain leading to the crash (innermost call first) is
line 27 in the file 'boxlist_ops.py': keep = _box_nms(boxes, score, nms_thresh)
from
line 114 in the file 'inference.py': boxlist = boxlist_nms( .... )
from
line 138 in the file 'inference.py': sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
from
line 122 in the file 'rpn.py': boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
from
line 96 in the file 'rpn.py': return self._forward_test(anchors, objectness, rpn_box_regression)
from
line 50 in the file 'generalized_rcnn.py': proposals, proposal_losses = self.rpn(images, features, targets)
from
line 205 in the file 'predictor.py': predictions = self.model(image_list)
from
file 'webcam.py': composite = coco_demo.run_on_opencv_image(img)
At first I thought the cause was the limited GPU memory on my local PC. But when I ran the same code on a deep learning machine with enough GPU memory (about 11 GB), it still crashed with the same message.
What is the problem? I would be grateful if someone has an answer. Thanks in advance.
Best,
Young
Updating gcc to 4.9 and rebuilding can fix it:
https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
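Before rebuilding, it can help to confirm which compiler version is actually on the PATH. The following is a minimal sketch, assuming the compiler is invoked as g++ and accepts -dumpversion; the helper names are hypothetical and are not part of maskrcnn-benchmark or PyTorch.

```python
import re
import subprocess

# Minimum version taken from the advice above (gcc 4.9).
MIN_GCC = (4, 9)


def parse_gcc_version(text):
    """Turn a version string such as '4.8.5' or '9' into a tuple of ints."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version, minimum=MIN_GCC):
    """Return True if `version` is at least `minimum`."""
    # Pad short versions such as (9,) so they compare sensibly against (4, 9).
    padded = version + (0,) * (len(minimum) - len(version))
    return padded >= minimum


def installed_gcc_version(compiler="g++"):
    """Ask the compiler for its version (assumes it supports -dumpversion)."""
    result = subprocess.run(
        [compiler, "-dumpversion"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_gcc_version(result.stdout)
```

Calling meets_minimum(installed_gcc_version()) on the build machine then tells you whether an upgrade is needed before recompiling.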
Hi,
Could you run the python script via gdb and show the traceback?
gdb python
>> run demo.py
.....
>> bt
Also, does this segfault when running on the CPU?
Thanks!
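The same gdb session can also be run non-interactively, which makes the backtrace easier to copy into a report. This is a sketch only: it assumes gdb is installed and uses its -batch, -ex and --args options; the function name is hypothetical.

```python
import shlex
import sys


def gdb_backtrace_command(script, script_args=(), python=sys.executable):
    """Build a gdb command line that runs `script` and prints a backtrace when it stops."""
    return [
        "gdb", "-batch",   # exit gdb once the commands below have run
        "-ex", "run",      # start the program
        "-ex", "bt",       # print the backtrace after the crash
        "--args", python, script, *script_args,
    ]


cmd = gdb_backtrace_command("demo.py")
# Print a copy-pasteable shell command.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the resulting list to subprocess.run would execute it; if the script exits normally, the bt step simply reports that there is no stack.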
@youngkyoonjang Hello, I also encountered this problem. After the output "Start training", the process stopped without any further output, and I saw "[1] 8817 segmentation fault (core dumped) python tools/train_net.py --config-file" in the shell.
Hi @ll490187880 ,
Could you try running the aforementioned commands and give the output of the stack trace?
Something like
gdb python
>> run "tools/train_net.py --config-file ..."
and once it crashes, run
>> bt
and paste the result?
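As a complement to gdb, Python's standard faulthandler module can print the Python-level call stack when the interpreter receives SIGSEGV. It does not show the C++ frames, so it does not replace the gdb backtrace. A minimal sketch, with a hypothetical helper name:

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump the Python stack of all threads to `stream` if the process crashes."""
    if stream is None:
        stream = sys.stderr  # resolved at call time; must provide fileno()
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling enable_crash_traceback() at the top of the script, before the model is built, is enough; running the interpreter with the -X faulthandler option has the same effect without editing the script.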
@fmassa yeah, as follows:
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffac102424 in __gnu_cxx::new_allocator<_object*>::construct<_object*, _object*> (__p=0xb, this=0x5555565456e8) at /usr/include/c++/4.8.2/ext/new_allocator.h:120
120 { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
(gdb) bt
at /usr/include/c++/4.8.2/bits/stl_vector.h:920
@fmassa I did it before following @senlinuc 's comments (updating gcc to 4.9 / rebuilding it):
gdb python
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/yj18885/anaconda3/envs/MaskRCNN-PyTorch1.0/bin/python3.6...done.
(gdb) run webcam.py
Starting program: /home/yj18885/anaconda3/envs/MaskRCNN-PyTorch1.0/bin/python webcam.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /home/yj18885/.local/lib/python3.6/site-packages/cv2/.libs/libz-a147dcb0.so.1.2.3
Missing separate debuginfo for /home/yj18885/.local/lib/python3.6/site-packages/numpy/core/../.libs/libgfortran-ed201abd.so.3.0.0
[New Thread 0x7fffe580e700 (LWP 19715)]
[New Thread 0x7fffe500d700 (LWP 19716)]
[New Thread 0x7fffe280c700 (LWP 19717)]
[New Thread 0x7fffe000b700 (LWP 19718)]
[New Thread 0x7fffdd80a700 (LWP 19719)]
[New Thread 0x7fffd9009700 (LWP 19720)]
[New Thread 0x7fffd6808700 (LWP 19721)]
[Thread 0x7fffe280c700 (LWP 19717) exited]
[Thread 0x7fffd6808700 (LWP 19721) exited]
[Thread 0x7fffe500d700 (LWP 19716) exited]
[Thread 0x7fffdd80a700 (LWP 19719) exited]
[Thread 0x7fffe000b700 (LWP 19718) exited]
[Thread 0x7fffe580e700 (LWP 19715) exited]
[Thread 0x7fffd9009700 (LWP 19720) exited]
Detaching after fork from child process 19722.
Detaching after fork from child process 19724.
[New Thread 0x7fffd6808700 (LWP 19726)]
[New Thread 0x7fffd9009700 (LWP 19727)]
[New Thread 0x7fffdd80a700 (LWP 19728)]
[New Thread 0x7fffe000b700 (LWP 19729)]
[New Thread 0x7fff953c1700 (LWP 19730)]
[New Thread 0x7fff94bc0700 (LWP 19731)]
[New Thread 0x7fff8eefe700 (LWP 19732)]
[New Thread 0x7fff8e6fd700 (LWP 19733)]
[New Thread 0x7fff8defc700 (LWP 19734)]
[New Thread 0x7fff8d6fb700 (LWP 19735)]
[New Thread 0x7fff8cefa700 (LWP 19736)]
[New Thread 0x7fff8c6f9700 (LWP 19737)]
[New Thread 0x7fff8bef8700 (LWP 19738)]
Program received signal SIGSEGV, Segmentation fault.
0x00007fff95425424 in construct<_object*, _object*> (__p=0xb, this=0x13259e8) at /usr/include/c++/4.8.2/ext/new_allocator.h:120
120 { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64
(gdb) bt
at /usr/include/c++/4.8.2/bits/stl_vector.h:920
at /home/yj18885/anaconda3/envs/MaskRCNN-PyTorch1.0/lib/python3.6/site-packages/torch/lib/include/pybind11/pybind11.h:618
kwargs=0x21281a8, kwcount=2, kwstep=kwstep@entry=1, defs=0x7fff9547ab60, defcount=defcount@entry=2, kwdefs=kwdefs@entry=0x0, closure=0x0, name=name@entry=0x7fff9548f370, qualname=0x7fff9548f370)
at Python/ceval.c:4159
kwnames=kwnames@entry=0x7ffff7f93060, kwargs=kwargs@entry=0x7ffff7f93068, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=0x7fff9549fd48, defcount=1, kwdefs=0x0, closure=0x0, name=0x7fffd0687a78,
qualname=0x7fff955233a0) at Python/ceval.c:4159
kwnames=kwnames@entry=0x0, kwargs=kwargs@entry=0x0, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff7f96170, qualname=0x7fffe2180470)
at Python/ceval.c:4159
kwnames=kwnames@entry=0x7ffff7f93060, kwargs=kwargs@entry=0x7ffff7f93068, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=0x7fff95477808, defcount=1, kwdefs=0x0, closure=0x0, name=0x7fffd0687a78,
qualname=0x7fff9548ba98) at Python/ceval.c:4159
---Type
kwnames=kwnames@entry=0x0, kwargs=kwargs@entry=0x0, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff7f96170, qualname=0x7fffe2180470)
at Python/ceval.c:4159
kwnames=kwnames@entry=0x7ffff7f93060, kwargs=kwargs@entry=0x7ffff7f93068, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=0x7fff95cfcfb0, defcount=1, kwdefs=0x0, closure=0x0, name=0x7fffd0687a78,
qualname=0x7fff9546b780) at Python/ceval.c:4159
kwnames=kwnames@entry=0x0, kwargs=kwargs@entry=0x0, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff7f96170, qualname=0x7fffe2180470)
at Python/ceval.c:4159
kwcount=0, kwstep=kwstep@entry=1, defs=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=0x0, name=name@entry=0x7fffd05d86a8, qualname=0x7fffd0657da0) at Python/ceval.c:4159
kwnames=kwnames@entry=0x0, kwargs=kwargs@entry=0x0, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2, defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0,
name=name@entry=0x0, qualname=qualname@entry=0x0) at Python/ceval.c:4159
kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0) at Python/ceval.c:4180
closeit=closeit@entry=1, flags=flags@entry=0x7fffffffdaa0) at Python/pythonrun.c:978
---Type
(gdb)
And Yes! Segmentation fault (core dumped) happens even when I run on the CPU (python webcam.py --min-image-size 300 MODEL.DEVICE cpu).
Thanks for the stack trace!
This is a known issue with gcc < 4.9, see https://github.com/pytorch/pytorch/issues/6987.
While compiling the extensions you probably saw a warning like
Your compiler (g++ 4.8) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
The solution is to upgrade to gcc 4.9 or higher following the instructions in https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
I'm closing the issue, and I'll add a note to the TROUBLESHOOTING section mentioning it.
Let us know if you still have issues after updating gcc and recompiling the library (remove the build/ folder first with rm -rf build/).
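The clean-up step before recompiling can be scripted as well. The sketch below assumes the stale artifacts live in a build/ directory directly under the repository root, as in the rm -rf command above; the function name is hypothetical, and the rebuild itself should follow the installation guide.

```python
import shutil
from pathlib import Path


def remove_build_dir(repo_root):
    """Delete <repo_root>/build if it exists; return True if it was removed."""
    build_dir = Path(repo_root) / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        return True
    return False
```

Removing the directory forces the extensions to be recompiled with the new compiler instead of reusing objects built with gcc 4.8.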
I've just improved the README with instructions on how to address this issue in https://github.com/facebookresearch/maskrcnn-benchmark/pull/38
Please let me know if it doesn't help in your case.
Thanks, @fmassa. Now it is running perfectly!
at /tmp/build/80754af9/python_1546130271559/work/Python/thread_pthread.h:300
at /tmp/build/80754af9/python_1546130271559/work/Python/thread_pthread.h:205