Incubator-mxnet: C++ Convolution layer usage produces "Segmentation fault"

Created on 16 Feb 2018 · 14 Comments · Source: apache/incubator-mxnet

Description

Compiled the version 1.1.0 C++ package with make -j USE_OPENCV=0 USE_CPP_PACKAGE=1 on Mac and tried to run the LeNet example. It fails with a segmentation fault.

Package used (Python/R/Scala/Julia):

C++ package

Build info (Required if built from source)

Compiler

g++
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin16.7.0
Thread model: posix

MXNet commit hash:

Version 1.1.0 - 07a83a0325a3d782513a04f47d711710972cb144

Build config:

Taken by default from sources of that hash

Error Message:

No stack trace returned. Here is what I see on terminal:

[1]    77873 segmentation fault  ./main 

Minimum reproducible example

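#include "mxnet-cpp/MxNetCpp.h"
using namespace mxnet::cpp;

int main(int argc, char const *argv[]) {
  // Input placeholder plus the weight and bias symbols of the first layer.
  Symbol data = Symbol::Variable("data");
  Symbol conv1_w("conv1_w"), conv1_b("conv1_b");
  // 5x5 kernel, 20 filters; the segmentation fault is reported when running this program.
  Symbol conv1 = Convolution("conv1", data, conv1_w, conv1_b, Shape(5, 5), 20);
  return 0;
}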

Steps to reproduce


  1. Compile with all libraries
  2. Run sample code

What have you tried to solve it?

  1. I looked at dmlc, where a similar problem was being fixed, but adding MXNotifyShutdown(); didn't help.
Bug C++ Example Pending Requester Info

All 14 comments

Hi, could you tell me where to find the tutorials about building C++ package?

@Mabinogiysk https://mxnet.incubator.apache.org/install/build_from_source#build-the-c-package

I would like to add to this issue that, for me, none of the cpp examples ran without segfaulting or crashing on CPU. I got two of them working on GPU: mlp_gpu and lenet_with_mxdataiter.

Yes, the cpp package in general has some problems. We have an ongoing design proposal for C++ API improvements:

MXNet C++ package improvements
JIRA or GoogleDocs

Feel free to add your pain points or suggestions.

I wasn't able to replicate the issue with the minimum reproducible example stated above.

Running ./alexnet, I hit this error:

Process 92985 launched: './alexnet' (x86_64)
libc++abi.dylib: terminating with uncaught exception of type dmlc::Error: [13:28:39] ../include/mxnet-cpp/ndarray.hpp:54: Check failed: MXNDArrayCreate(shape.data(), shape.size(), context.GetDeviceType(), context.GetDeviceId(), delay_alloc, &handle) == 0 (-1 vs. 0) 

Running ./lenet, it fails at:

Process 93047 launched: './lenet' (x86_64)
[13:31:35] lenet.cpp:79: data
[13:31:35] lenet.cpp:79: conv1_w
[13:31:35] lenet.cpp:79: conv1_b
[13:31:35] lenet.cpp:79: conv2_w
[13:31:35] lenet.cpp:79: conv2_b
[13:31:35] lenet.cpp:79: conv3_w
[13:31:35] lenet.cpp:79: conv3_b
[13:31:35] lenet.cpp:79: fc1_w
[13:31:35] lenet.cpp:79: fc1_b
[13:31:35] lenet.cpp:79: fc2_w
[13:31:35] lenet.cpp:79: fc2_b
[13:31:35] lenet.cpp:79: data_label
[13:31:35] lenet.cpp:113: here read fin
libc++abi.dylib: terminating with uncaught exception of type dmlc::Error: [13:31:35] ../include/mxnet-cpp/ndarray.hpp:221: Check failed: MXNDArraySlice(GetHandle(), begin, end, &handle) == 0 (-1 vs. 0) 

Stack trace returned 8 entries:
[bt] (0) 0   lenet                               0x00000001000114c6 dmlc::StackTrace() + 1238
[bt] (1) 1   lenet                               0x0000000100010eb5 dmlc::LogMessageFatal::~LogMessageFatal() + 53
[bt] (2) 2   lenet                               0x0000000100001155 dmlc::LogMessageFatal::~LogMessageFatal() + 21
[bt] (3) 3   lenet                               0x000000010001a6d9 mxnet::cpp::NDArray::Slice(unsigned int, unsigned int) const + 329
[bt] (4) 4   lenet                               0x0000000100003b57 Lenet::Run() + 10519
[bt] (5) 5   lenet                               0x00000001000011c8 main + 56
[bt] (6) 6   libdyld.dylib                       0x00007fffa8a01235 start + 1
[bt] (7) 7   ???                                 0x0000000000000001 0x0 + 1


Process 93047 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fffa8b2fd42 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fffa8b2fd42 <+10>: jae    0x7fffa8b2fd4c            ; <+20>
    0x7fffa8b2fd44 <+12>: movq   %rax, %rdi
    0x7fffa8b2fd47 <+15>: jmp    0x7fffa8b28caf            ; cerror_nocancel
    0x7fffa8b2fd4c <+20>: retq   
Target 0: (lenet) stopped.
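
Both traces above end in "terminating with uncaught exception of type dmlc::Error", which means the examples abort rather than report the failed check and exit cleanly. As a minimal sketch (not part of the MXNet examples), the entry point could be wrapped so that such errors become a message and an exit code. The helper name RunGuarded is hypothetical; the sketch relies only on the fact that dmlc::Error derives from std::runtime_error, so it contains no MXNet calls and compiles on its own.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical helper (not an MXNet API): runs `body` and turns any escaping
// exception into a message on `err` plus a non-zero return code, instead of
// letting the process end in std::terminate / SIGABRT.
int RunGuarded(const std::function<void()>& body, std::ostream& err) {
  try {
    body();
    return 0;
  } catch (const std::exception& e) {
    // dmlc::Error derives from std::runtime_error, so it is caught here.
    err << "error: " << e.what() << '\n';
    return 1;
  } catch (...) {
    err << "error: unknown exception\n";
    return 2;
  }
}
```

In an example such as lenet.cpp, the body of main could then hand its work to RunGuarded with std::cerr as the stream and return the result. This does not fix the failing MXNDArraySlice or MXNDArrayCreate check; it only makes the failure easier to read. A genuine segmentation fault is a signal, not an exception, and would not be caught this way.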

@marcoabreu spidyDev's PR has been merged. If this PR addresses all segfaults, can we close this off? Thanks!

I'm having the same issues as above with version 1.3.

As far as we can see, it's still broken on CPU. We should add tests for this before we can close the issue.

Sorry, it's not well documented, but you have to make sure that the data is downloaded to the correct folder. Try changing to the example folder and running the get_data.sh script.
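
If missing data is the cause, a check before training would surface it directly. The following is a rough sketch under that assumption: the helper MissingFiles is hypothetical, and the file names an example actually expects must be taken from its source, since they are not listed in this thread.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the subset of `paths` that cannot be opened
// for reading, so a program can report missing data files up front.
std::vector<std::string> MissingFiles(const std::vector<std::string>& paths) {
  std::vector<std::string> missing;
  for (const auto& path : paths) {
    std::ifstream in(path);
    if (!in.is_open()) {
      missing.push_back(path);
    }
  }
  return missing;
}
```

An example could call MissingFiles with the paths it is about to read, print any that come back, and exit with a hint to run get_data.sh from the example folder.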

@Ishitori @KellenSunderland

I have recently submitted changes to the ReadMe files in cpp-package and cpp-package/example that explain how to build cpp-package and run the examples.

@Ishitori, since you were trying to run one of the training examples, will you be able to share the use case that you were planning to address using C++ APIs?

@mxnet-label-bot add [Pending Requester Info]

@leleamol, I was just reproducing an error from the forum. I am not sure I can find the original message anymore, though...

@Ishitori Are you still facing this issue? Were you able to follow the documentation above to get it working consistently? Requesting an update.

@lanking520 requesting to close due to lack of activity

@Ishitori Please feel free to reopen it if you are still facing the problem. Closing it for now.
