The reproducible repository is linked here.
Currently, airbnb is using the quantization extensions of mxnet to boost inference time on several convolutional neural network models. However, it has been difficult to achieve. The most complicated bugs lie in the intersection between the python and C++ interface, like the ones that crash jupyter kernels and are hard to run pdb on.
Airbnb currently extensively uses gluon models and are not planning to move to Module models any time soon for training, but it seems that creating a quantized Module model solely for inference is useful. Please refer to the repository for a minimum reproducible example.
qsym, qarg_params, qaux_params = mx.contrib.quantization.quantize_model(
sym=model._symbol,
arg_params=model._arg_params,
aux_params=model._aux_params,
excluded_sym_names=['hybridsequential0_conv0_fwd',
'hybridsequential0_dense0_fwd',
'hybridsequential0_dense1_fwd',
],
ctx=ctx,
calib_data=data_iter,
)
qmodel = mx.mod.Module(symbol = qsym, context = mx.cpu())
qmodel.bind(for_training=False,
data_shapes = data_iter.provide_data,
label_shapes = data_iter.provide_label)
qmodel.set_params(qarg_params, qaux_params)
qmodel.predict(data_iter)
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
~/anaconda3/envs/idp3/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
~/anaconda3/envs/idp3/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
398 if cls is not object \
399 and callable(cls.__dict__.get('__repr__')):
--> 400 return _repr_pprint(obj, self, cycle)
401
402 return _default_pprint(obj, self, cycle)
~/anaconda3/envs/idp3/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
693 """A pprint that just redirects to the normal repr function."""
694 # Find newlines and replace them with p.break_()
--> 695 output = repr(obj)
696 for idx,output_line in enumerate(output.splitlines()):
697 if idx:
~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in __repr__(self)
187 """Returns a string representation of the array."""
188 shape_info = 'x'.join(['%d' % x for x in self.shape])
--> 189 return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
190 self.__class__.__name__,
191 shape_info, self.context)
~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asnumpy(self)
1908 self.handle,
1909 data.ctypes.data_as(ctypes.c_void_p),
-> 1910 ctypes.c_size_t(data.size)))
1911 return data
1912
~/anaconda3/envs/idp3/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
208 """
209 if ret != 0:
--> 210 raise MXNetError(py_str(_LIB.MXGetLastError()))
211
212
MXNetError: [19:42:27] src/operator/quantization/mkldnn/mkldnn_quantized_conv.cc:41: Check failed: in_data[0].dtype() == mshadow::kUint8 (5 vs. 3) mkldnn_quantized_conv op only supports uint8 as input type
Stack trace returned 10 entries:
[bt] (0) 0 libmxnet.so 0x0000000109ffd6d4 libmxnet.so + 26324
[bt] (1) 1 libmxnet.so 0x0000000109ffd48f libmxnet.so + 25743
[bt] (2) 2 libmxnet.so 0x000000010a032093 libmxnet.so + 241811
[bt] (3) 3 libmxnet.so 0x000000010b14f745 MXNDListFree + 289605
[bt] (4) 4 libmxnet.so 0x000000010b11b3fc MXNDListFree + 75772
[bt] (5) 5 libmxnet.so 0x000000010b11e931 MXNDListFree + 89393
[bt] (6) 6 libmxnet.so 0x000000010b11e847 MXNDListFree + 89159
[bt] (7) 7 libmxnet.so 0x000000010b11c175 MXNDListFree + 79221
[bt] (8) 8 libsystem_pthread.dylib 0x00000001018eb661 _pthread_body + 340
[bt] (9) 9 libsystem_pthread.dylib 0x00000001018eb50d _pthread_body + 0
quantization-bug git/master
(idp3) โฏ pip freeze | grep mxnet
keras-mxnet==2.2.0
mxnet-mkl==1.3.0b20180711
quantization-bug git/master
(idp3) โฏ uname -a
Darwin ray-zhangs-mbp.airgc.net 17.6.0 Darwin Kernel Version 17.6.0: Tue May 8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64 x86_64
@pengzhao-intel
Just an update: When I install mxnet==1.3.0b20180715 OR mxnet-cu90==1.3.0b20180715, it throws the following error upon qmodel.bind(...) in the notebook, as in it segfaults before even the predict().
[09:39:42] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_pooling
[09:39:42] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[09:39:42] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_pooling
Segmentation fault: 11
Stack trace returned 10 entries:
[bt] (0) 0 libmxnet.so 0x000000010a0fbeb4 libmxnet.so + 20148
[bt] (1) 1 libmxnet.so 0x000000010b751b56 MXTVMBridge + 3956742
[bt] (2) 2 libsystem_platform.dylib 0x00000001018d9f5a _sigtramp + 26
[bt] (3) 3 ??? 0x0000000000000006 0x0 + 6
[bt] (4) 4 libmxnet.so 0x000000010b2055dc MXNDListFree + 147580
[bt] (5) 5 libmxnet.so 0x000000010b2114b3 MXNDListFree + 196435
[bt] (6) 6 libmxnet.so 0x000000010b21cc04 MXNDListFree + 243364
[bt] (7) 7 libmxnet.so 0x000000010b221dba MXNDListFree + 264282
[bt] (8) 8 libmxnet.so 0x000000010b1ac0b0 MXExecutorSimpleBind + 8656
[bt] (9) 9 _ctypes.cpython-36m-darwin.so 0x0000000107e76d17 ffi_call_unix64 + 79
I compiled mxnet with debug symbols and built from source with DEBUG=1 on linux for CPU, and ran the same thing that caused the segmentation fault above with gdb and did a backtrace. This is the longer bt of the segfault:
[19:15:44] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_pooling
[19:15:44] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[19:15:44] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_pooling
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffa0f6aabf in mxnet::exec::AttachOpResources(nnvm::Graph const&, std::vector<std::shared_ptr<mxnet::exec::OpExecutor>, std::allocator<std::shared_ptr<mxnet::exec::OpExecutor> > > const&, unsigned long, unsigned long) ()
from /efs/home/ray_zhang/temp/incubator-mxnet/python/mxnet/../../lib/libmxnet.so
(gdb) bt
#0 0x00007fffa0f6aabf in mxnet::exec::AttachOpResources(nnvm::Graph const&, std::vector<std::shared_ptr<mxnet::exec::OpExecutor>, std::allocator<std::shared_ptr<mxnet::exec::OpExecutor> > > const&, unsigned long, unsigned long) ()
from /efs/home/ray_zhang/temp/incubator-mxnet/python/mxnet/../../lib/libmxnet.so
#1 0x00007fffa0f6cacf in mxnet::exec::AttachOpResources(nnvm::Graph const&) ()
from /efs/home/ray_zhang/temp/incubator-mxnet/python/mxnet/../../lib/libmxnet.so
#2 0x00007fffa0f33d4a in mxnet::exec::GraphExecutor::FinishInitGraph(nnvm::Symbol, nnvm::Graph, mxnet::Executor*, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&) () from /efs/home/ray_zhang/temp/incubator-mxnet/python/mxnet/../../lib/libmxnet.so
#3 0x00007fffa0f38c9b in mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::Context, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::Context> > > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nnvm::TShape, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nnvm::TShape> > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::NDArray> > >*, mxnet::Executor*, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&) () from /efs/home/ray_zhang/temp/incubator-mxnet/python/mxnet/../../lib/libmxnet.so
#4 0x00007fffa0f391fc in mxnet::Executor::SimpleBind(nnvm::Symbol, mxnet::Context const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::Context, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::Context> > > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nnvm::TShape, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nnvm::TShape> > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::NDArray> > >*, mxnet::Executor*) () from /efs/home/ray_zhang/temp/incubator-mxnet/python/mxnet/../../lib/libmxnet.so
#5 0x00007fffa15a4b85 in MXExecutorSimpleBind () from /efs/home/ray_zhang/temp/incubator-mxnet/python/mxnet/../../lib/libmxnet.so
#6 0x00007ffff648bdf8 in ffi_call_unix64 ()
at /workdir/conda-build/python_1525407712731/work/Python-3.6.3/Modules/_ctypes/libffi/src/x86/unix64.S:76
#7 0x00007ffff648ae98 in ffi_call (cif=cif@entry=0x7fffffffca10, fn=fn@entry=0x7fffa15a1f70 <MXExecutorSimpleBind>,
rvalue=rvalue@entry=0x7fffffffc530, avalue=avalue@entry=0x7fffffffc410)
at /workdir/conda-build/python_1525407712731/work/Python-3.6.3/Modules/_ctypes/libffi/src/x86/ffi64.c:525
#8 0x00007ffff64832b3 in _call_function_pointer (argcount=34, resmem=0x7fffffffc530, restype=<optimized out>, atypes=<optimized out>,
avalues=0x7fffffffc410, pProc=0x7fffa15a1f70 <MXExecutorSimpleBind>, flags=4353)
at /workdir/conda-build/python_1525407712731/work/Python-3.6.3/Modules/_ctypes/callproc.c:809
#9 _ctypes_callproc (pProc=pProc@entry=0x7fffa15a1f70 <MXExecutorSimpleBind>, argtuple=argtuple@entry=0x7fffc29a12c8, flags=4353,
argtypes=argtypes@entry=0x0, restype=0x55555583fad8, checker=0x0)
at /workdir/conda-build/python_1525407712731/work/Python-3.6.3/Modules/_ctypes/callproc.c:1147
#10 0x00007ffff647a8af in PyCFuncPtr_call (self=self@entry=0x7fffc2919c00, inargs=inargs@entry=0x7fffc29a12c8, kwds=kwds@entry=0x0)
at /workdir/conda-build/python_1525407712731/work/Python-3.6.3/Modules/_ctypes/_ctypes.c:3935
#11 0x00007ffff790de8b in _PyObject_FastCallDict (func=0x7fffc2919c00, args=<optimized out>, nargs=<optimized out>, kwargs=0x0)
at Objects/abstract.c:2331
#12 0x00007ffff7a01408 in call_function (pp_stack=pp_stack@entry=0x7fffffffccf0, oparg=oparg@entry=34, kwnames=kwnames@entry=0x0)
at Python/ceval.c:4849
#13 0x00007ffff7a06046 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3323
#14 0x00007ffff7a0129e in _PyEval_EvalCodeWithName (_co=_co@entry=0x7fffe88885d0, globals=globals@entry=0x7fffe88e8c18,
locals=locals@entry=0x0, args=args@entry=0x7fffffffcf80, argcount=argcount@entry=1, kwnames=kwnames@entry=0x7fffc70c5de0,
kwargs=kwargs@entry=0x7fffc70c5de8, kwcount=18, kwcount@entry=9, kwstep=kwstep@entry=2, defs=0x7fffe8881540, defcount=7, kwdefs=0x0,
closure=0x0, name=0x7fffe8887cb0, qualname=0x7fffe8886930) at Python/ceval.c:4154
#15 0x00007ffff7a09ec5 in _PyFunction_FastCallDict (func=func@entry=0x7fffe39c8950, args=args@entry=0x7fff---Type <return> to continue, or q <---Type <return> t---Type <return> to continue, or q <return> to quit---
@entry=0x7fffc2944c18) at Python/ceval.c:5058ntinue
#16 0x00007ffff790df7e in _PyObject_FastCallDict (func=func@entry=0x7fffe39c8950, args=args@entry=0x7fffffffcf80, nargs=nargs@entry=1,
kwargs=kwargs@entry=0x7fffc2944c18) at Objects/abstract.c:2310
#17 0x00007ffff790e06e in _PyObject_Call_Prepend (func=0x7fffe39c8950, obj=0x7fffc2967d08, args=0x7ffff7fa2048, kwargs=0x7fffc2944c18)
at Objects/abstract.c:2373
#18 0x00007ffff790dd2a in PyObject_Call (func=0x7ffff7f98888, args=<optimized out>, kwargs=<optimized out>) at Objects/abstract.c:2261
#19 0x00007ffff7a0599f in do_call_core (kwdict=0x7fffc2944c18, callargs=0x7ffff7fa2048, func=0x7ffff7f98888) at Python/ceval.c:5094
#20 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3392
#21 0x00007ffff7a00930 in _PyFunction_FastCall (co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at Python/ceval.c:4907
#22 0x00007ffff7a01854 in fast_function (kwnames=0x0, nargs=<optimized out>, stack=<optimized out>, func=0x7fffd70c38c8) at Python/ceval.c:4942
#23 call_function (pp_stack=pp_stack@entry=0x7fffffffd330, oparg=oparg@entry=4, kwnames=kwnames@entry=0x0) at Python/ceval.c:4846
#24 0x00007ffff7a06046 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3323
#25 0x00007ffff7a0129e in _PyEval_EvalCodeWithName (_co=0x7fffd70bb0c0, globals=globals@entry=0x7fffd70a7510, locals=locals@entry=0x0, args=<optimized out>,
argcount=4, kwnames=0x0, kwargs=0x55555ae6b5b0, kwcount=0, kwstep=kwstep@entry=1, defs=0x7fffd70be520, defcount=defcount@entry=2, kwdefs=kwdefs@entry=0x0,
closure=0x0, name=name@entry=0x7fffd70b54f0, qualname=0x7fffd9884d40) at Python/ceval.c:4154
#26 0x00007ffff7a015b2 in fast_function (kwnames=0x0, nargs=<optimized out>, stack=<optimized out>, func=0x7fffd70c3268) at Python/ceval.c:4966
#27 call_function (pp_stack=pp_stack@entry=0x7fffffffd5e0, oparg=oparg@entry=3, kwnames=kwnames@entry=0x0) at Python/ceval.c:4846
#28 0x00007ffff7a06046 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3323
#29 0x00007ffff7a0129e in _PyEval_EvalCodeWithName (_co=_co@entry=0x7fffd70b84b0, globals=globals@entry=0x7fffd70a7510, locals=locals@entry=0x0,
args=args@entry=0x7fffc2943760, argcount=argcount@entry=10, kwnames=kwnames@entry=0x7fffc29bca60, kwargs=kwargs@entry=0x7fffc29bca68, kwcount=10,
kwcount@entry=5, kwstep=kwstep@entry=2, defs=0x7fffd70c1120, defcount=6, kwdefs=0x0, closure=0x0, name=0x7ffff7fa5270, qualname=0x7fffd9884c38)
at Python/ceval.c:4154
#30 0x00007ffff7a09ec5 in _PyFunction_FastCallDict (func=func@entry=0x7fffd70c30d0, args=args@entry=0x7fffc2943760, nargs=10,
kwargs=kwargs@entry=0x7fffc28b6f78) at Python/ceval.c:5058
#31 0x00007ffff790df7e in _PyObject_FastCallDict (func=func@entry=0x7fffd70c30d0, args=args@entry=0x7fffc2943760, nargs=nargs@entry=10,
kwargs=kwargs@entry=0x7fffc28b6f78) at Objects/abstract.c:2310
#32 0x00007ffff790e014 in _PyObject_Call_Prepend (func=0x7fffd70c30d0, obj=0x7fffc292ccc0, args=0x7fffc29f94f8, kwargs=0x7fffc28b6f78)
at Objects/abstract.c:2373
#33 0x00007ffff790dd2a in PyObject_Call (func=0x7ffff7f988c8, args=<optimized out>, kwargs=<optimized out>) at Objects/abstract.c:2261
#34 0x00007ffff7986e39 in slot_tp_init (self=0x7fffc292ccc0, args=0x7fffc29f94f8, kwds=0x7fffc28b6f78) at Objects/typeobject.c:6407
#35 0x00007ffff797f6b3 in type_call (type=<optimized out>, type@entry=0x5555564ae018, args=args@entry=0x7fffc29f94f8, kwds=kwds@entry=0x7fffc28b6f78)
at Objects/typeobject.c:915
#36 0x00007ffff790de8b in _PyObject_FastCallDict (func=func@entry=0x5555564ae018, args=args@entry=0x55555ae79f70, nargs=nargs@entry=9,
kwargs=kwargs@entry=0x7fffc28b6f78) at Objects/abstract.c:2331
#37 0x00007ffff790e2ee in _PyObject_FastCallKeywords (func=0x5555564ae018, stack=0x55555ae79f70, nargs=9, kwnames=<optimized out>) at Objects/abstract.c:2496
#38 0x00007ffff7a01408 in call_function (pp_stack=pp_stack@entry=0x7fffffffdb80, oparg=oparg@entry=14, kwnames=kwnames@entry=0x7fffd9884518)
at Python/ceval.c:4849
#39 0x00007ffff7a05a98 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3339
#40 0x00007ffff7a0129e in _PyEval_EvalCodeWithName (_co=0x7fffd9886b70, globals=globals@entry=0x7fffe10d37e0, locals=locals@entry=0x0, args=<optimized out>,
argcount=1, kwnames=0x7ffff672e9f0, kwargs=0x5555557e8e08, kwcount=3, kwstep=kwstep@entry=1, defs=0x7fffd98819c0, defcount=defcount@entry=6,
kwdefs=kwdefs@entry=0x0, closure=0x0, name=name@entry=0x7ffff66e9228, qualname=0x7fffd70bcbb0) at Python/ceval.c:4154
#41 0x00007ffff7a015b2 in fast_function (kwnames=0x6, nargs=<optimized out>, stack=<optimized out>, func=0x7fffd70c2158) at Python/ceval.c:4966
#42 call_function (pp_stack=pp_stack@entry=0x7fffffffde30, oparg=oparg@entry=3, kwnames=kwnames@entry=0x7ffff672e9d8) at Python/ceval.c:4846
#43 0x00007ffff7a05a98 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3339
#44 0x00007ffff7a0129e in _PyEval_EvalCodeWithName (_co=_co@entry=0x7ffff69884b0, globals=globals@entry=0x7ffff7f583f0, locals=locals@entry=0x7ffff7f583f0,
args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=kwargs@entry=0x0, kwcount=kwcount@entry=0, kwstep=kwstep@entry=2,
defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0, name=name@entry=0x0, qualname=qualname@entry=0x0)
at Python/ceval.c:4154
#45 0x00007ffff7a018cd in PyEval_EvalCodeEx (_co=_co@entry=0x7ffff69884b0, globals=globals@entry=0x7ffff7f583f0, locals=locals@entry=0x7ffff7f583f0,
args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0,
kwdefs=kwdefs@entry=0x0, closure=closure@entry=0x0) at Python/ceval.c:4175
#46 0x00007ffff7a0191b in PyEval_EvalCode (co=co@entry=0x7ffff69884b0, globals=globals@entry=0x7ffff7f583f0, locals=locals@entry=0x7ffff7f583f0)
at Python/ceval.c:730
#47 0x00007ffff7a3c472 in run_mod (arena=0x7ffff7f73078, flags=0x7fffffffe130, locals=0x7ffff7f583f0, globals=0x7ffff7f583f0, filename=0x7ffff7e97650,
mod=0x55555590de50) at Python/pythonrun.c:980
#48 PyRun_FileExFlags (fp=fp@entry=0x5555557e8c70, filename_str=filename_str@entry=0x7ffff7e62960 "qbug.py", start=start@entry=257,
globals=globals@entry=0x7ffff7f583f0, locals=locals@entry=0x7ffff7f583f0, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffe130)
at Python/pythonrun.c:933
#49 0x00007ffff7a3c5d7 in PyRun_SimpleFileExFlags (fp=fp@entry=0x5555557e8c70, filename=<optimized out>, closeit=closeit@entry=1,
flags=flags@entry=0x7fffffffe130) at Python/pythonrun.c:396
#50 0x00007ffff7a3ca33 in PyRun_AnyFileExFlags (fp=fp@entry=0x5555557e8c70, filename=<optimized out>, closeit=closeit@entry=1,
flags=flags@entry=0x7fffffffe130) at Python/pythonrun.c:80
#51 0x00007ffff7a576dc in run_file (p_cf=0x7fffffffe130, filename=0x555555757100 L"qbug.py", fp=0x5555557e8c70) at Modules/main.c:338
#52 Py_Main (argc=<optimized out>, argv=0x555555757010) at Modules/main.c:814
#53 0x0000555555554cde in main (argc=2, argv=<optimized out>) at ./Programs/python.c:69
@reminisce Hi, is there a slack channel I can join for discussion about this bug? I can't seem to get quantization working at all. (Even the examples segfault for me)
@OneRaynyDay From the error message, it seems that you quantized the model using a version of MXNet, but ran inference with the quantized model using an older version than that. Were you able to run the example here:
https://github.com/apache/incubator-mxnet/tree/master/example/quantization
It seems that an invitation for you to join the slack channel has been sent out. Let us know if you don't have it. Thanks.
Hi @reminisce , thanks for the response. I have not received a slack invitation to [email protected]. Was it to another e-mail address?
With regards to the example, I could not run it. It segfaults in the same way. I also suspect that it may be a versioning issue, but I don't believe it to be the reason because in jupyter, I quantized the model and ran inference in the same notebook(so therefore in the same mxnet version) and it segfaults.
Update: I was able to run the examples correctly on a specific configuration:
mxnet-cu90==1.2.0
ubuntu 16.04
single gpu(0) as ctx
I ran single cpu(0) as ctx in OS X and the segfault still occurs.
CPU quantization was implemented by Intel engineers. I've already informed them offline and they will take this forward.
You are right, the invitation to the slack channel has not been sent out yet. You should be able to receive that soon.
Thank you. I have also found out the reason for the failure, and it is not related to CPU quantization. We should discuss more about this in the slack channel. It is a very strange issue.
@OneRaynyDay we have successfully run MNIST in our local with the quantization flow.
@xinyu-intel will help to take a look for your scripts.
@xinyu-intel: do you need help addressing this issue ?
@access2rohit hi, i'm working in progress on this.
hi @OneRaynyDay have you resolved your problem? I have the same error:
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_conv
[17:16:31] src/executor/attach_op_execs_pass.cc:336: Neither FCompute nor FComputeEx registered _contrib_quantized_pooling
Segmentation fault: 11
Stack trace returned 10 entries:
[bt] (0) 0 libmxnet.so 0x00000001013236b4 libmxnet.so + 18100
[bt] (1) 1 libmxnet.so 0x00000001029b4136 MXTVMBridge + 4014070
[bt] (2) 2 libsystem_platform.dylib 0x00007fff5822ff5a _sigtramp + 26
[bt] (3) 3 ??? 0x00000000a242980c 0x0 + 2722273292
[bt] (4) 4 libmxnet.so 0x0000000102459bcc MXNDListFree + 147580
[bt] (5) 5 libmxnet.so 0x0000000102465aa3 MXNDListFree + 196435
[bt] (6) 6 libmxnet.so 0x00000001024711f4 MXNDListFree + 243364
[bt] (7) 7 libmxnet.so 0x00000001024763aa MXNDListFree + 264282
[bt] (8) 8 libmxnet.so 0x00000001024006a0 MXExecutorSimpleBind + 8656
[bt] (9) 9 libffi.6.dylib 0x00000001007bc884 ffi_call_unix64 + 76
I ran in OSX, and ctx=cpu.
@bettyeats Hi, it is because you're using ctx=cpu(). Please refer to the feature request link for more information. Currently there is no workaround. You need ctx=gpu()
For CPU, you can build with MKLDNN. Native CPU quantization is not implemented.
@xinyu-intel @PatricZhao is there work to support int8 type in MKLDNN ? Currently it only supports uint8
MXNetError: [21:24:54] src/operator/quantization/mkldnn/mkldnn_quantized_conv.cc:41: Check failed: in_data[0].dtype() == mshadow::kUint8 (5 vs. 3) : mkldnn_quantized_conv op only supports uint8 as input type
@ThomasDelteil the fused version of convolution can work with INT8 in here
Please try to run models with https://github.com/apache/incubator-mxnet/blob/master/example/quantization/imagenet_gen_qsym_mkldnn.py
@ThomasDelteil add this line sym = sym.get_backend_symbol('MKLDNN_QUANTIZE') to enable subgraph fusion before using quantize_net api.
Closing this issue for now. Please feel free to reopen if you come across this again
Most helpful comment
For CPU, you can build with MKLDNN. Native CPU quantization is not implemented.