Incubator-mxnet: Generate the amalgamation for Android with NNPACK

Created on 29 Dec 2016 · 4 Comments · Source: apache/incubator-mxnet

If I generate the amalgamation for Android without any other libraries, the running time of the WhatsThis app is about 700 ms.

However, when I generate the amalgamation with libnnpack.a or other libraries generated by ndk-build, the running time of WhatsThis becomes more than 6000 ms, sometimes almost 7 s.

Here is what I have done:

Add these to the Makefile in amalgamation:
export NNPACK_ROOT=${MXNET_ROOT}/../nnpack/NNPACK

# mxnet itself

CFLAGS += -I${MXNET_ROOT} #mxnetroot
CFLAGS += -I${MXNET_ROOT}/dmlc-core/include #mxnetroot/dmlc-core/include
CFLAGS += -I${MXNET_ROOT}/include #mxnetroot/include
CFLAGS += -I${MXNET_ROOT}/mshadow #mxnetroot/mshadow

# nnpack:

CFLAGS += -DMXNET_USE_NNPACK=1
CFLAGS += -DMXNET_USE_NNPACK_NUM_THREADS=8
CFLAGS += -I${NNPACK_ROOT}/include
LDFLAGS += -L${NNPACK_ROOT}/obj/local/armeabi-v7a

LDFLAGS += -lnnpack -lpthreadpool -lnnpack_ukernels -lnnpack_reference -lgtest -lfp16_utils -lbench_utils -lcpufeatures

LDFLAGS += -lnnpack -lpthreadpool -lnnpack_ukernels -lcpufeatures

# nnpack dependency googletest:

CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0/include
CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0

# nnpack dependency pthreadpool:

CFLAGS += -I${NNPACK_ROOT}/third-party/pthreadpool/include
CFLAGS += -I${NNPACK_ROOT}/third-party/FXdiv/include

# other defines used

CFLAGS += -DMSHADOW_STAND_ALONE=1
CFLAGS += -DMSHADOW_USE_CUDA=0
CFLAGS += -DMSHADOW_USE_MKL=0
CFLAGS += -DSHADOW_RABIT_PS=0
CFLAGS += -DMSHADOW_DIST_PS=0
CFLAGS += -DMSHADOW_USE_SSE=0
CFLAGS += -DMXNET_USE_OPENCV=0
CFLAGS += -DMXNET_PREDICT_ONLY=0
CFLAGS += -DDISABLE_OPENMP=1
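
The `-DMXNET_USE_NNPACK_NUM_THREADS=8` define above hard-codes the thread-pool size. The report does not say how many cores the test device has, and nothing here establishes that the thread count causes the slowdown. As one thing to rule out, here is a minimal C++ sketch that compares a requested thread count with what the standard library reports. The helper name `clamp_thread_count` is hypothetical and is not part of MXNet or NNPACK.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not an MXNet or NNPACK API): limit a requested
// thread-pool size to the number of hardware threads that are available.
// `available` is expected to come from std::thread::hardware_concurrency(),
// which may return 0 when the value cannot be determined; in that case
// fall back to a single thread.
unsigned clamp_thread_count(unsigned requested, unsigned available) {
  if (available == 0) return 1;
  return std::max(1u, std::min(requested, available));
}

// Convenience wrapper that queries the current device.
unsigned suggested_thread_count(unsigned requested) {
  return clamp_thread_count(requested, std::thread::hardware_concurrency());
}
```

Calling `suggested_thread_count(8)` on the device and comparing the result with 8 would show whether the configured pool is larger than the hardware supports.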

Add these at line 123 of Amalgamation.py:

#if MXNET_USE_NNPACK == 1
#include "src/operator/nnpack/nnpack_convolution-inl.h"
#endif  // MXNET_USE_NNPACK

Run
make ANDROID=1
and then take the resulting jni_mxnet_predict.so, rename it, and replace the one in WhatsThis. The running time becomes almost 7 s.
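
To compare the two builds on equal terms, it can help to time only the forward pass, averaged over several runs, rather than the whole app round trip. The sketch below is one way to do that in C++. The helper `mean_runtime_ms` is a hypothetical name, and the callable passed to it stands in for whatever invokes the predictor in the app.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: call `fn` `runs` times and return the mean
// wall-clock time per call in milliseconds. Returns 0.0 if `runs` is
// not positive. steady_clock is used so the measurement is unaffected
// by system clock adjustments.
double mean_runtime_ms(const std::function<void()>& fn, int runs) {
  if (runs <= 0) return 0.0;
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (int i = 0; i < runs; ++i) {
    fn();
  }
  const std::chrono::duration<double, std::milli> elapsed =
      clock::now() - start;
  return elapsed.count() / runs;
}
```

Running it once with the plain amalgamation and once with the NNPACK build, using the same input image and the same number of runs, would show whether the gap between 700 ms and 7 s comes from the forward pass itself.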

Who can help me to solve this?

ydmo

Most helpful comment

WRT NNPACK, there are a few useful patches for NNPACK (non-MXNet specific) we'll PR shortly:

1) Improve multithreaded performance via a pthreadpool rewrite.
2) Improve 3x3 convolutions via dedicated NEON impl.

Those improve performance vs im2col/sgemm performance (even at batch-size=1 and small channel sizes) pretty significantly.

ajtulloch on 29 Dec 2016

👍3
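
For readers unfamiliar with the im2col/sgemm baseline mentioned in the comment above, the following C++ sketch shows the general idea: unroll the input patches into a matrix, then compute the convolution as one matrix multiply. It is an illustration only (stride 1, no padding, a naive loop in place of a tuned sgemm), not the code used by MXNet or NNPACK, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Unroll a C x H x W input (row-major) into a (C*K*K) x (OH*OW) matrix
// for a K x K kernel with stride 1 and no padding, where OH = H-K+1 and
// OW = W-K+1. Each column holds the input values under one kernel window.
std::vector<float> im2col(const std::vector<float>& in,
                          int C, int H, int W, int K) {
  const int OH = H - K + 1;
  const int OW = W - K + 1;
  std::vector<float> col(static_cast<std::size_t>(C) * K * K * OH * OW);
  std::size_t idx = 0;
  for (int c = 0; c < C; ++c)
    for (int ky = 0; ky < K; ++ky)
      for (int kx = 0; kx < K; ++kx)
        for (int oy = 0; oy < OH; ++oy)
          for (int ox = 0; ox < OW; ++ox)
            col[idx++] =
                in[(static_cast<std::size_t>(c) * H + (oy + ky)) * W + (ox + kx)];
  return col;
}

// Naive stand-in for sgemm: out (M x N) = w (M x P) * col (P x N).
// With M output channels and P = C*K*K, each row of `out` is one
// output feature map of size OH*OW.
std::vector<float> matmul(const std::vector<float>& w,
                          const std::vector<float>& col,
                          int M, int P, int N) {
  std::vector<float> out(static_cast<std::size_t>(M) * N, 0.0f);
  for (int m = 0; m < M; ++m)
    for (int p = 0; p < P; ++p)
      for (int n = 0; n < N; ++n)
        out[static_cast<std::size_t>(m) * N + n] +=
            w[static_cast<std::size_t>(m) * P + p] *
            col[static_cast<std::size_t>(p) * N + n];
  return out;
}
```

The unrolled matrix is roughly K*K times the size of the input, which is part of why dedicated kernels, such as the 3x3 NEON implementation the comment refers to, can be faster.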

All 4 comments

NNPACK support in MXNet is currently very limited. Please try this PR: https://github.com/dmlc/mxnet/pull/4373. I have tested it on a PC, where it gives a 2x~7x speedup. Thanks. @ydmo