Incubator-mxnet: Generate the Amalgamation for android with NNPACK

Created on 29 Dec 2016 · 4 Comments · Source: apache/incubator-mxnet

If I generate the amalgamation for Android without any other libraries, the running time of the WhatsThis app is about 700 ms.

However, when I generate the amalgamation with libnnpack.a or other libraries built by ndk-build, the running time of WhatsThis rises to more than 6000 ms, sometimes almost 7 s.

Here is what I have done:

  1. Add these to the Makefile in amalgamation:

    export NNPACK_ROOT=${MXNET_ROOT}/../nnpack/NNPACK

    # mxnet itself
    CFLAGS += -I${MXNET_ROOT} #mxnetroot
    CFLAGS += -I${MXNET_ROOT}/dmlc-core/include #mxnetroot/dmlc-core/include
    CFLAGS += -I${MXNET_ROOT}/include #mxnetroot/include
    CFLAGS += -I${MXNET_ROOT}/mshadow #mxnetroot/mshadow

    # nnpack:
    CFLAGS += -DMXNET_USE_NNPACK=1
    CFLAGS += -DMXNET_USE_NNPACK_NUM_THREADS=8
    CFLAGS += -I${NNPACK_ROOT}/include
    LDFLAGS += -L${NNPACK_ROOT}/obj/local/armeabi-v7a

    LDFLAGS += -lnnpack -lpthreadpool -lnnpack_ukernels -lnnpack_reference -lgtest -lfp16_utils -lbench_utils -lcpufeatures

    LDFLAGS += -lnnpack -lpthreadpool -lnnpack_ukernels -lcpufeatures

    # nnpack dependence googletest:
    CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0/include
    CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0

    # nnpack dependence pthreadpool:
    CFLAGS += -I${NNPACK_ROOT}/third-party/pthreadpool/include
    CFLAGS += -I${NNPACK_ROOT}/third-party/FXdiv/include

    # other define used
    CFLAGS += -DMSHADOW_STAND_ALONE=1
    CFLAGS += -DMSHADOW_USE_CUDA=0
    CFLAGS += -DMSHADOW_USE_MKL=0
    CFLAGS += -DSHADOW_RABIT_PS=0
    CFLAGS += -DMSHADOW_DIST_PS=0
    CFLAGS += -DMSHADOW_USE_SSE=0
    CFLAGS += -DMXNET_USE_OPENCV=0
    CFLAGS += -DMXNET_PREDICT_ONLY=0
    CFLAGS += -DDISABLE_OPENMP=1

  2. Add these at line 123 of amalgamation.py:

    #if MXNET_USE_NNPACK == 1
    #include "src/operator/nnpack/nnpack_convolution-inl.h"
    #endif  // MXNET_USE_NNPACK

  3. Run
    make ANDROID=1
    and then take the resulting jni_mxnet_predict.so, rename it, and use it to replace the one in WhatsThis. The running time becomes almost 7 s.

Can anyone help me solve this?

All 4 comments

WRT NNPACK, there are a few useful patches for NNPACK (non-MXNet specific) we'll PR shortly:

1) Improve multithreaded performance via a pthreadpool rewrite.
2) Improve 3x3 convolutions via dedicated NEON impl.

Both improve performance pretty significantly over im2col/sgemm, even at batch-size=1 and small channel sizes.
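For readers unfamiliar with the "im2col/sgemm" baseline mentioned above, here is a minimal pure-Python sketch of the idea: unroll every input patch into a column, then compute the convolution as one matrix multiply. It is not NNPACK or MXNet code. All function names are made up for illustration, and it assumes a single image, stride 1, and no padding. A direct loop is included so the two can be compared.

```python
# Sketch only: im2col + GEMM convolution next to a direct loop.
# Assumptions: one image, stride 1, no padding; names are illustrative.

def conv_direct(x, w):
    """x: [C][H][W] input, w: [K][C][R][S] filters -> [K][H-R+1][W-S+1]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += w[k][c][r][s] * x[c][i + r][j + s]
                out[k][i][j] = acc
    return out

def im2col(x, R, S):
    """Unroll every RxS patch into a column: [C*R*S][out_h*out_w]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = H - R + 1, W - S + 1
    cols = []
    for c in range(C):
        for r in range(R):
            for s in range(S):
                cols.append([x[c][i + r][j + s]
                             for i in range(out_h) for j in range(out_w)])
    return cols

def matmul(a, b):
    """Naive GEMM stand-in for sgemm: a is [M][N], b is [N][P]."""
    return [[sum(a_row[n] * b[n][p] for n in range(len(b)))
             for p in range(len(b[0]))] for a_row in a]

def conv_im2col(x, w):
    K, C, R, S = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    out_h, out_w = len(x[0]) - R + 1, len(x[0][0]) - S + 1
    # Flatten each filter in the same (c, r, s) order that im2col uses for rows.
    w_mat = [[w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
             for k in range(K)]
    flat = matmul(w_mat, im2col(x, R, S))  # [K][out_h*out_w]
    # Fold each flat output row back into an out_h x out_w map.
    return [[row[i * out_w:(i + 1) * out_w] for i in range(out_h)] for row in flat]
```

The im2col matrix repeats each input value up to R*S times, so this path costs extra memory and copying before the multiply starts. A dedicated 3x3 kernel avoids that copy.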

NNPACK support in MXNet is currently very limited. Please try this PR: https://github.com/dmlc/mxnet/pull/4373. I have tested it on a PC, where it gives a 2x~7x speedup. Thanks. @ydmo

@ydmo I used the same method as you, modified WhatsThis to use a VGG network, and ran an image segmentation task. I get a 5x speedup.

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
