If I generate the amalgamation for Android without other libraries, the running time of the WhatsThis app is about 700 ms.
However, when I generate the amalgamation with libnnpack.a or other libraries built by ndk-build, the running time of WhatsThis rises to more than 6000 ms, sometimes almost 7 s.
Here is what I have done:
CFLAGS += -I${MXNET_ROOT} #mxnetroot
CFLAGS += -I${MXNET_ROOT}/dmlc-core/include #mxnetroot/dmlc-core/include
CFLAGS += -I${MXNET_ROOT}/include #mxnetroot/include
CFLAGS += -I${MXNET_ROOT}/mshadow #mxnetroot/mshadow
CFLAGS += -DMXNET_USE_NNPACK=1
CFLAGS += -DMXNET_USE_NNPACK_NUM_THREADS=8
CFLAGS += -I${NNPACK_ROOT}/include
LDFLAGS += -L${NNPACK_ROOT}/obj/local/armeabi-v7a
LDFLAGS += -lnnpack -lpthreadpool -lnnpack_ukernels -lcpufeatures
CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0/include
CFLAGS += -I${NNPACK_ROOT}/third-party/gtest-1.7.0
CFLAGS += -I${NNPACK_ROOT}/third-party/pthreadpool/include
CFLAGS += -I${NNPACK_ROOT}/third-party/FXdiv/include
CFLAGS += -DMSHADOW_STAND_ALONE=1
CFLAGS += -DMSHADOW_USE_CUDA=0
CFLAGS += -DMSHADOW_USE_MKL=0
CFLAGS += -DSHADOW_RABIT_PS=0
CFLAGS += -DMSHADOW_DIST_PS=0
CFLAGS += -DMSHADOW_USE_SSE=0
CFLAGS += -DMXNET_USE_OPENCV=0
CFLAGS += -DMXNET_PREDICT_ONLY=0
CFLAGS += -DDISABLE_OPENMP=1
Can anyone help me solve this?
WRT NNPACK, there are a few useful patches for NNPACK (non-MXNet specific) we'll PR shortly:
1) Improve multithreaded performance via a pthreadpool rewrite.
2) Improve 3x3 convolutions via dedicated NEON impl.
Both improve performance pretty significantly over im2col/sgemm (even at batch size 1 and small channel sizes).
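For readers unfamiliar with the im2col/sgemm baseline mentioned above, here is a minimal pure-Python sketch of the idea: every kernel-sized patch of the input is unrolled into a row, and the convolution becomes a matrix product. This is only an illustration under simplifying assumptions (single channel, stride 1, no padding, no flipping of the kernel); the function names are hypothetical and it is not the NNPACK or MXNet implementation.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image (list of rows) into one row."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for i in range(out_h):
        for j in range(out_w):
            patch = [image[i + di][j + dj]
                     for di in range(kh) for dj in range(kw)]
            cols.append(patch)
    return cols, out_h, out_w


def conv2d_im2col(image, kernel):
    """Convolution as a matrix-vector product over the unrolled patches."""
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw)
    flat_kernel = [v for row in kernel for v in row]
    # With several output channels this step would be a full GEMM (sgemm);
    # with one channel it reduces to one dot product per patch.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in cols]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def conv2d_direct(image, kernel):
    """Reference: the same convolution computed with nested loops."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```

The unrolled matrix duplicates each input pixel up to kh*kw times, which is one reason a dedicated 3x3 kernel can beat this approach on memory-constrained mobile CPUs.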
NNPACK support in MXNet is currently very limited. Please try this PR: https://github.com/dmlc/mxnet/pull/4373. I have tested it on a PC, where it gives a 2x~7x speedup. Thanks. @ydmo
@ydmo I used the same method as you, modified WhatsThis to use a VGG network, and ran an image segmentation task. I get a 5x speedup.
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!