Insightface: Instructions to run TVM on RK3399

Created on 14 Dec 2018 · 33 Comments · Source: deepinsight/insightface

jack,

Can you share instructions on how to run mobilenetface with TVM on RK3399?
I have an RK3399 board with TVM already installed and would like to try it.

Thanks,

kaishijeng

All 33 comments

We will release an insightface model deployment tutorial based on the TVM stack soon.

szad670401 on 16 Dec 2018

Jack,

Thanks,

kaishijeng on 16 Dec 2018

This is not only for the mobilenet model, right? Can other models such as LResNet50E-IR also benefit from TVM?

Also, is TVM useful on the Raspberry Pi 3B+?

Best

MyraBaba on 19 Dec 2018

Jack

When will this be available?

Thanks

kaishijeng on 21 Dec 2018

The tutorial has been updated on the wiki.

szad670401 on 10 Jan 2019

jack,

Can you provide the link to the wiki?

Thanks

kaishijeng on 10 Jan 2019

https://github.com/deepinsight/insightface/wiki/Tutorial:-Deploy-Face-Recognition-Model-via-TVM

szad670401 on 10 Jan 2019

target = tvm.target.create("llvm -mcpu=haswell")
What target should be used for the Firefly RK3399 with Mali GPU (fp16)? I have an RK3399 board and would like to try fp16 mode.

kaishijeng on 10 Jan 2019

To use fp16, you must compile the runtime with OpenCL and convert the params to fp16.
Here is the official tutorial.
https://docs.tvm.ai/tutorials/nnvm/deploy_model_on_mali_gpu.html#sphx-glr-tutorials-nnvm-deploy-model-on-mali-gpu-py

szad670401 on 11 Jan 2019

I can run your Python script with fp32, but I don't know how to convert the parameters to fp16. Can you provide a link that shows the fp16 conversion?

Thanks,

kaishijeng on 11 Jan 2019

dtype = 'float16'
nnvm_params = {k: tvm.nd.array(v.asnumpy().astype(dtype)) for k, v in nnvm_params.items()}

szad670401 on 12 Jan 2019
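
The two-line conversion above casts every parameter array to float16. As a rough illustration of what that cast does, here is a minimal NumPy-only sketch. The dictionary `params_fp32` and its two entries are made-up stand-ins for `nnvm_params`, not values from the thread; with the real dictionary each converted array would also be wrapped in `tvm.nd.array`, as in the comment above.

```python
import numpy as np

# Hypothetical stand-in for nnvm_params: parameter name -> float32 weights.
rng = np.random.RandomState(0)
params_fp32 = {
    "conv1_weight": rng.randn(8, 3, 3, 3).astype("float32"),
    "fc1_bias": rng.randn(128).astype("float32"),
}

# Same cast as in the thread, minus the tvm.nd.array wrapper.
dtype = "float16"
params_fp16 = {k: v.astype(dtype) for k, v in params_fp32.items()}

# Report the memory saving and the rounding error introduced by the cast.
for name in sorted(params_fp32):
    a, b = params_fp32[name], params_fp16[name]
    max_err = float(np.abs(a - b.astype("float32")).max())
    print(name, a.nbytes, "->", b.nbytes, "bytes, max abs error", max_err)
```

The cast halves the storage per parameter and introduces a small rounding error, which is one reason fp16 outputs may differ slightly from fp32 outputs.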

Thanks,

kaishijeng on 12 Jan 2019

@szad670401

I got similar results to yours with float16 on my RK3399.
Have you tried compiling MTCNN with TVM so that we can move the entire flow to the TVM framework?

Thanks,

kaishijeng on 14 Jan 2019

https://github.com/deepinsight/insightface/wiki/Tutorial:-Deploy-Face-Recognition-Model-via-TVM

Hi, I followed your instructions to deploy the agegender model via TVM on my GTX 1080 Ti, but got the following messages when compiling the model. It did produce the .so file in the end, but I couldn't run inference with it. Please let me know what the issue is and how to fix it. Thank you very much.

Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 3, 112, 112, 'float32'), (8, 3, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 8, 112, 112, 'float32'), (16, 8, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 16, 56, 56, 'float32'), (32, 16, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 32, 56, 56, 'float32'), (32, 32, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 32, 28, 28, 'float32'), (64, 32, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 64, 28, 28, 'float32'), (64, 64, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 64, 14, 14, 'float32'), (128, 64, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 128, 14, 14, 'float32'), (128, 128, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 128, 7, 7, 'float32'), (256, 128, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 256, 7, 7, 'float32'), (256, 256, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.

gdhdang on 16 Jan 2019

That is normal on Python 3.

szad670401 on 16 Jan 2019

@szad670401

I got similar results to yours with float16 on my RK3399.
Have you tried compiling MTCNN with TVM so that we can move the entire flow to the TVM framework?

Thanks,

The input size of a compiled TVM model is fixed and cannot change at run time. I may provide a TVM MTCNN C++ implementation soon, when I am free.

szad670401 on 16 Jan 2019

👍4
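
The reply above says a compiled TVM model has a fixed input size. MTCNN's first stage is normally run on an image pyramid, so one image yields several different input shapes. The sketch below lists those shapes for one image size. The function name `pyramid_shapes` is hypothetical, and the defaults (minimum face size 20, scale factor 0.709, 12-pixel window) are commonly used MTCNN settings assumed here, not values given in this thread.

```python
import math

def pyramid_shapes(height, width, min_face=20, factor=0.709, window=12):
    """Return the (height, width) of every pyramid level for one image size."""
    scale = window / float(min_face)
    min_side = min(height, width) * scale
    shapes = []
    # Keep shrinking until the shorter side no longer fits the detection window.
    while min_side >= window:
        shapes.append((int(math.ceil(height * scale)),
                       int(math.ceil(width * scale))))
        scale *= factor
        min_side *= factor
    return shapes

shapes = pyramid_shapes(480, 640)
for h, w in shapes:
    # Each level would be a distinct NCHW input shape for the first-stage network.
    print((1, 3, h, w))
```

Under these assumptions a single 480x640 frame produces several distinct shapes, so a fixed-shape deployment would need either one compiled module per level or a fixed input resolution chosen in advance.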

@szad670401
I have converted the models to TVM.
But how do I call the get_input and get_feature methods from Python in TVM?

uday60 on 25 Feb 2019

@szad670401

Hi,
I have converted the model with TVM and tested it with a KNN classifier on around 500 images, but the results are not good. The output features are different from those of the original model. Do you have any idea why this happens? Please advise.

aneesh0 on 9 Mar 2019

Please check the preprocessing and the input.

szad670401 on 9 Mar 2019

Actually, I tested 500 images with a classifier built from a single image of each of 4 people, using both the original model and the TVM-converted model. Below is the confusion matrix from the original model:
[[ 87 0 0 0]
[ 0 200 0 0]
[ 0 0 97 0]
[ 0 0 0 128]]

and from converted model

[[46 1 0 23]
[ 1 97 0 0]
[30 81 81 84]
[10 21 16 21]]

My doubt is not about the classifier result. When we convert the model to TVM, is there any possibility that the results might change?

aneesh0 on 9 Mar 2019
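
To put a number on the gap between the two confusion matrices above, the following sketch computes overall accuracy as the diagonal sum divided by the total count. The matrices are copied from the comment; the helper name `accuracy` is only illustrative. The result does not depend on whether rows are true classes or predictions, which the thread does not state.

```python
# Confusion matrices copied from the comment above.
original = [[87, 0, 0, 0],
            [0, 200, 0, 0],
            [0, 0, 97, 0],
            [0, 0, 0, 128]]
converted = [[46, 1, 0, 23],
             [1, 97, 0, 0],
             [30, 81, 81, 84],
             [10, 21, 16, 21]]

def accuracy(matrix):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / float(total)

print("original :", accuracy(original))
print("converted:", accuracy(converted))
```

Both matrices cover 512 test images; the original model classifies all of them correctly, while the converted model gets 245 right, a little under half.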

Can you show your demo code? I have checked the inference results from TVM. They achieve a very high PSNR compared with the original MXNet output (the values are almost identical up to the fifth decimal place).

Here is some code you can use to test the output of your compiled model.
https://github.com/szad670401/tvm_benchmark_cpp

szad670401 on 10 Mar 2019
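
One way to run the kind of output comparison mentioned above is to compute PSNR and cosine similarity between the two embedding vectors. The sketch below is only an illustration: the vectors are synthetic, the helper names are hypothetical, and using the largest absolute reference value as the PSNR peak is an assumption, since the thread does not say how the PSNR was computed.

```python
import numpy as np

def psnr(reference, candidate):
    """PSNR in dB, using the largest absolute reference value as the peak (an assumption)."""
    reference = np.asarray(reference, dtype="float64")
    candidate = np.asarray(candidate, dtype="float64")
    mse = np.mean((reference - candidate) ** 2)
    if mse == 0:
        return float("inf")
    peak = np.abs(reference).max()
    return float(10.0 * np.log10(peak ** 2 / mse))

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened vectors."""
    a = np.asarray(a, dtype="float64").ravel()
    b = np.asarray(b, dtype="float64").ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for the two 128-dimensional outputs.
rng = np.random.RandomState(0)
mxnet_out = rng.randn(128)
tvm_out = mxnet_out + 1e-6 * rng.randn(128)   # tiny numerical differences
unrelated = rng.randn(128)                    # what a preprocessing bug may look like

print("matching :", psnr(mxnet_out, tvm_out), cosine_similarity(mxnet_out, tvm_out))
print("unrelated:", psnr(mxnet_out, unrelated), cosine_similarity(mxnet_out, unrelated))
```

A correctly converted model should land near the first case; a result closer to the second suggests the two pipelines are not being fed the same input.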

Please find the code below and advise.

converting to TVM

import nnvm.compiler
import nnvm.testing
import mxnet as mx

# Load the MXNet checkpoint (model-symbol.json / model-0000.params).
prefix, epoch = "model", 0
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
image_size = (112, 112)
opt_level = 3

shape_dict = {'data': (1, 3, 112, 112)}
target = "cuda"
nnvm_sym, nnvm_params = nnvm.frontend.from_mxnet(sym, arg_params, aux_params)
print(type(nnvm_sym))
with nnvm.compiler.build_config(opt_level=opt_level):
    graph, lib, params = nnvm.compiler.build(nnvm_sym, target, shape_dict, params=nnvm_params)

# Export the compiled library, graph and parameters for deployment.
lib.export_library("./deploy_lib.so")
print('lib exported successfully')
with open("./deploy_graph.json", "w") as fo:
    fo.write(graph.json())
with open("./deploy_param.params", "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))

get features from TVM model

import tvm
from tvm.contrib import graph_runtime
import cv2
from sklearn import preprocessing

class FaceFeatures:
    def __init__(self):
        ctx = tvm.gpu(0)
        loaded_json = open("./deploy_graph.json").read()
        loaded_lib = tvm.module.load("./deploy_lib.so")
        loaded_params = bytearray(open("./deploy_param.params", "rb").read())
        self.module = graph_runtime.create(loaded_json, loaded_lib, ctx)
        self.module.load_params(loaded_params)

    def get_features(self, face_img=None):
        # Reverse the channel order (BGR -> RGB); the array is still (H, W, 3) here,
        # with no transpose to (3, H, W) as in the MXNet script below.
        img = face_img[..., ::-1]
        input_data = tvm.nd.array(img)
        self.module.run(data=input_data)
        f1 = self.module.get_output(0).asnumpy()
        f1 = preprocessing.normalize(f1).flatten()
        return f1

obj = FaceFeatures()
img = cv2.imread("test1.jpg").astype("float32")
f1 = obj.get_features(face_img=img)
print(f1)

[ 1.75104029e-02 -5.63121364e-02 1.80748001e-01 1.11219779e-01
-1.74342334e-01 1.71429180e-02 -9.98420194e-02 9.30923745e-02
4.10846993e-02 -6.40256777e-02 2.27195284e-04 -1.25597119e-01
6.07179999e-02 -1.28629074e-01 -6.06475249e-02 -3.95535417e-02
-4.25101593e-02 5.73790967e-02 -2.98622623e-02 -1.90120172e-02
-3.85310799e-02 -7.33197406e-02 -1.43477228e-02 6.11900613e-02
-2.84178685e-02 1.35797411e-01 2.44035721e-01 -2.71233842e-02
-4.28121611e-02 2.80844029e-02 -1.20349713e-02 -1.64229572e-02
-3.85312526e-03 1.79605037e-01 1.29827776e-03 6.53414652e-02
5.59902005e-02 1.57160446e-01 1.34048775e-01 -6.84935227e-02
-1.61816403e-01 -1.26174288e-02 6.18721507e-02 6.37746369e-03
6.24656305e-02 -4.80552614e-02 2.20586285e-02 -7.91603699e-02
-2.32286751e-01 -9.90498886e-02 -1.06094800e-01 -1.66534394e-01
-3.88872474e-02 7.34967738e-02 -3.18447612e-02 -4.36934344e-02
6.79816585e-05 -1.05723634e-01 -3.27924825e-02 -3.29886861e-02
9.26097259e-02 -1.06216623e-02 9.04475600e-02 -1.10379830e-01
-5.36163338e-02 2.59467922e-02 7.68485367e-02 1.90704335e-02
1.11108817e-01 2.99090222e-02 7.72744194e-02 8.72296467e-02
-1.53241485e-01 -1.52133182e-01 4.62566689e-02 -2.88791209e-03
4.30064350e-02 -6.85409755e-02 -1.33104891e-01 -1.91462226e-02
-1.04227044e-01 2.17016056e-01 1.15007862e-01 -5.48621193e-02
1.60578359e-02 3.20447236e-02 1.67723987e-02 -6.67547761e-03
8.21632594e-02 2.59338077e-02 7.57092685e-02 -2.35254578e-02
-4.87032719e-02 -7.15239346e-02 -4.74523529e-02 -6.45991822e-04
-1.25418548e-02 -1.29672378e-01 -7.29343817e-02 -1.10553257e-01
-5.07143624e-02 -1.03218541e-01 -1.11160256e-01 1.13212936e-01
-8.62115398e-02 6.21878281e-02 6.07950194e-03 -6.37388751e-02
-7.92359468e-03 -3.28040607e-02 1.53760018e-03 3.84959057e-02
1.25529528e-01 5.13212644e-02 2.52638847e-01 -1.51160844e-02
2.04592627e-02 -1.22018419e-01 -6.91621155e-02 -9.11587626e-02
1.18083565e-03 -1.95849203e-02 -1.05052151e-01 7.92575851e-02
-1.46066859e-01 -8.66932943e-02 -1.11908101e-01 4.56388947e-03]

get_features from original mxnet model

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import os
import numpy as np
import mxnet as mx
import cv2
from sklearn import preprocessing
sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'src', 'common'))

def get_model(ctx, image_size, model_str, layer):
    _vec = model_str.split(',')
    assert len(_vec) == 2
    prefix = _vec[0]
    epoch = int(_vec[1])
    print('loading', prefix, epoch)
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
    all_layers = sym.get_internals()
    sym = all_layers[layer + '_output']
    model = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    model.bind(data_shapes=[('data', (1, 3, image_size[0], image_size[1]))])
    model.set_params(arg_params, aux_params)
    return model

class FaceModel:
    def __init__(self):
        ctx = mx.cpu()
        image_size = (112, 112)
        model = "model,0"
        self.model = get_model(ctx, image_size, model, 'fc1')

    def get_feature(self, aligned):
        input_blob = np.expand_dims(aligned, axis=0)
        data = mx.nd.array(input_blob)
        db = mx.io.DataBatch(data=(data,))
        self.model.forward(db, is_train=False)
        embedding = self.model.get_outputs()[0].asnumpy()
        embedding = preprocessing.normalize(embedding).flatten()
        return embedding

obj = FaceModel()
img = cv2.imread("test1.jpg")
img = np.transpose(img, (2, 0, 1))
f1 = obj.get_feature(img)
print(f1)

[-0.09654074 0.04991252 0.0510361 -0.02417398 -0.0538626 -0.04160581
0.03415876 -0.15723884 -0.12081124 0.14687592 -0.11256409 0.0125411
-0.05264783 -0.08382451 -0.01782615 -0.04520258 0.01114853 -0.06529924
-0.00325403 0.05563293 -0.14722548 0.11442303 0.11916896 0.10032877
0.11199436 0.02308343 -0.07672466 -0.00088169 0.10263881 -0.01020689
0.10686959 0.15684374 0.16578783 0.02380837 0.12687422 0.03083293
0.15459235 -0.06844956 0.07605185 0.07035324 0.05418182 -0.01898195
-0.08577462 0.00150682 -0.12864545 -0.03884511 0.04517042 0.04077131
-0.07930189 0.06352004 -0.15594856 -0.20496422 -0.19894832 0.09672125
-0.131809 -0.07786789 -0.17138474 -0.04978099 0.1648841 0.1046343
-0.03816895 -0.13895725 0.08371405 -0.08430146 0.03474342 -0.13941556
0.10715988 0.04515056 -0.01759461 0.04201001 0.00621118 0.03565768
-0.1260632 0.04461443 0.04684962 0.07612025 0.09897412 0.00796857
0.11144172 -0.18671629 0.16357845 0.06407592 0.10835928 -0.11073305
0.02092392 -0.00301192 0.05545598 0.14673832 0.03537874 0.0757264
0.13249345 -0.01947724 0.07554084 -0.01039375 0.05618145 -0.06694733
-0.11280568 -0.0160348 -0.034942 0.11389426 -0.00342214 0.04541998
-0.06623963 0.09992316 -0.1476111 0.05721403 -0.00503596 -0.04717568
-0.08764981 0.03446406 -0.03462443 -0.00236538 -0.0124135 0.16734649
-0.02889436 0.00913746 -0.03911852 -0.02903078 0.06428095 0.00826436
0.02268215 0.06138279 0.13499816 -0.00530881 -0.05047613 0.06694238
0.01999153 0.07042321]

I used the same aligned, preprocessed image with both models.
Please advise.

aneesh0 on 11 Mar 2019

Here is some code you can use to test the output of your compiled model.
https://github.com/szad670401/tvm_benchmark_cpp

Could you please share those TVM models?

aneesh0 on 20 Mar 2019

Thanks, the problem is resolved. The input format was different.

aneesh0 on 20 Mar 2019

@aneesh0

How did you resolve the input format difference?
My test also gives different f1 values between TVM and the original insightface Python code.

kaishijeng on 23 Mar 2019

Just transpose your image array. It is (x, y, 3) when you read the image, and it should be converted to (3, x, y); you might have transposed it to (3, y, x) instead, which can be the problem.

aneesh0 on 23 Mar 2019
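
The fix described above can be sketched with NumPy alone. The helper `to_nchw` is a hypothetical name, and the small non-square test image is made up so that a swapped height and width would be visible. Whether a BGR-to-RGB flip is also required depends on how the model was trained, so it is left as an optional flag here rather than assumed.

```python
import numpy as np

def to_nchw(image_hwc, bgr_to_rgb=False):
    """Convert an OpenCV-style (H, W, 3) image to a contiguous (1, 3, H, W) float32 array."""
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) image")
    if bgr_to_rgb:
        image_hwc = image_hwc[..., ::-1]           # reverse the channel axis only
    chw = np.transpose(image_hwc, (2, 0, 1))       # (H, W, C) -> (C, H, W)
    # Add the batch axis and force a contiguous float32 copy.
    return np.ascontiguousarray(chw[np.newaxis, ...], dtype="float32")

# Made-up 4x5 test image; a real aligned face crop in this thread is 112x112.
image = np.arange(4 * 5 * 3, dtype="uint8").reshape(4, 5, 3)
batch = to_nchw(image)
print(batch.shape)
```

Feeding the same array produced this way to both the MXNet and the TVM pipeline removes layout as a source of differing features.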

It works and thanks

kaishijeng on 23 Mar 2019

Should I scale the image array to (0, 1) by dividing by 255, or does it not matter? I get a different feature.

xiang-zhe on 22 Apr 2019

Please share the sample code

aneesh0 on 22 Apr 2019

I'm sorry, I confused the reshape function with the transpose function; I now get the right output.
But another question: the data shape is (1, 3, 112, 112),
yet an input with shape (3, 112, 112) also works. Is that broadcasting?
@aneesh0
thank you very much

xiang-zhe on 24 Apr 2019
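
The reshape/transpose mix-up mentioned above is easy to reproduce on a tiny array. In this sketch the 2x2 "image" is made up for illustration: `reshape` keeps the flat memory order and only changes the shape, while `transpose` moves the channel axis so that each output plane holds one channel.

```python
import numpy as np

# Made-up 2x2 image with 3 channels: pixel (y, x) holds values [c0, c1, c2].
hwc = np.arange(2 * 2 * 3).reshape(2, 2, 3)

transposed = np.transpose(hwc, (2, 0, 1))   # correct: plane 0 is channel 0 of every pixel
reshaped = hwc.reshape(3, 2, 2)             # wrong: plane 0 mixes channels of two pixels

print(transposed[0].tolist())   # channel 0 values: 0, 3, 6, 9
print(reshaped[0].tolist())     # first four flat values: 0, 1, 2, 3

# Adding the batch axis explicitly makes the array match a (1, 3, H, W) data shape.
batch = np.expand_dims(transposed, axis=0)
print(batch.shape)
```

Both results have shape (3, 2, 2), which is why the mistake goes unnoticed until the features are compared.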

@szad670401
I got similar results to yours with float16 on my RK3399.
Have you tried compiling MTCNN with TVM so that we can move the entire flow to the TVM framework?
Thanks,

The input size of a compiled TVM model is fixed and cannot change at run time. I may provide a TVM MTCNN C++ implementation soon, when I am free.

Can you give me some advice on how to deploy MTCNN using TVM?

tianxingyzxq on 27 Apr 2019

@szad670401
I got similar results to yours with float16 on my RK3399.
Have you tried compiling MTCNN with TVM so that we can move the entire flow to the TVM framework?
Thanks,

The input size of a compiled TVM model is fixed and cannot change at run time. I may provide a TVM MTCNN C++ implementation soon, when I am free.

Can you give me some advice on how to deploy MTCNN using TVM?

I would also like to know how to do it.