Jack,
Can you share instructions for running mobilenetface with TVM on the RK3399?
I have an RK3399 board with TVM already installed and would like to try it.
Thanks,
We will release an insightface model deployment tutorial based on the TVM stack soon.
Jack,
Thanks,
This is not limited to the MobileNet model, right? Can other models such as LResNet50E-IR also benefit from TVM?
Also, is TVM useful on the Raspberry Pi 3B+?
Best
Jack
When will this be available?
Thanks
The tutorial has been updated on the wiki.
Jack,
Can you provide the link to the wiki?
Thanks
https://github.com/deepinsight/insightface/wiki/Tutorial:-Deploy-Face-Recognition-Model-via-TVM
target = tvm.target.create("llvm -mcpu=haswell")
What target should be used for the Firefly RK3399 Mali GPU in fp16 mode? I have an RK3399 board and would like to try fp16.
To use fp16, you must compile the runtime with OpenCL enabled and convert the params to fp16.
Here is the official tutorial.
https://docs.tvm.ai/tutorials/nnvm/deploy_model_on_mali_gpu.html#sphx-glr-tutorials-nnvm-deploy-model-on-mali-gpu-py
I can run your Python script with fp32, but I don't know how to convert the parameters to fp16. Can you provide a link that shows the fp16 conversion?
Thanks,
dtype = 'float16'
nnvm_params = {k: tvm.nd.array(v.asnumpy().astype(dtype)) for k, v in nnvm_params.items()}
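The cast above can be tried without a board. The sketch below uses plain NumPy arrays as stand-ins for the NNVM parameter dictionary; the parameter names and shapes are made up for illustration, and the helper `params_to_dtype` is hypothetical. On a device, each converted array would then be wrapped with `tvm.nd.array` as in the snippet above.

```python
import numpy as np


def params_to_dtype(params, dtype="float16"):
    """Return a new dict with every array cast to `dtype`."""
    return {name: np.asarray(value).astype(dtype) for name, value in params.items()}


# Hypothetical stand-in parameters; real ones come from nnvm.frontend.from_mxnet.
fp32_params = {
    "conv1_weight": np.random.rand(8, 3, 3, 3).astype("float32"),
    "conv1_bias": np.zeros(8, dtype="float32"),
}
fp16_params = params_to_dtype(fp32_params)

# float16 has far less precision than float32, so small rounding errors are expected.
max_err = max(
    float(np.abs(fp32_params[k] - fp16_params[k].astype("float32")).max())
    for k in fp32_params
)
print({k: str(v.dtype) for k, v in fp16_params.items()}, max_err)
```

Checking the rounding error this way gives a rough idea of how much precision the fp16 parameters lose before the model is ever run on the GPU.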
Thanks,
@szad670401
I got results similar to yours with float16 on my RK3399.
Have you tried compiling MTCNN with TVM so that we can move the entire flow to the TVM framework?
Thanks,
https://github.com/deepinsight/insightface/wiki/Tutorial:-Deploy-Face-Recognition-Model-via-TVM
Hi, I followed your instructions to deploy the agegender model via TVM on my GTX 1080 Ti but got the following error when compiling the model. It did produce the .so file in the end, but I couldn't run inference with it. Please let me know what the issue is and how to fix it. Thank you very much.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 3, 112, 112, 'float32'), (8, 3, 3, 3, 'float32'), (1, 1), (1, 1), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 8, 112, 112, 'float32'), (16, 8, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 16, 56, 56, 'float32'), (32, 16, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 32, 56, 56, 'float32'), (32, 32, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 32, 28, 28, 'float32'), (64, 32, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 64, 28, 28, 'float32'), (64, 64, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 64, 14, 14, 'float32'), (128, 64, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 128, 14, 14, 'float32'), (128, 128, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 128, 7, 7, 'float32'), (256, 128, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=1080ti, workload=('conv2d', (1, 256, 7, 7, 'float32'), (256, 256, 1, 1, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
It is normal on python3
The input size of a TVM model cannot change after compilation. I may provide a TVM MTCNN C++ implementation soon, when I have time.
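One possible workaround, which is an assumption and not something proposed in this thread, is to fix the camera resolution and precompute the shape of every MTCNN image-pyramid level, so that the first-stage network could be compiled once per shape. The sketch below only computes those shapes. The minimum face size of 20 and the scale factor of 0.709 are commonly used MTCNN defaults and are assumed here; `pyramid_shapes` is a hypothetical helper.

```python
import math


def pyramid_shapes(height, width, min_face=20, factor=0.709, cell=12):
    """List (scale, height, width) for each MTCNN pyramid level of a fixed-size frame."""
    scale = cell / min_face
    min_side = min(height, width) * scale
    shapes = []
    # Stop once the shorter side would be smaller than the 12x12 detection cell.
    while min_side >= cell:
        shapes.append((scale, math.ceil(height * scale), math.ceil(width * scale)))
        scale *= factor
        min_side *= factor
    return shapes


for scale, h, w in pyramid_shapes(480, 640):
    print("scale=%.4f -> input shape (1, 3, %d, %d)" % (scale, h, w))
```

Each printed shape would correspond to one fixed-shape compiled module, at the cost of a longer build and more memory on the device.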
@szad670401
I have converted the models to TVM.
But how do I call the get_input and get_feature methods from Python in TVM?
@szad670401
Hi,
I converted the model with TVM and tested it with a KNN classifier on around 500 images, but the results are not good: the output features differ from those of the original model. Do you have any idea why this happens? Please suggest.
Please check the preprocessing and the input.
Actually, I tested 500 images with a classifier built from a single image of each of 4 people, using both the original model and the TVM-converted model. Below is the confusion matrix from the original model:
[[ 87 0 0 0]
[ 0 200 0 0]
[ 0 0 97 0]
[ 0 0 0 128]]
and from converted model
[[46 1 0 23]
[ 1 97 0 0]
[30 81 81 84]
[10 21 16 21]]
My question is not about the classifier result. When we convert the model to TVM, is there any possibility that the results might change?
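To put a number on the gap between the two matrices above, the share of samples on the diagonal can be computed directly. This is only arithmetic on the values quoted in this comment; `accuracy` is a helper written for this illustration.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


original = [[87, 0, 0, 0], [0, 200, 0, 0], [0, 0, 97, 0], [0, 0, 0, 128]]
converted = [[46, 1, 0, 23], [1, 97, 0, 0], [30, 81, 81, 84], [10, 21, 16, 21]]

print("original:  %.3f" % accuracy(original))   # 1.000
print("converted: %.3f" % accuracy(converted))  # 0.479
```

Both matrices sum to 512 samples, so the drop from 512 to 245 correct predictions is far larger than anything numerical rounding would explain.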
Can you show the demo code? I have checked the inference results from TVM. They achieve a very high PSNR compared with the original MXNet output (the values agree almost exactly up to the fifth decimal place).
Here is some code with which you can simply test the output of your compiled model.
https://github.com/szad670401/tvm_benchmark_cpp
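A comparison of this kind can also be sketched in Python. The example below is not taken from the linked repository; it uses random vectors as stand-ins for the MXNet and TVM outputs, and it assumes a PSNR definition whose peak is the largest absolute value in the reference vector.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Return (max abs difference, cosine similarity, PSNR in dB)."""
    reference = np.asarray(reference, dtype="float64").ravel()
    candidate = np.asarray(candidate, dtype="float64").ravel()
    max_abs = float(np.abs(reference - candidate).max())
    cosine = float(
        reference @ candidate / (np.linalg.norm(reference) * np.linalg.norm(candidate))
    )
    mse = float(np.mean((reference - candidate) ** 2))
    peak = float(np.abs(reference).max())  # assumed peak value for the PSNR
    psnr = float("inf") if mse == 0 else float(10 * np.log10(peak ** 2 / mse))
    return max_abs, cosine, psnr


rng = np.random.default_rng(0)
mxnet_out = rng.standard_normal(128)                     # stand-in for the MXNet embedding
tvm_out = mxnet_out + rng.normal(scale=1e-6, size=128)   # stand-in for a faithful TVM output
unrelated = rng.standard_normal(128)                     # stand-in for a mismatched output

print(compare_outputs(mxnet_out, tvm_out))
print(compare_outputs(mxnet_out, unrelated))
```

A correctly converted model should behave like the first case (cosine similarity very close to 1); a result like the second case usually points to an input or preprocessing mismatch rather than to the compiler.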
please find the below code and suggest
import nnvm.compiler
import nnvm.testing
import mxnet as mx
prefix, epoch = "model", 0
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
image_size = (112, 112)
opt_level = 3
shape_dict = {'data': (1, 3, 112, 112)}
target = "cuda"
nnvm_sym, nnvm_params = nnvm.frontend.from_mxnet(sym, arg_params, aux_params)
print(type(nnvm_sym))
with nnvm.compiler.build_config(opt_level=opt_level):
    graph, lib, params = nnvm.compiler.build(nnvm_sym, target, shape_dict, params=nnvm_params)
lib.export_library("./deploy_lib.so")
print('lib exported successfully')
with open("./deploy_graph.json", "w") as fo:
    fo.write(graph.json())
with open("./deploy_param.params", "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))
import tvm
from tvm.contrib import graph_runtime
import cv2
from sklearn import preprocessing
class FaceFeatures:
    def __init__(self):
        ctx = tvm.gpu(0)
        loaded_json = open("./deploy_graph.json").read()
        loaded_lib = tvm.module.load("./deploy_lib.so")
        loaded_params = bytearray(open("./deploy_param.params", "rb").read())
        self.module = graph_runtime.create(loaded_json, loaded_lib, ctx)
        self.module.load_params(loaded_params)
    def get_features(self, face_img=None):
        img = face_img[..., ::-1]
        input_data = tvm.nd.array(img)
        self.module.run(data=input_data)
        f1 = self.module.get_output(0).asnumpy()
        f1 = preprocessing.normalize(f1).flatten()
        return f1
obj = FaceFeatures()
img = cv2.imread("test1.jpg").astype("float32")
f1 = obj.get_features(face_img=img)
print(f1)
[ 1.75104029e-02 -5.63121364e-02 1.80748001e-01 1.11219779e-01
-1.74342334e-01 1.71429180e-02 -9.98420194e-02 9.30923745e-02
4.10846993e-02 -6.40256777e-02 2.27195284e-04 -1.25597119e-01
6.07179999e-02 -1.28629074e-01 -6.06475249e-02 -3.95535417e-02
-4.25101593e-02 5.73790967e-02 -2.98622623e-02 -1.90120172e-02
-3.85310799e-02 -7.33197406e-02 -1.43477228e-02 6.11900613e-02
-2.84178685e-02 1.35797411e-01 2.44035721e-01 -2.71233842e-02
-4.28121611e-02 2.80844029e-02 -1.20349713e-02 -1.64229572e-02
-3.85312526e-03 1.79605037e-01 1.29827776e-03 6.53414652e-02
5.59902005e-02 1.57160446e-01 1.34048775e-01 -6.84935227e-02
-1.61816403e-01 -1.26174288e-02 6.18721507e-02 6.37746369e-03
6.24656305e-02 -4.80552614e-02 2.20586285e-02 -7.91603699e-02
-2.32286751e-01 -9.90498886e-02 -1.06094800e-01 -1.66534394e-01
-3.88872474e-02 7.34967738e-02 -3.18447612e-02 -4.36934344e-02
6.79816585e-05 -1.05723634e-01 -3.27924825e-02 -3.29886861e-02
9.26097259e-02 -1.06216623e-02 9.04475600e-02 -1.10379830e-01
-5.36163338e-02 2.59467922e-02 7.68485367e-02 1.90704335e-02
1.11108817e-01 2.99090222e-02 7.72744194e-02 8.72296467e-02
-1.53241485e-01 -1.52133182e-01 4.62566689e-02 -2.88791209e-03
4.30064350e-02 -6.85409755e-02 -1.33104891e-01 -1.91462226e-02
-1.04227044e-01 2.17016056e-01 1.15007862e-01 -5.48621193e-02
1.60578359e-02 3.20447236e-02 1.67723987e-02 -6.67547761e-03
8.21632594e-02 2.59338077e-02 7.57092685e-02 -2.35254578e-02
-4.87032719e-02 -7.15239346e-02 -4.74523529e-02 -6.45991822e-04
-1.25418548e-02 -1.29672378e-01 -7.29343817e-02 -1.10553257e-01
-5.07143624e-02 -1.03218541e-01 -1.11160256e-01 1.13212936e-01
-8.62115398e-02 6.21878281e-02 6.07950194e-03 -6.37388751e-02
-7.92359468e-03 -3.28040607e-02 1.53760018e-03 3.84959057e-02
1.25529528e-01 5.13212644e-02 2.52638847e-01 -1.51160844e-02
2.04592627e-02 -1.22018419e-01 -6.91621155e-02 -9.11587626e-02
1.18083565e-03 -1.95849203e-02 -1.05052151e-01 7.92575851e-02
-1.46066859e-01 -8.66932943e-02 -1.11908101e-01 4.56388947e-03]
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import os
import numpy as np
import mxnet as mx
import cv2
from sklearn import preprocessing
sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'src', 'common'))
def get_model(ctx, image_size, model_str, layer):
    _vec = model_str.split(',')
    assert len(_vec) == 2
    prefix = _vec[0]
    epoch = int(_vec[1])
    print('loading', prefix, epoch)
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
    all_layers = sym.get_internals()
    sym = all_layers[layer + '_output']
    model = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    model.bind(data_shapes=[('data', (1, 3, image_size[0], image_size[1]))])
    model.set_params(arg_params, aux_params)
    return model
class FaceModel:
    def __init__(self):
        ctx = mx.cpu()
        image_size = (112, 112)
        model = "model,0"
        self.model = get_model(ctx, image_size, model, 'fc1')
    def get_feature(self, aligned):
        input_blob = np.expand_dims(aligned, axis=0)
        data = mx.nd.array(input_blob)
        db = mx.io.DataBatch(data=(data,))
        self.model.forward(db, is_train=False)
        embedding = self.model.get_outputs()[0].asnumpy()
        embedding = preprocessing.normalize(embedding).flatten()
        return embedding
obj = FaceModel()
img = cv2.imread("test1.jpg")
img = np.transpose(img, (2, 0, 1))
f1 = obj.get_feature(img)
print(f1)
[-0.09654074 0.04991252 0.0510361 -0.02417398 -0.0538626 -0.04160581
0.03415876 -0.15723884 -0.12081124 0.14687592 -0.11256409 0.0125411
-0.05264783 -0.08382451 -0.01782615 -0.04520258 0.01114853 -0.06529924
-0.00325403 0.05563293 -0.14722548 0.11442303 0.11916896 0.10032877
0.11199436 0.02308343 -0.07672466 -0.00088169 0.10263881 -0.01020689
0.10686959 0.15684374 0.16578783 0.02380837 0.12687422 0.03083293
0.15459235 -0.06844956 0.07605185 0.07035324 0.05418182 -0.01898195
-0.08577462 0.00150682 -0.12864545 -0.03884511 0.04517042 0.04077131
-0.07930189 0.06352004 -0.15594856 -0.20496422 -0.19894832 0.09672125
-0.131809 -0.07786789 -0.17138474 -0.04978099 0.1648841 0.1046343
-0.03816895 -0.13895725 0.08371405 -0.08430146 0.03474342 -0.13941556
0.10715988 0.04515056 -0.01759461 0.04201001 0.00621118 0.03565768
-0.1260632 0.04461443 0.04684962 0.07612025 0.09897412 0.00796857
0.11144172 -0.18671629 0.16357845 0.06407592 0.10835928 -0.11073305
0.02092392 -0.00301192 0.05545598 0.14673832 0.03537874 0.0757264
0.13249345 -0.01947724 0.07554084 -0.01039375 0.05618145 -0.06694733
-0.11280568 -0.0160348 -0.034942 0.11389426 -0.00342214 0.04541998
-0.06623963 0.09992316 -0.1476111 0.05721403 -0.00503596 -0.04717568
-0.08764981 0.03446406 -0.03462443 -0.00236538 -0.0124135 0.16734649
-0.02889436 0.00913746 -0.03911852 -0.02903078 0.06428095 0.00826436
0.02268215 0.06138279 0.13499816 -0.00530881 -0.05047613 0.06694238
0.01999153 0.07042321]
I used the aligned, preprocessed image in both models.
Please suggest.
Could you please share those TVM models
Thanks, the problem is resolved. The input format was different.
@aneesh0
How did you resolve the input format difference?
My test also gives different f1 values between TVM and the original insightface Python code.
Just transpose your image array. It is (x, y, 3) when you read the image and should be converted to (3, x, y); you may have transposed it to (3, y, x) instead, which can be the problem.
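The layout change described above can be sketched with NumPy alone. The tiny array below stands in for an image read with OpenCV, and `to_nchw` is a helper written for this illustration; it also shows why `reshape` is not a substitute for `transpose`.

```python
import numpy as np


def to_nchw(image_hwc):
    """Convert an (H, W, 3) image to a contiguous (1, 3, H, W) float32 batch."""
    chw = np.transpose(image_hwc, (2, 0, 1))  # channels first, row/column order kept
    return np.ascontiguousarray(chw[np.newaxis, ...], dtype="float32")


# Stand-in image with 2 rows, 4 columns and 3 channels.
image = np.arange(2 * 4 * 3).reshape(2, 4, 3)
batch = to_nchw(image)

wrong_axes = np.transpose(image, (2, 1, 0))[np.newaxis]  # rows and columns swapped
wrong_reshape = image.reshape(1, 3, 2, 4)                # right shape, scrambled pixels

print(batch.shape)                                   # (1, 3, 2, 4)
print(np.array_equal(batch[0, 1], image[:, :, 1]))   # True
print(np.array_equal(wrong_reshape, batch))          # False
```

For the square 112 x 112 inputs used in this thread, a swapped-axes or reshaped array has exactly the expected shape, so the mistake produces wrong features rather than an error.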
It works and thanks
Should I scale img_array to the range (0, 1) by dividing by 255, or does it not matter? I am getting a different feature.
Please share the sample code
I'm sorry, I confused the reshape and transpose functions; I now get the right output.
One more question: the data shape is (1, 3, 112, 112),
but an input with shape (3, 112, 112) also works. Is that broadcasting?
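A possible explanation, offered as an assumption rather than something confirmed in this thread: a (3, 112, 112) array and a (1, 3, 112, 112) array hold the same number of elements in the same memory order, so a copy that only compares buffer sizes would accept either one, and broadcasting would not be involved. The sketch below only demonstrates the NumPy side of that argument.

```python
import numpy as np

chw = np.arange(3 * 112 * 112, dtype="float32").reshape(3, 112, 112)
nchw = chw[np.newaxis, ...]  # add a leading batch axis of length 1

print(chw.shape, nchw.shape)
print(chw.size == nchw.size)            # same number of elements
print(chw.tobytes() == nchw.tobytes())  # same bytes in the same order
```

Passing the explicit (1, 3, 112, 112) shape is still the safer habit, since it matches the shape the model was compiled for.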
@aneesh0
thank you very much
Can you give me some advice on how to deploy MTCNN using TVM?
I also want to know how to do it
@szad670401
Can you update the TVM conversion code? TVM has deprecated the NNVM compiler.
This code does not work with the latest repo.
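Until the tutorial is updated, a Relay-based conversion might look like the sketch below. It is not the maintainers' code: it assumes TVM 0.7 or newer with MXNet installed, mirrors the checkpoint prefix, input name `data` and input shape used earlier in this thread, and wraps everything in a hypothetical helper, `convert_with_relay`, whose imports are deferred so that defining it does not require TVM.

```python
def convert_with_relay(prefix, epoch, target="llvm", input_shape=(1, 3, 112, 112)):
    """Compile an MXNet checkpoint with Relay and export a shared library.

    Assumes TVM >= 0.7 and MXNet are installed; they are imported lazily.
    """
    import mxnet as mx
    import tvm
    from tvm import relay

    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
    # Relay takes the input shapes at import time instead of at build time.
    mod, params = relay.frontend.from_mxnet(
        sym, shape={"data": input_shape}, arg_params=arg_params, aux_params=aux_params
    )
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    lib.export_library("./deploy_lib.so")
    return lib
```

A call such as `convert_with_relay("model", 0, target="cuda")` would correspond to the NNVM script posted above; the exact runtime loading code differs between TVM releases, so it is left out here.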