InsightFace: How to fine-tune on my own datasets?

Created on 5 Jun 2018 · 19 Comments · Source: deepinsight/insightface

I fine-tuned 'model-r50-am-lfw' on the 'faces_emore' dataset and got a new model, v1. But when I try to fine-tune v1 on my own dataset, I get this error:
Incompatible attr in node at 0-th output: expected [85742,512], got [12700,512]
Fine-tuning 'model-r50-am-lfw' directly on my own dataset works.
How do I save the model correctly? Or how should I fine-tune on my own dataset?
Thank you!

maryhh

All 19 comments

Remove the last FC layer before fine-tuning by using deploy/model_slim.py
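The error in the question comes from the classifier weight, which has one row per identity: a checkpoint trained on 85742 classes cannot be loaded into a network built for 12700. Below is a minimal sketch of what removing the last FC layer amounts to, assuming the classifier parameters are the ones whose names start with `fc7` (the name used later in this thread). The dictionary stands in for a real `arg_params`, the shapes are illustrative, and `strip_classifier` is a hypothetical helper, not the script itself.

```python
def strip_classifier(arg_params, prefix="fc7"):
    """Return a copy of arg_params without the classifier weights.

    arg_params maps parameter names to arrays (here: to shape tuples,
    so the sketch runs without MXNet installed).
    """
    return {name: value for name, value in arg_params.items()
            if not name.startswith(prefix)}


# Stand-in for a checkpoint trained on 85742 identities.
# With MXNet 1.x the real dict would come from
#   sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
old_params = {
    "conv0_weight": (64, 3, 3, 3),   # illustrative backbone shape
    "fc1_weight": (512, 25088),      # illustrative embedding layer shape
    "fc7_weight": (85742, 512),      # one row per training identity
}

slim_params = strip_classifier(old_params)
print(sorted(slim_params))  # fc7_weight is gone; the backbone is kept
```

The slimmed parameters can then be saved as a new checkpoint (for example with `mx.model.save_checkpoint`), so that training on the new dataset creates a fresh classifier sized for its own number of identities.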

nttstar on 5 Jun 2018

❤1 👍1

Love you, thank you!

maryhh on 5 Jun 2018

❤4

@nttstar Hello, I want to ask how to fix the parameters of the layers before the 'fc7' layer, so that only the fc7 layer is trained. In this project, 'fixed_param_names' in the Module API does not seem to freeze the parameters. When I train on a new dataset with 'fixed_param_names', the acc climbs to about 99% very quickly. I then tested the resulting model on LFW using the embedding features, but the result is very bad.

The pretrained model gets a good result:

lmmcc on 1 Aug 2018

Why would fixed_param not be available? That should be impossible.
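For reference, a minimal sketch of how a `fixed_param_names` list is typically built for MXNet's Module API: every parameter except the ones to be trained is listed by name. The parameter names below are made up for illustration and `names_to_freeze` is a hypothetical helper; with a real model the names would come from `sym.list_arguments()`.

```python
def names_to_freeze(arg_names, trainable_prefixes=("fc7",),
                    input_names=("data", "softmax_label")):
    """List the parameters to keep fixed while the rest are trained."""
    return [name for name in arg_names
            if name not in input_names
            and not name.startswith(tuple(trainable_prefixes))]


# Illustrative names only; a real list would come from sym.list_arguments().
arg_names = ["data", "conv0_weight", "bn0_gamma", "bn0_beta",
             "fc1_weight", "fc7_weight", "softmax_label"]

fixed = names_to_freeze(arg_names)
print(fixed)
# The list would then be passed when the module is created, e.g.
#   mx.mod.Module(symbol=sym, context=ctx, fixed_param_names=fixed)
```

This only stops gradient updates to the listed weights. BatchNorm moving statistics are auxiliary states rather than parameters, so as far as I know they can still change during training unless the layers use global statistics. That is one possible reason for embeddings drifting even when the weights are fixed.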

nttstar on 1 Aug 2018

@nttstar I only changed this code. Is that right, or are there other places that need to change?

I used this code to inspect the parameters, and the arg_params of the layers show the same numbers. I don't know why.
I also tried many other methods, such as using BlockGrad after the embedding layer, or setting the trainable layers' lr to 0. When I use verification to test on LFW, the result is still very bad.
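A minimal sketch of one way to check which layers actually changed during fine-tuning, by comparing the parameters of two checkpoints. `changed_params` is a hypothetical helper, and plain lists of floats stand in for the weight arrays (with MXNet, each value could first be flattened from `.asnumpy()`).

```python
def changed_params(before, after, tol=1e-8):
    """Return names of parameters whose values differ between two checkpoints."""
    changed = []
    for name in sorted(before):
        if name not in after:
            continue  # e.g. a classifier that was removed in between
        old, new = before[name], after[name]
        if len(old) != len(new) or any(abs(a - b) > tol
                                       for a, b in zip(old, new)):
            changed.append(name)
    return changed


# Toy values standing in for flattened weight arrays.
before = {"conv0_weight": [0.10, -0.20], "fc1_weight": [0.30, 0.40],
          "fc7_weight": [0.5, 0.6]}
after = {"conv0_weight": [0.10, -0.20], "fc1_weight": [0.30, 0.40],
         "fc7_weight": [0.9, 0.1]}

print(changed_params(before, after))  # only fc7_weight should be listed
```

If backbone names show up in the result, the freeze did not take effect for those layers.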

lmmcc on 1 Aug 2018

I think it's correct. Accuracy may depend on your own dataset.

nttstar on 1 Aug 2018

@nttstar This problem has puzzled me for a few weeks. My test dataset is 'lfw.bin', so I don't think that is the problem. Maybe something in the model needs to be changed. Thank you all the same.

lmmcc on 2 Aug 2018

try setting a much lower learning rate

nttstar on 2 Aug 2018

👍1

@lmmcc So, did you solve your problem? In the end, did you only set a much lower learning rate, or did you also set fixed_param_names? Thanks!

bruinxiong on 5 Aug 2018

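@nttstar Hello, I am training on my own dataset with the InsightFace code you shared and have run into some problems I would like to ask you about. I started from the pretrained model "model-r50-am-lfw" and trained a new model on my own dataset, and I see the following problems: 1) the training acc; 2) the verification accuracy on lfw, cfp_fp and agedb_30 is very low.
The output is as follows: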
testing verification..
(12000, 512)
infer time 16.405921
[lfw][30000]XNorm: 27.570214
[lfw][30000]Accuracy-Flip: 0.63100+-0.02101
testing verification..
(14000, 512)
infer time 19.143502
[cfp_fp][30000]XNorm: 31.248975
[cfp_fp][30000]Accuracy-Flip: 0.61157+-0.02130
testing verification..
(12000, 512)
infer time 16.315097
[agedb_30][30000]XNorm: 77.036209
[agedb_30][30000]Accuracy-Flip: 0.52483+-0.01677

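Problem 1 may just require patience.
Problem 2 is the one that puzzles me.

How I built my own dataset:
1. The iQIYI dataset (the same dataset on which the author ranked 1st). The videos were decoded into frames at 8 frames per second. Using the "video-label" mapping txt, I merged the extracted frames into directories named after the corresponding label.
2. I processed every image with MTCNN to get the bbox and points, and built the lst file with the first column always set to 0 (each line of the lst file looks like "0\t img_path\t label\t bbox\t points\n"), so that face2rec2.py calls face_preprocess to do the alignment.
3. I generated the rec file with face2rec2.py.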
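The lst line described in step 2 can be sketched as follows. This assumes tab-separated fields with 4 bounding-box values followed by 10 landmark values. The ordering inside `bbox` and `points`, the example path, and the helper name `make_lst_line` are assumptions for illustration and should be checked against the parser that face2rec2.py uses.

```python
def make_lst_line(img_path, label, bbox, points, aligned=0):
    """Build one lst line: aligned flag, image path, label, bbox, landmarks."""
    if len(bbox) != 4:
        raise ValueError("bbox needs 4 values")
    if len(points) != 10:
        raise ValueError("points needs 10 values (5 landmarks)")
    fields = [str(aligned), img_path, str(label)]
    fields += [str(int(v)) for v in bbox]
    fields += ["%.2f" % v for v in points]
    return "\t".join(fields) + "\n"


line = make_lst_line(
    "person_0001/frame_000.jpg",        # hypothetical path
    label=17,
    bbox=[30, 40, 130, 160],
    points=[60, 100, 80, 65, 98,        # assumed: five x coordinates...
            80, 80, 100, 120, 120],     # ...then five y coordinates
)
print(repr(line))
```

Keeping the first field at 0 marks the image as not yet aligned, which matches the description above of face2rec2.py running the alignment itself.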

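I trained in two ways:
Approach 1: train directly from the pretrained model (model-r50-am-lfw), swapping in my own dataset. Problems 1 and 2 above occur.
Approach 2: after reading this issue, I used deploy/model_slim.py to remove the last FC layer of model-r50-am-lfw and then trained on my own dataset. Problems 1 and 2 still occur.

So I would like to ask whether the steps above are sound, or where I went wrong. What should I pay attention to when training a new model with InsightFace on my own data?

Many thanks to the author!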

shengmingkai on 26 Dec 2018

@nttstar I looked at https://github.com/deepinsight/insightface/issues/74#issuecomment-370204250 and my impression is that acc should become non-zero after 10K batches. But at batch = 26520 it is still 0, which really puzzles me.

shengmingkai on 26 Dec 2018

Try aligning the images first and then building the rec file. You can check whether the generated images look right.

nttstar on 27 Dec 2018

@nttstar
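When building the lst file I set the first column to 0, so face2rec2.py calls face_preprocess.preprocess. I ran some images through this function specifically to check, and the resulting images were correctly aligned and resized.

I set batch size = 128 and left the other settings at their defaults, starting from the pretrained model "model-r50-am-lfw" with my own rec file. But during training acc stays at 0, and it is still 0 after batch = 26520.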

shengmingkai on 27 Dec 2018

@nttstar Hello, I have sent the questions I would like to ask you to your Gmail address. If possible, I hope you can reply :bowtie: Many thanks!

shengmingkai on 27 Dec 2018

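You can refer to this link: https://zhuanlan.zhihu.com/p/33750684. It has a passage on model deployment which says that alignment is not needed. To be honest, I have tried several alignment methods before and the results were dreadful. Rather than feeding the network wrong training samples, it is better to feed it none. It may be worth a try.
“
3. Model deployment.
We provide some scripts that can serve as a reference for model deployment. Note that the input image does not need to be aligned; it only needs to be cropped after face detection.
3.1 Go to the deploy/ folder.
3.2 Train a model or download a trained one.
3.3 See deploy/test.py: given a detected and cropped face photo as input, it returns a 512-dimensional embedding. With the LResNet34-IR model mentioned above, a single inference takes only 17 ms (Intel E5-2660 @ 2.00GHz, Tesla M40).
”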