For few-shot learning tasks such as IDCard<->Camera face verification (identification), in most cases we have only two face images per person for training. In this situation, metric learning approaches such as triplet loss are worth trying.
STEPS:
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_triplet.py --data-dir $DATA_DIR \
--network "$NETWORK" --lr 0.005 --pretrained "$PRETRAINED" --per-batch-size 60
The semi-hard mining runs on the GPU, so training is fast.
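For readers unfamiliar with semi-hard mining, here is a minimal sketch of the idea in plain Python. It is not the code in `train_triplet.py`: it assumes the FaceNet-style condition d(a,p) < d(a,n) < d(a,p) + margin on squared Euclidean distances, and the function names and the `margin=0.3` default are hypothetical choices for illustration. The real script does this selection on the GPU over a whole batch of embeddings.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mine_semi_hard_triplets(embeddings, labels, margin=0.3, seed=0):
    """Return (anchor, positive, negative) index triplets.

    Assumed semi-hard condition: the negative is farther from the anchor
    than the positive, but still inside the margin:
        d(a, p) < d(a, n) < d(a, p) + margin
    """
    rng = random.Random(seed)
    n = len(embeddings)
    triplets = []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = sq_dist(embeddings[a], embeddings[p])
            candidates = [
                k for k in range(n)
                if labels[k] != labels[a]
                and d_ap < sq_dist(embeddings[a], embeddings[k]) < d_ap + margin
            ]
            if candidates:
                # Pick one semi-hard negative at random for this pair.
                triplets.append((a, p, rng.choice(candidates)))
    return triplets


def triplet_loss(embeddings, triplets, margin=0.3):
    """Mean hinge loss max(0, d(a,p) - d(a,n) + margin) over the triplets."""
    if not triplets:
        return 0.0
    losses = [
        max(0.0, sq_dist(embeddings[a], embeddings[p])
            - sq_dist(embeddings[a], embeddings[n]) + margin)
        for a, p, n in triplets
    ]
    return sum(losses) / len(losses)


# Toy batch: two identities with two 2-D embeddings each.
embeddings = [[0.0, 0.0], [0.5, 0.0], [0.7, 0.0], [3.0, 0.0]]
labels = [0, 0, 1, 1]
triplets = mine_semi_hard_triplets(embeddings, labels)
loss = triplet_loss(embeddings, triplets)
```

In the toy batch only one anchor/positive pair has a negative inside the margin band, so a single triplet is selected. Pairs with no semi-hard negative contribute nothing, which is why mining over large batches helps.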
RESULTS:
We have a private IDCard/Camera face image dataset with 220K identities. Each person has two or more photos: one from the IDCard and the rest from a camera. We split it 8:2 into training (176K IDs) and testing (45K IDs), and report rank-1 accuracy and TAR at fixed FAR for the 1:N identification task (N=45K).
(Note that we do not use IDCard training data in Model-A and Model-B.)
| | DESC. | Rank-1 | TAR@FAR=1e-3 | TAR@FAR=1e-4 |
| -------- | ---------------------------------------- | -------------------- | ------------ | ------------ |
| Model-A1 | LResNet100E trained on ms1m-v1 with Softmax loss | 26.9% | 0.3% | 0.06% |
| Model-A2 | LResNet100E trained on ms1m-v1 with ArcFace loss | 70.7% | 17% | 8% |
| Model-A3 | LResNet100E trained on ms1m-v2 with ArcFace loss | 76.8% | 21% | 9% |
| Model-B | LResNet100E trained on (ms1m-v2+Glint-Asia) with ArcFace loss | 82.4% | 33% | 16% |
| Model-C | Triplet-loss finetuning on Model-B | 95.2%(still ongoing) | 78% | 26% |
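The exact evaluation protocol behind the table is not spelled out above, so the following is only a sketch of one common way to compute the two kinds of metric. It assumes a probe-by-gallery similarity matrix, treats same-identity scores as genuine and all other scores as impostor, and picks the threshold so that the fraction of accepted impostor scores does not exceed the target FAR. All function and variable names are hypothetical.

```python
def rank1_accuracy(scores, probe_ids, gallery_ids):
    """Fraction of probes whose best-scoring gallery entry has the same ID.

    scores[i][j] is the similarity between probe i and gallery entry j.
    """
    hits = 0
    for row, pid in zip(scores, probe_ids):
        best = max(range(len(row)), key=row.__getitem__)
        hits += gallery_ids[best] == pid
    return hits / len(probe_ids)


def tar_at_far(genuine, impostor, far):
    """True accept rate at a threshold that keeps the false accept rate <= far.

    A score is accepted only if it is strictly above the threshold.
    """
    ranked = sorted(impostor, reverse=True)
    allowed = int(far * len(ranked))  # impostor scores allowed to pass
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    accepted = sum(s > threshold for s in genuine)
    return accepted / len(genuine)


# Toy example: 3 camera probes against a gallery of 3 IDCard photos.
probe_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c"]
scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.4],
    [0.6, 0.2, 0.5],
]
genuine = [scores[i][j] for i in range(3) for j in range(3)
           if probe_ids[i] == gallery_ids[j]]
impostor = [scores[i][j] for i in range(3) for j in range(3)
            if probe_ids[i] != gallery_ids[j]]

rank1 = rank1_accuracy(scores, probe_ids, gallery_ids)
tar_strict = tar_at_far(genuine, impostor, far=0.0)
tar_loose = tar_at_far(genuine, impostor, far=0.2)
```

With N=45K the impostor set is vastly larger than the genuine set, which is one reason TAR at FAR=1e-4 can stay low even when rank-1 accuracy is high.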
176K does not seem like a huge number. Have you tried finetuning with ArcFace instead of triplet loss? I think that might achieve a better result.
Will you release the emore + Glint-Asia data, or show how to combine the two?
Could you upload a triplet-loss finetuning log file? @nttstar
ArcFace is a good example of deep learning popularization.
As a deep learning researcher, I would like to thank nttstar for working on the ArcFace code.
I developed TensorFlow training code based on your MXNet face recognition training code.
I would like to cooperate with you. If you ask, I can send you my TensorFlow ArcFace code. However, I cannot publish it on GitHub due to some problems.
I use TensorFlow, which may be inconvenient for you since you use MXNet. But I still want to cooperate with you.
My Skype ID is kwakjiwon1986.
If you accept my request, please contact me.
Thanks,
Kwak Ji Won
@nttstar Would you mind releasing the training log for Model-B? I am not getting good accuracy when training MobileFaceNet on the combined emore + Glint-Asia dataset.
Thank you very much!
I trained a model similar to Model-B with limited resources (two 2080Ti GPUs). I am not sure I reached the full potential of the most complicated model. The accuracy on agedb_30 is slightly lower than I would like:
testing verification..
(12000, 512)
infer time 23.104655
[lfw][552000]XNorm: 20.384019
[lfw][552000]Accuracy-Flip: 0.99783+-0.00269
testing verification..
(14000, 512)
infer time 27.093829
[cfp_fp][552000]XNorm: 21.291860
[cfp_fp][552000]Accuracy-Flip: 0.98386+-0.00478
testing verification..
(12000, 512)
infer time 23.229146
[agedb_30][552000]XNorm: 21.327194
[agedb_30][552000]Accuracy-Flip: 0.97550+-0.00624