msra is a cleaned subset of MS1M from Glint, while celebrity is the Asian dataset.
Generate the list file with src/data/glint2lst.py. For example:
python glint2lst.py /data/glint_data msra,celebrity > glint.lst
or generate the Asian dataset only by:
python glint2lst.py /data/glint_data celebrity > glint_cn.lst
Merge with src/data/dataset_merge.py without setting the param _model_, which will combine all IDs from those two datasets. Finally you will get a dataset that contains about 180K IDs.
Use src/eval/gen_glint.py to prepare the test feature file using a pretrained insightface model.
You can also post your private testing results here.
Thanks for Sharing
@nttstar will you train new models with these data?
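A real credit to the industry.
Is anyone from DeepGlint here? I'm from Bitmain, one floor below. The download is too slow; could I just come upstairs and copy it directly?
Do any of the same people appear in both the msra and celebrity datasets?
I attended DeepGlint's talk a few days ago, where this dataset was made public. I had only just finished downloading it (several hundred GB) and did not expect it to show up here so quickly. Thanks!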
After testing, this dataset is pretty clean, but it still contains 0.3%~0.8% noise.
Also, we found that their MS1M and Asian parts still have about 15-30 overlaps, though I guess it doesn't matter when the scale is already so large.
Another finding is that this dataset suffers a lot from a long tail. Take the Asian part for example: only 18K identities out of about 94K have over 25 images per class, and only a few thousand identities have over 60 images.
@aa12356jm could you share it on BaiduYun?
@nttstar I downloaded the dataset from Glint. It looks like the faces are similarity-transformed and resized to 400x400, so for ArcFace, how should I crop/resize them to 112x112?
@zhenglaizhang I already provided the scripts.
@JianbangZ do you have some idea to solve these problems?
awesome !
Thanks DeepGlint!
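Meaningful work.
@nttstar the download link is down.
@nttstar @JianbangZ how did you download the Glint Asian face dataset? I cannot find how to register and sign up.
@nttstar @JianbangZ Why does the Asian face dataset I downloaded only extract to 1.7G of faces with 2000+ IDs? How should I handle this 90+G .tar.gz file? Could you give me some guidance? Many thanks.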
there is no lmk files in the dataset:
"lmk_file = os.path.join(input_dir, "%s_lmk.txt"%(ds))"
is it correct?
The same problem with @libohit , I can't sign in http://trillionpairs.deepglint.com/data, the button of "sign in" is dark!
@Wisgon maybe you need to use another browser
Can anyone share a copy of the lmk files? Their official site seems to be under maintenance. I could not download anything.
The download is not working right now. What is going on? @—@
I can't download the dataset. When I click the Download button, the following error appears:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error><Code>InvalidAccessKeyId</Code><Message>The OSS Access Key Id you provided is disabled.</Message><RequestId>5B31AE6FF68A5D785875635D</RequestId><HostId>dgplaygroundopen.oss-cn-qingdao.aliyuncs.com</HostId><OSSAccessKeyId>LTAIKdTReMdV71Zi</OSSAccessKeyId></Error>
I can't sign in http://trillionpairs.deepglint.com/data, the button of "sign in" is dark!
@shineway14 You can use http://trillionpairs.deepglint.com/login to sign in. When you finish filling in the blanks, press Enter instead of clicking the 'log in' button.
BTW, use http://trillionpairs.deepglint.com/register to register.
@nttstar what is the exact script to merge msra and celeb?
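@aaaaaaaak The Asian face dataset I most recently downloaded via BT is fine and matches the officially provided data. Here are my numbers for reference:
Directory size: 98G ./asian-celeb
Number of identities: ls -lR| grep "^d" | wc -l 93979
Number of images: ls -lR |grep "^-"| grep ".jpg" |wc -l 2830146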
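The same counts can also be gathered in Python, together with a per-identity tally that helps judge how long the tail is. This is only a minimal sketch, assuming one sub-directory per identity with .jpg files directly inside it; `count_dataset` and its thresholds are hypothetical names, not part of the repository.

```python
import os

def count_dataset(root, thresholds=(25, 60)):
    """Count identities and images under root (one sub-directory per identity)."""
    per_id = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue
        # Count only .jpg files directly inside the identity directory.
        per_id[name] = sum(1 for f in os.listdir(path) if f.lower().endswith(".jpg"))
    total_images = sum(per_id.values())
    # For each threshold, how many identities have more than that many images.
    over = {t: sum(1 for n in per_id.values() if n > t) for t in thresholds}
    return len(per_id), total_images, over
```

Calling `count_dataset("./asian-celeb")` returns the number of identities, the number of images, and a dict mapping each threshold to the number of identities above it.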
Hi all,
I have already done step 1, which is to generate the glint_cn file; however, I got an error while trying to do step 2. The error is shown below. Please help me fix this issue. Thanks.
OpenCV Error: Assertion failed (src.cols > 0 && src.rows > 0) in warpAffine, file /build/buildd/opencv-2.4.8+dfsg1/modules/imgproc/src/imgwarp.cpp, line 3445
Traceback (most recent call last):
File "face2rec2.py", line 256, in <module>
image_encode(args, i, item, q_out)
File "face2rec2.py", line 99, in image_encode
img = face_preprocess.preprocess(img, bbox = item.bbox, landmark=item.landmark, image_size='%d,%d'%(args.image_h, args.image_w))
File "../common/face_preprocess.py", line 107, in preprocess
warped = cv2.warpAffine(img,M,(image_size[1],image_size[0]), borderValue = 0.0)
cv2.error: /build/buildd/opencv-2.4.8+dfsg1/modules/imgproc/src/imgwarp.cpp:3445: error: (-215) src.cols > 0 && src.rows > 0 in function warpAffine
@jackytu256 Does the package you downloaded contain the lmk file?
lmk_file = os.path.join(input_dir, "%s_lmk"%(ds))
@anguoyang
Yes, I got a file called celebrity_lmk
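Hello, thank you for open-sourcing the code. I have run into a problem: I want to merge the msra and celebrity data, but I found that Microsoft's celebrity dataset does not seem to have a property file. I look forward to your answer, thanks.
Write the property file yourself. Format: <total number of identities in the dataset>,112,112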
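For illustration, the property line in that format could be produced as follows. This is a sketch under assumptions: the helper names are hypothetical, and the file is assumed to be a plain text file named `property` in the dataset directory holding this single line.

```python
def property_line(num_identities, image_h=112, image_w=112):
    """Build the line '<num identities>,<height>,<width>'."""
    return "%d,%d,%d" % (num_identities, image_h, image_w)

def write_property(path, num_identities, image_h=112, image_w=112):
    """Write the property line to `path` (hypothetical helper, not from the repo)."""
    with open(path, "w") as f:
        f.write(property_line(num_identities, image_h, image_w) + "\n")
```

For the identity count reported earlier in this thread for celebrity, `property_line(93979)` gives `93979,112,112`.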
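Has anyone managed to upload results successfully? I have uploaded several times and never got a result.
@wangchust Try another browser? Chrome works for me.
@TopcoderX Do I only need to upload my own bin file? And where can I see the results? :)
@wangchust Yes, uploading the bin file is enough. The results are shown on the results page.
@TopcoderX Thanks!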
How can I improve my training accuracy?
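Has anyone shared how their training is going? Here is mine.
Test accuracy is good, but training accuracy has now plateaued at close to 60%, as follows: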
INFO:root:Epoch[14] Batch [16000] Speed: 499.24 samples/sec acc=0.583464
INFO:root:Epoch[18] Batch [8380] Speed: 499.95 samples/sec acc=0.596615
INFO:root:Epoch[11] Train-acc=0.500434
INFO:root:Epoch[12] Train-acc=0.570312
INFO:root:Epoch[13] Train-acc=0.575087
INFO:root:Epoch[14] Train-acc=0.578993
INFO:root:Epoch[15] Train-acc=0.584201
INFO:root:Epoch[16] Train-acc=0.582465
INFO:root:Epoch[17] Train-acc=0.595920
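1. The initial lr=0.1 and I did not set lr_step, but the log shows lr_steps [133333, 186666, 213333]. The learning rate has now dropped to 0.0001. My understanding is that training accuracy will not improve even if I keep training. Is that right?
2. In the current situation, how can I improve training accuracy? Continue training with a smaller learning rate? Do I need to switch to TripletLoss?
3. Is it necessary to keep improving training accuracy?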
For those who cannot log in to the web site, try switching to Chrome.
Can face_emore and celebrity be trained together? Do the two datasets overlap (the same person under different IDs)?
@zhaowwenzhong Yes, just merge them directly.
When fine-tuning with triplet loss, I see "call reset" in the output log. Is this normal?
call reset()
eval 4200 images.. 12600
triplet time stat [0.00022899999999999998, 27.907889, 5.54027, 0.0, 0.0, 0.0]
found triplets 1873
seq len 5550
INFO:root:Epoch[0] Batch [30] Speed: 124.86 samples/sec lossvalue=0.185821
INFO:root:Epoch[0] Batch [32] Speed: 931.34 samples/sec lossvalue=0.058671
INFO:root:Epoch[0] Batch [34] Speed: 648.38 samples/sec lossvalue=0.066276
INFO:root:Epoch[0] Batch [36] Speed: 638.66 samples/sec lossvalue=0.058656
INFO:root:Epoch[0] Batch [38] Speed: 636.64 samples/sec lossvalue=0.067309
call reset()
eval 4200 images.. 16800
triplet time stat [0.00036899999999999997, 34.373402, 7.256107, 0.0, 0.0, 0.0]
found triplets 1731
seq len 5100
INFO:root:Epoch[0] Batch [40] Speed: 124.52 samples/sec lossvalue=0.188634
INFO:root:Epoch[0] Batch [42] Speed: 673.45 samples/sec lossvalue=0.078769
INFO:root:Epoch[0] Batch [44] Speed: 645.02 samples/sec lossvalue=0.061025
INFO:root:Epoch[0] Batch [46] Speed: 628.74 samples/sec lossvalue=0.063693
call reset()
eval 4200 images.. 21000
triplet time stat [0.000468, 41.096862, 9.061286, 0.0, 0.0, 0.0]
found triplets 1864
seq len 5550
INFO:root:Epoch[0] Batch [48] Speed: 118.35 samples/sec lossvalue=0.202036
INFO:root:Epoch[0] Batch [50] Speed: 681.18 samples/sec lossvalue=0.073124
INFO:root:Epoch[0] Batch [52] Speed: 642.58 samples/sec lossvalue=0.074811
INFO:root:Epoch[0] Batch [54] Speed: 640.49 samples/sec lossvalue=0.067620
call reset()
This is the output of the data iterator being reset. You can ignore it.
Downloading is too slow. Next time, please don't use a jar file.
@goodpp can you share the model?
How many identities should I get after merging the msra and celebrity datasets? My merge produced fewer than 100K people.
celebrity: 93979 IDs
msra: 85164 IDs
python /insightface/src/data/dataset_merge.py --include ~/data/celebrity/ , ~/data/msrc/ --output ~/data/combined/
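I see that the code has a de-duplication step, so I would like to ask: given the threshold you set, the size of the merged dataset I got (with IDs de-duplicated) should be correct, right?
I look forward to your answer. Many thanks!
Didn't the author say "Yes, just merge them directly"?!
My understanding of a direct merge is that you simply put the data of the two sets together and distinguish them by ID.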
@nttstar
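@zhaowwenzhong Isn't a direct merge done with the /insightface/src/data/dataset_merge.py script? When I merged celebrity and msra, I found that fewer than 1000 IDs were added on top of celebrity. I used the default threshold for the merge.
Leave the model parameter empty to merge directly.
Can a "direct merge" be done like this?
Dataset: celebrity IDs are 86876->180854 (see celebrity_lmk)
Dataset: msra IDs are 0->86875
Put the two datasets together and distinguish each person by ID. (This is what I am doing now, without dataset_merge.py. I don't know whether it is correct. I am still training on this data and don't know the result yet.)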
@nttstar
@YunYang1994
@zhaowwenzhong
Merge with the dataset_merge.py script and set model=''.
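The "distinguish by ID" idea discussed above amounts to shifting the labels of the second dataset past the last label of the first. The following is only a sketch of that idea on in-memory lists; it is not what dataset_merge.py does internally, and `merge_label_lists` is a hypothetical name. It performs no de-duplication.

```python
def merge_label_lists(first, second):
    """Merge two lists of (image_path, label) pairs.

    Labels in each list are assumed to start at 0. Labels of `second`
    are shifted so they do not collide with those of `first`.
    """
    offset = max(label for _, label in first) + 1 if first else 0
    merged = list(first)
    merged.extend((path, label + offset) for path, label in second)
    return merged, offset
```

With msra labelled 0->86875 as `first`, the returned offset would be 86876, matching the starting celebrity ID quoted above.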
I tried to fine tune with triplet:
CUDA_VISIBLE_DEVICES='0,1,2' python -u train.py --network r50 --loss-type 12 --lr 0.005 --mom 0.0 --per-batch-size 96 --data-dir /data/glint_train/ --pretrained /data1/models/model-r50,1 --prefix /data2/models/model-m1-triplet
but got following error:
gpu num: 3
num_layers 50
image_size [112, 112]
num_classes 180855
Called with argument: Namespace(batch_size=288, beta=1000.0, beta_freeze=0, beta_min=5.0, c2c_mode=-10, c2c_threshold=0.0, center_alpha=0.5, center_scale=0.003, ckpt=1, coco_scale=9.052722677456407, ctx_num=3, cutoff=0, data_dir='/data/glint_train/', easy_margin=0, emb_size=512, end_epoch=100000, gamma=0.12, image_channel=3, image_h=112, image_w=112, images_per_identity=5, incay=0.0, logits_verbose=0, loss_type=12, lr=0.005, lr_steps='', margin=4, margin_a=0.0, margin_b=0.0, margin_m=0.5, margin_s=64.0, margin_verbose=0, max_steps=0, mom=0.0, network='r50', noise_sgd=0.0, num_classes=180855, num_layers=50, output_c2c=0, patch='0_0_96_112_0', per_batch_size=96, per_identities=19, power=1.0, prefix='/data2/models/model-m1-triplet', pretrained='/data1/models/model-r50,1', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='lfw,cfp_fp,agedb_30', train_limit=0, triplet_alpha=0.3, triplet_bag_size=3600, triplet_max_ap=0.0, use_deformable=0, use_val=False, verbose=2000, version_act='prelu', version_input=1, version_output='E', version_se=0, version_unit=3, wd=0.0005)
loading ['/data1/models/model-r50', '1']
[19:17:40] src/engine/engine.cc:55: MXNet start using engine: ThreadedEnginePerDevice
init resnet 50
0 1 E 3 prelu
INFO:root:loading recordio /data/glint_train/train.rec...
header0 label [6753546. 6934401.]
id2range 180855
0 0 6753545
c2c_stat [0, 180855]
6753545
rand_mirror 1
5 19 3
(288,)
oseq 822654
lr_steps [71111, 106666, 142222]
/usr/lib/python2.7/site-packages/mxnet/module/base_module.py:490: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (0.333333333333 vs. 0.00347222222222). Is this intended?
optimizer_params=optimizer_params)
call reset()
eval 3600 images.. 0
triplet time stat [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Traceback (most recent call last):
File "train.py", line 1062, in <module>
main()
File "train.py", line 1059, in main
train_net(args)
File "train.py", line 1053, in train_net
epoch_end_callback = epoch_cb )
File "/usr/lib/python2.7/site-packages/mxnet/module/base_module.py", line 506, in fit
next_data_batch = next(data_iter)
File "/root/work/insightface/src/data.py", line 1010, in next
ret = self.cur_iter.next()
File "/root/work/insightface/src/data.py", line 860, in next
self.reset()
File "/root/work/insightface/src/data.py", line 726, in reset
self.triplet_reset()
File "/root/work/insightface/src/data.py", line 575, in triplet_reset
self.select_triplets()
File "/root/work/insightface/src/data.py", line 528, in select_triplets
label[i-ba][:] = header.label
File "/usr/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 444, in __setitem__
self._set_nd_basic_indexing(key, value)
File "/usr/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 706, in _set_nd_basic_indexing
value = np.broadcast_to(value, shape)
File "/usr/lib64/python2.7/site-packages/numpy/lib/stride_tricks.py", line 173, in broadcast_to
return _broadcast_to(array, shape, subok=subok, readonly=True)
File "/usr/lib64/python2.7/site-packages/numpy/lib/stride_tricks.py", line 128, in _broadcast_to
op_flags=[op_flag], itershape=shape, order='C').itviews[0]
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,) and requested shape (1,)
Any idea about this? Thanks
@cysin The rec generation tool needs to be modified. (Generating rec from lst, #265)
| | Epoch[0] | Epoch[1] | Epoch[2] | Epoch[3] | Epoch[4] | Epoch[5] | Epoch[6] | Epoch[7] |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| agedb_30 | 65.62+3.64 | 86.88+2.26 | 76.05+1.56 | 50.00+0.00 | 50.00+0.00 | 79.88+2.00 | | |
| cfp_ff | 81.24+2.36 | 97.14+0.74 | 94.29+1.63 | 50.11+0.12 | 50.09+0.15 | 87.94+2.24 | | |
| cfp_fp | 65.69+1.44 | 77.03+2.68 | 71.76+0.62 | 50.07+0.26 | 50.00+0.00 | 66.54+1.92 | | |
| lfw | 81.63+1.86 | 97.67+1.01 | 94.42+1.09 | 50.18+0.30 | 50.00+0.00 | 90.53+1.10 | | |
| Train-acc | 0.028951 | 0.050758 | 0.061312 | 0.066338 | 0.073008 | 0.078673 | | |
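From the test results above, test accuracy drops as the number of training epochs increases, e.g. lfw: 81.63->97.67->94.42->50.18->50.00->90.53, while training accuracy keeps rising. Is this overfitting, or is something wrong somewhere? Which parameters should I try adjusting?
The training data is mainly msra+celebrity (at least 3 photos per person, about 180K people).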
@zhaowwenzhong Did you mean the rec format used for triplet training is different from the one used for softmax training?
I fine-tune on asian celebrity dataset, using the command below:
export MXNET_CPU_WORKER_NTHREADS=24
export MXNET_CUDNN_AUTOTUNE_DEFAULT=0
export MXNET_ENGINE_TYPE=ThreadedEnginePerDevice
NETWORK=r50
JOB=asian
MODELDIR="../model-$NETWORK-$JOB"
mkdir -p "$MODELDIR"
PREFIX="$MODELDIR/model-asian"
LOGFILE="$MODELDIR/log"
CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py \
--network "$NETWORK" \
--loss-type 0 \
--lr 0.005 \
--per-batch-size 64 \
--data-dir ../datasets/faces_asian_112x112 \
--pretrained ../models/model-r50-am-lfw/model,0000 \
--prefix "$PREFIX"
but I get the following warning:
UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (0.5 vs. 0.0078125). Is this intended?
optimizer_params=optimizer_params)
Do you know what it means?@nttstar
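The two numbers in the warning can be reproduced from the command above, which uses 2 GPUs with a per-GPU batch of 64. A small sketch of the arithmetic follows. It assumes (this is not confirmed in the thread) that the training script passes rescale_grad = 1/number-of-GPUs to its optimizer, while the warning text itself states the value MXNet expects, 1.0/batch_size/num_workers. `warning_values` is a hypothetical helper.

```python
def warning_values(num_gpus, per_batch_size, num_workers=1):
    """Return (rescale_grad used, rescale_grad the warning expects)."""
    total_batch = num_gpus * per_batch_size
    actual = 1.0 / num_gpus                    # ASSUMPTION: set by the training script
    expected = 1.0 / total_batch / num_workers  # formula quoted in the warning
    return actual, expected
```

`warning_values(2, 64)` gives 0.5 and 0.0078125, the pair shown in the warning; with 3 GPUs and a per-GPU batch of 96 it gives roughly 0.3333 and 0.003472, the pair from the triplet run earlier in the thread. If the assumption holds, the message reports a deliberate choice in the script rather than a failure.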
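@nttstar Hello, I merged the two datasets and got about 180K IDs. I then used your pretrained model to extract the center feature vector of each ID, computed the pairwise cosine similarity between the vectors, and pulled out the ID pairs whose value exceeds 0.85. I found 6417 such pairs, i.e. 6417 pairs of IDs whose cosine similarity is greater than 0.85. I checked them roughly by eye and with Baidu image search and found that they are indeed the same person. This suggests that some IDs in the merged dataset may be duplicated; so far I have found 6417 duplicated ID pairs.
I am not sure whether my approach is wrong or the dataset itself is not very clean.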
@meanmee @406747925 @zhaowwenzhong
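@YunYang1994 It may well be unclean. You can try comparing training after de-duplication against training on the direct merge. I think the difference should be almost negligible.
@nttstar Hello, I am using 4x 1080Ti. Why is my speed only 30 samples/sec, 60 samples/sec at most? Other people seem to get at least 300 to 1000 samples/sec. What could be causing such slow training? Thanks.
@YunYang1994 How did you obtain the center feature vector of each ID?
@nttstar The description says the provided images are aligned. Is there a non-aligned image set? Thanks!
@zchflyer The source code is in there. Just compute the center vector of each ID.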
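One plausible reading of "center vector of each ID" is the L2-normalised mean of that identity's normalised embeddings. The sketch below follows that reading; it is an assumption rather than the repository's code, and `identity_centers` is a hypothetical name. Embeddings are assumed to be already extracted as an N x D array with one label per row.

```python
import numpy as np

def identity_centers(embeddings, labels):
    """Return (sorted unique ids, one L2-normalised mean embedding per id)."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    labels = np.asarray(labels)
    # Normalise each embedding so every image contributes equally.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    embeddings = embeddings / np.maximum(norms, 1e-12)
    ids = np.unique(labels)
    centers = np.stack([embeddings[labels == i].mean(axis=0) for i in ids])
    # Re-normalise so a dot product between centers is a cosine similarity.
    centers /= np.maximum(np.linalg.norm(centers, axis=1, keepdims=True), 1e-12)
    return ids, centers
```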
Why do I get such low results (identification is only 0.01270) on Trillion Pairs from Glint? Maybe I did not generate the result correctly. I use src/eval/gen_glint.py to get the bin file for submission. But maybe the code cannot be used directly, so I modified it as follows:
The original code in gen_glint.py:
image_path, label, bbox, landmark, aligned = face_preprocess.parse_lst_line(line)
buffer.append( (image_path, landmark) )
The original code in src/common/face_preprocess.py:
def parse_lst_line(line):
  vec = line.strip().split("\t")
  assert len(vec)>=3
  aligned = int(vec[0])
  image_path = vec[1]
  label = int(vec[2])
  bbox = None
  landmark = None
  #print(vec)
  if len(vec)>3:
    bbox = np.zeros( (4,), dtype=np.int32)
    for i in xrange(3,7):
      bbox[i-3] = int(vec[i])
    landmark = None
    if len(vec)>7:
      _l = []
      for i in xrange(7,17):
        _l.append(float(vec[i]))
      landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, label, bbox, landmark, aligned
I modify the gen_glint.py to:
image_path, landmark = face_preprocess.parse_lst_line(line)
image_path = "/to/my/path/TrillionPairs/testdata/"+line.split(" ")[0]
buffer.append( (image_path, landmark) )
and modify the src/common/face_preprocess.py to:
def parse_lst_line(line):
  vec = line.strip().split(" ")
  assert len(vec)>=2
  image_path = vec[0]
  landmark = None
  #print(vec)
  if len(vec)>2:
    _l = []
    for i in xrange(1,11):
      _l.append(float(vec[i]))
    landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, landmark
My input is:
--input='/to/my/path/TrillionPairs/testdata/testdata_lmk/testdata_lmk.txt'
Because the input testdata_lmk.txt format is:
testdata/00/00/00000d7e95948372025bdaca5a203832.jpg 153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6
testdata/00/00/00000f9f87210c8eb9f5fb488b1171d7.jpg 156.1 180.0 243.9 180.0 207.4 229.2 159.8 262.9 237.4 263.0
testdata/00/00/000010e4c136b77a07eeeea84d84d804.jpg 156.4 180.0 243.6 180.0 201.6 223.0 168.0 264.7 237.7 268.0
So I think that my modification is right, and the resulting bin file is about 1.8G.
I don't know what's wrong with it. Can someone find my problem or provide working code directly?
Any help would be appreciated! @nttstar
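One thing worth checking in the modified parser is the landmark layout. In the three sample lines above, the second and fourth values are both 180.0, which looks like interleaved x1 y1 x2 y2 ... pairs (two eyes at the same height) rather than five x values followed by five y values. If that reading is right (an assumption drawn only from those sample lines), `reshape((2,5)).T` would scramble the points and `reshape((5,2))` would be the matching layout. A Python 3 sketch with a hypothetical `parse_lmk_line`:

```python
import numpy as np

def parse_lmk_line(line):
    """Parse '<image path> x1 y1 x2 y2 ... x5 y5' into (path, 5x2 array)."""
    vec = line.strip().split(" ")
    assert len(vec) == 11, "expected a path followed by 10 landmark values"
    image_path = vec[0]
    values = np.array([float(v) for v in vec[1:11]], dtype=np.float32)
    # ASSUMPTION: values are interleaved (x, y) pairs, one row per landmark.
    landmark = values.reshape((5, 2))
    return image_path, landmark
```

On the first sample line this yields (153.4, 180.0) and (246.6, 180.0) as the first two points, which is at least consistent with two eye positions in a 400x400 crop.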
@nttstar When I run src/eval/gen_glint.py, I observe that the memory used is constantly increasing, which is weird. Is that normal? And another question: what does the following line mean? https://github.com/deepinsight/insightface/blob/master/src/eval/gen_glint.py#L131
When I run the code, I got following error:
sh: 1: bypy: not found
Please help me out, thank you very much.
@becauseofAI I have the same problem as you. My network's training accuracy reaches 0.82 when using the whole DeepGlint dataset; however, when I submit my result I get an identification result of 0.016. @nttstar
@yhw-yhw How did you get the .bin result file? If you also use src/eval/gen_glint.py, did you modify it anywhere? And do you know what the file Trillion Pairs/testdata/feature_tools/matio.py, downloaded with the dataset, is for?
@nttstar I used the same modification as @becauseofAI to generate the result with the LResNet100E-IR|Emore model from the Model-Zoo, but only got an identification result of 0.00178. Can you test with it and share your result and code with us?
@becauseofAI @yhw-yhw @AaronYKing Can you give us a complete, correct way to generate the submission file? I'm sorry that I have had no time to test it recently. Thanks~
Has anyone obtained test results? I uploaded the bin to the DeepGlint website, but after the upload finished the page froze with no response, and there is no result on the results page either. Please help: what should I do?
@nttstar @becauseofAI @yhw-yhw I have the same problem. I submitted my result and got an identification result of 0.01465.
I have not modified the code. My steps are:
@Edwardmark I ran into this too. Logging out and logging back in fixed it. If there is still nothing, the data may not have uploaded successfully.
@nttstar @yhw-yhw @AaronYKing I have uploaded the code to generate the submission file to Google Drive. You need to put it in the directory insightface/src/eval/, and you can use the LResNet100E-IR|Emore model from the Model-Zoo to generate the submission file. But there may be something wrong with the code: it only gets an identification result of 0.00178.
I would be grateful to anyone who can check the code and solve the problem!
@nttstar could glint provide the original non-aligned face data set?
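I tested the result of fine-tuning on the celebrity dataset. The results are below. I believe I tested it correctly, but the performance is very poor...
verification@1e-9:
identification@1e-3:
I used the code provided by @nttstat as is, without any modification.
@yhw-yhw Thanks a lot. Have you run into the following problem? What does this line of code mean? https://github.com/deepinsight/insightface/blob/master/src/eval/gen_glint.py#L131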
When I run the code, I got following error:
sh: 1: bypy: not found
Please help me out, thank you very much.
@cysin @zhaowwenzhong Does the rec format need to change when generating the rec for triplet training data? I also hit the following problem:
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,) and requested shape (1,)
@Edwardmark bypy is the name of a Baidu Cloud upload script. You can ignore it.
Also, I hit the same problem when training on Glint data with triplet loss. It is because the two datasets produce rec files in different formats.
Solution:
Change lines 528 and 529 of data.py:
https://github.com/deepinsight/insightface/blob/master/src/data.py#L528
label[i-ba][:] = header.label
tag.append( ( int(header.label), _idx) )
to
label[i-ba][:] = header.label[0]
tag.append( ( int(header.label[0]), _idx) )
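A slightly more defensive variant of the same idea is to read the label through a small helper that accepts both rec formats, a scalar label or an array whose first element is the identity. This is a sketch only; `scalar_label` is a hypothetical helper and is not part of data.py.

```python
import numpy as np

def scalar_label(label):
    """Return the identity as an int, whether `label` is a scalar or an array."""
    arr = np.asarray(label).reshape(-1)
    return int(arr[0])
```

Under that assumption, both `int(header.label)` and `int(header.label[0])` in the lines above could be written as `scalar_label(header.label)`.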
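@yhw-yhw @nttstar OK. One more question. If I directly run:
python glint2lst.py /data/glint_data msra,celebrity > glint.lst to generate the merged list of the two datasets, and then run:
python src/data/face2rec2.py ${path-to-glint-data-and-glint-lst} (this path contains glint.lst), isn't the rec file already generated? Do I still need to run merge.py? Why run merge.py?
@yhw-yhw Thanks a lot. I also don't quite understand why merge.py needs to be run to merge the data. Doesn't running python glint2lst.py /data/glint_data msra,celebrity > glint.lst already produce the list? Can't I just generate the rec from this list? Why merge as well?
That is for merging with other datasets.
@nttstar Thank you very much for your patient reply.
@nttstar @yhw-yhw May I ask: for classification with 180K classes like this, is it hard to get good results by training or fine-tuning with softmax loss and its variants? For this problem, should one simply fine-tune on the Glint data with triplet loss? I fine-tuned the r50 model on Glint data for a day with the ArcFace loss and found that training accuracy stays near 0 and test accuracy also drops a lot. I would like to discuss with you what methods work well for multi-class tasks with this many classes.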
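@Edwardmark I am fine-tuning the author's released r50 model on the Glint dataset with ArcFace loss. I have trained 100K iterations so far with lr 0.0001, and acc is currently 0.55. This lr still needs tuning. In my experience the choice of lr matters a lot. When training on a dataset with many IDs, I usually train 100K iterations at 0.01, then 200K iterations at 0.001, then 100K iterations at 0.0001, which basically gets a very good result. Finally, after training at 0.00001 for a while, acc stops changing. My batch size is 128.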
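The schedule described in this comment can be written as a piecewise-constant function of the iteration count. A minimal sketch follows; the function name is hypothetical and the boundaries are simply the cumulative sums of the quoted iteration counts.

```python
def learning_rate(iteration):
    """Piecewise-constant lr: 0.01 for 100K, 0.001 for 200K, 0.0001 for 100K, then 0.00001."""
    schedule = [(100000, 0.01), (300000, 0.001), (400000, 0.0001)]
    for end, lr in schedule:
        if iteration < end:
            return lr
    return 0.00001
```

Assuming each lr step multiplies the rate by 0.1, as the logs earlier in the thread suggest, this is roughly what `--lr 0.01 --lr-steps 100000,300000,400000` would express.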
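@yhw-yhw Thanks. I previously fine-tuned the author's r50 on celebrity only. The acc was 58%, but uploading to the Glint website gave only 18% accuracy. Have you uploaded your trained model to Glint for testing? What was the result?
@Edwardmark When I train on the whole dataset, fine-tuning r50, training acc is 50% and the Glint test result is also only 16%. Many people have hit this problem, which is strange. I am now training and testing on ms1m and celebrity separately.
@yhw-yhw Hello. When I fine-tune with triplet loss, with batch set to 120 and training on 4 GPUs, I get the error below. Have you run into it?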
Traceback (most recent call last):
File "train.py", line 1062, in
main()
File "train.py", line 1059, in main
train_net(args)
File "train.py", line 1053, in train_net
epoch_end_callback = epoch_cb )
File "/usr/local/lib/python2.7/dist-packages/mxnet/module/base_module.py", line 506, in fit
next_data_batch = next(data_iter)
File "/mntML/dongbin/insightface/src/data.py", line 1011, in next
ret = self.cur_iter.next()
File "/mntML/dongbin/insightface/src/data.py", line 861, in next
self.reset()
File "/mntML/dongbin/insightface/src/data.py", line 727, in reset
self.triplet_reset()
File "/mntML/dongbin/insightface/src/data.py", line 576, in triplet_reset
self.select_triplets()
File "/mntML/dongbin/insightface/src/data.py", line 545, in select_triplets
embeddings[ba:bb,:] = net_out
ValueError: could not broadcast input array from shape (480,512) into shape (240,512)
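Has anyone gotten good submission results after training on Glint? No matter how I train, results on my test sets are good, but the uploaded results are poor, and I don't know why. @nttstar @becauseofAI @yhw-yhw Today, after switching to triplet loss for fine-tuning, the result rose from 18% to 20%, but that is still worse than the r50 model without any tuning: uploading the r50 model to the official site gave me about 48% accuracy.
We will try to ask Glint to check recent test results soon.
Is this dataset better than the one provided by @nttstar?
@nttstar Have you reported this to Glint? Is it a code problem, or is the training result really that poor?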
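@Edwardmark Is the model that got you 48% accuracy exactly the LResNet50E-IR provided here? Was the code used to generate the bin also gen_glint.py, or did you make other improvements?
Uploading features extracted with the LResNet50E-IR model gave me only 31% accuracy.
@xsr-ai You still have 150K after balancing? I looked at ac_glint and the long tail is severe: most identities have only a few to 20 images.
@xsr-ai How do you do the data balancing?
@nttstar Hello, roughly what accuracy do you reach on the training set? I found that my training-set accuracy is very low, but accuracy on lfw is very high.
@nttstar @yhw-yhw Has anyone trained with triplet loss? Why does it seem not to converge at all?
Has anyone trained with triplet loss? Why does it seem not to converge at all? Even though it is online hard negative mining, there should still be an overall trend, right? The loss never seems to go down. Is there a good way to fix this?
@Edwardmark @yhw-yhw Have you solved the problem of models you trained on Glint getting poor test results?
I have tried my own models many times and they just don't work... The models I trained on Glint have no problem on other tests, including MegaFace.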
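| Name | TPR@FPR=1e-3 | metric |
| -------------- |:------------------:| ------: |
| Baseline demo | 0.43883 | cos |
| Pretrained r34 | 0.49736 | cos |
| Pretrained r50 | 0.49473 | cos |
| My own glint_r34 | 0.01465 | cos |
| My own MS1M_r34 | 0.50138 | cos |
My steps for generating the test file are identical; only the model differs.
@goodpp I solved it by using triplet loss. After fine-tuning, my test result is 48%, a bit lower than the original but still reasonable.
@Edwardmark Thanks, I will try it too.
Help: acc is only about 0.24.
I am fine-tuning with celebrity starting from a previously trained model. In the earlier training acc was about 0.53, but during fine-tuning acc is only 0.22. The three test sets (LFW, AgeDB, CFP) are close to before. What should I do?
Command: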
CUDA_VISIBLE_DEVICES='1,0' python -u train_softmax.py --network y1 --ckpt 2 --loss-type 4 --lr 0.001 --lr-steps 55000,85000,100000,110000 --wd 0.00004 --fc7-wd-mult 10 --emb-size 128 --per-batch-size 128 --margin-s 128 --data-dir ../datasets/faces_glint_112x112 --pretrained ../models/MobileFaceNet_glint/model-y1-arcface_V2,0042 --prefix ../models/MobileFaceNet_glint/model-y1-arcface_V2
Help: LFW accuracy is 98.9%, which is a bit low.
I trained from scratch on the celebrity dataset with mobilenetv2 as the network and arcface as the loss function.
Command:
LRSTEPS='32000,48000,56000'
CUDA_VISIBLE_DEVICES='4,6' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 256 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 > "$LOGFILE" 2>&1 &
@nttstar
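Is the low LFW accuracy related to the dataset being small and unevenly distributed, and to the learning-rate settings? I would like to ask how to set these hyperparameters sensibly.
Use a normal wd. The mobilenetv2 implementation may have a problem.
@nttstar
I didn't quite understand what you meant. May I trouble you once more?
What is the difference between the normal wd you mentioned and wd = 0.00004?
Which part do you mean when you say the mobilenetv2 implementation may have a problem? I previously trained on the MS dataset (the rec file you provide on GitHub) with the same parameters and reached 99.25% accuracy.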
Does anybody know how to use a dataloader to improve GPU utilization?
Where can I download the msra dataset?
license?
@nttstar Which script is used to resize the images in Asian celebrity to 112x112? It seems that face2rec2.py already handles this with face_preprocess.preprocess(img,bbox,...). So we do not need to resize these images separately?
GT of glint-challenge was updated. See http://trillionpairs.deepglint.com/results
do you have Face Alignment models?
@JianbangZ
Thanks for reporting these overlaps.
Would you please share how you found the overlapping and noisy images between the MS1M and Asian datasets? Did you do it manually or automatically?
@test4fest Automatically + manually. What we did was calculate the embedding cluster center for each identity in each dataset, and then do a center-to-center similarity/distance calculation. You can then set a threshold to find some overlaps automatically, and use a higher threshold and manually check the unsure ones.
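The center-to-center comparison described here can be sketched with a single matrix product. This is an illustration under assumptions, not the commenter's actual code: `find_overlaps` is a hypothetical name, the per-identity centers are assumed to be given as arrays, and the 0.85 default is borrowed from another comment in this thread rather than from this one.

```python
import numpy as np

def find_overlaps(centers_a, centers_b, threshold=0.85):
    """Return (index_a, index_b, cosine similarity) for center pairs above threshold."""
    a = np.asarray(centers_a, dtype=np.float32)
    b = np.asarray(centers_b, dtype=np.float32)
    # Normalise rows so the dot product equals cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T
    rows, cols = np.nonzero(sim > threshold)
    return [(int(i), int(j), float(sim[i, j])) for i, j in zip(rows, cols)]
```

For datasets with tens of thousands of identities each, the full similarity matrix is large, so in practice one would likely process `centers_a` in chunks; pairs close to the threshold are the ones to check by hand.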
@JianbangZ
Is it possible that I use a pre-trained network output to calculate the embedding?
Or I have to train a new model based on these combined datasets (MS1M and Asian)?
Is there a mapping from each folder in MS1M-refine-v2 to a person name or mid? For example, folder 0 corresponds to m.09zyss, and so on.
the dataset doesn't contain face coordinates(left, top, right, bottom)?
Is there any overlap between MS1M and VGGface2 ?
Has anyone successfully trained MobileFaceNet from scratch with the DeepGlint dataset? What are the training hyperparameters? Thank you.
Hi all
I would like to try to train mobilefacenet from scratch on DeepGlint dataset. Here is my log example:
INFO:root:Epoch[5] Batch [20] Speed: 590.55 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [40] Speed: 565.22 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [60] Speed: 513.27 samples/sec acc=0.000000
INFO:root:Saved checkpoint to "./models/model_y1_softmax3_glint/model-0044.params"
INFO:root:Epoch[5] Batch [80] Speed: 82.49 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [100] Speed: 504.97 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [120] Speed: 522.76 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [140] Speed: 558.57 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [160] Speed: 503.59 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [180] Speed: 545.58 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [200] Speed: 563.97 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [220] Speed: 537.71 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [240] Speed: 561.69 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [260] Speed: 551.65 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [280] Speed: 541.85 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [300] Speed: 513.12 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [320] Speed: 535.86 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [340] Speed: 542.13 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [360] Speed: 525.81 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [380] Speed: 536.12 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [400] Speed: 517.77 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [420] Speed: 512.70 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [440] Speed: 554.69 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [460] Speed: 541.19 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [480] Speed: 499.54 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [500] Speed: 565.82 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [520] Speed: 490.50 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [540] Speed: 517.75 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [560] Speed: 512.61 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [580] Speed: 532.84 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [600] Speed: 547.83 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [620] Speed: 541.03 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [640] Speed: 523.97 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [660] Speed: 566.80 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [680] Speed: 562.19 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [700] Speed: 516.88 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [720] Speed: 544.09 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [740] Speed: 555.72 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [760] Speed: 534.52 samples/sec acc=0.000000
INFO:root:Saved checkpoint to "./models/model_y1_softmax3_glint/model-0049.params"
INFO:root:Epoch[5] Batch [10080] Speed: 84.99 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10100] Speed: 522.19 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10120] Speed: 509.01 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10140] Speed: 540.22 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10160] Speed: 520.44 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10180] Speed: 529.27 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10200] Speed: 540.42 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10220] Speed: 559.25 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10240] Speed: 538.98 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10260] Speed: 507.30 samples/sec acc=0.065755
INFO:root:Epoch[5] Batch [10280] Speed: 548.35 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10300] Speed: 531.99 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10320] Speed: 565.28 samples/sec acc=0.001563
INFO:root:Epoch[5] Batch [10340] Speed: 522.87 samples/sec acc=0.000651
INFO:root:Epoch[5] Batch [10360] Speed: 561.39 samples/sec acc=0.079557
INFO:root:Epoch[5] Batch [10380] Speed: 558.66 samples/sec acc=0.000911
INFO:root:Epoch[5] Batch [10400] Speed: 567.39 samples/sec acc=0.053125
INFO:root:Epoch[5] Batch [10420] Speed: 525.81 samples/sec acc=0.007552
INFO:root:Epoch[5] Batch [10440] Speed: 556.13 samples/sec acc=0.039453
INFO:root:Epoch[5] Batch [10460] Speed: 539.47 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10480] Speed: 530.92 samples/sec acc=0.047786
INFO:root:Epoch[5] Batch [10500] Speed: 543.45 samples/sec acc=0.000130
INFO:root:Epoch[5] Batch [10520] Speed: 551.35 samples/sec acc=0.001172
INFO:root:Epoch[5] Batch [10540] Speed: 545.21 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10560] Speed: 570.32 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10580] Speed: 552.34 samples/sec acc=0.012109
INFO:root:Epoch[5] Batch [10600] Speed: 551.80 samples/sec acc=0.004297
INFO:root:Epoch[5] Batch [10620] Speed: 528.08 samples/sec acc=0.000130
INFO:root:Epoch[5] Batch [10640] Speed: 544.59 samples/sec acc=0.150521
INFO:root:Epoch[5] Batch [10660] Speed: 527.51 samples/sec acc=0.029948
INFO:root:Epoch[5] Batch [10680] Speed: 543.34 samples/sec acc=0.038932
INFO:root:Epoch[5] Batch [10700] Speed: 527.42 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10720] Speed: 561.84 samples/sec acc=0.050651
INFO:root:Epoch[5] Batch [10740] Speed: 543.47 samples/sec acc=0.007422
INFO:root:Epoch[5] Batch [10760] Speed: 559.44 samples/sec acc=0.000000
INFO:root:Epoch[5] Batch [10780] Speed: 536.10 samples/sec acc=0.100391
INFO:root:Epoch[5] Batch [10800] Speed: 570.65 samples/sec acc=0.000130
INFO:root:Epoch[5] Batch [10820] Speed: 561.06 samples/sec acc=0.073828
INFO:root:Epoch[5] Batch [10840] Speed: 567.89 samples/sec acc=0.053646
INFO:root:Epoch[5] Batch [10860] Speed: 565.93 samples/sec acc=0.110937
INFO:root:Epoch[5] Batch [10880] Speed: 529.71 samples/sec acc=0.012500
INFO:root:Epoch[5] Batch [10900] Speed: 499.38 samples/sec acc=0.001823
INFO:root:Epoch[5] Batch [10920] Speed: 517.37 samples/sec acc=0.108464
INFO:root:Epoch[5] Batch [10940] Speed: 563.39 samples/sec acc=0.056901
INFO:root:Epoch[5] Batch [10960] Speed: 546.41 samples/sec acc=0.103385
INFO:root:Epoch[5] Batch [10980] Speed: 558.78 samples/sec acc=0.110286
Before batch 10280, the acc is always 0, but from batch 10280 onward it has values. This is strange. Has anyone met this problem before?
Thank you.
@karlTUM Training from scratch~~~ Obviously this means your model finally managed to figure things out and learn something. Don't worry, be happy.
@goodpp Hi, would you please share your BT torrent or downloaded dataset with me? My downloaded file cannot be parsed and unzipped successfully. I would appreciate your help.
Thanks!
sophia
@all
Why does the Asian face dataset I downloaded only extract to 4.1G? How should I handle this 90+G .tar.gz file?
I got the following errors during extraction:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
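Could you give me some guidance? Many thanks. I don't know whether the officially provided data is correct or whether my file was corrupted during download.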
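An "Unexpected EOF" from tar usually points to a truncated or damaged archive. One way to check the downloaded file before extracting is to stream through the gzip layer and see whether it reads cleanly to the end. A minimal sketch using only the Python standard library; `gzip_is_intact` is a hypothetical helper name.

```python
import gzip
import zlib

def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip stream at `path` decompresses to its end without error."""
    try:
        with gzip.open(path, "rb") as f:
            # Read and discard the data in chunks; only the error status matters.
            while f.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False
```

If this returns False, the file was most likely damaged or cut short in transit, and downloading it again (or comparing its size with someone else's copy) would be the next step.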
@karlTUM I also tried to train mobilefacenet from scratch on the DeepGlint dataset, but the acc is only about 0.2. Can you help me?
Hi,
Where can I get the ELFW dataset (only the ELFW)?
The downloaded test dataset already mixes ELFW and the other Flickr images together. I want the pure ELFW dataset that DeepGlint mentioned.
1.ELFW: Face images of celebrities in LFW name list. There are 274k images from 5.7k ids.
Are margin-s(64) and margin-m(0.5) suitable for glint dataset (18k ids) ? @nttstar
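For context on what the two parameters control: in ArcFace the target-class logit is s * cos(theta + m) and the other logits are s * cos(theta). The sketch below is a plain-Python illustration of that formula for a single sample, not the training code from this repo; it ignores the special handling real implementations apply when theta + m exceeds pi, and it does not say which values suit this dataset.

```python
import math

def arcface_logits(cosines, target, s=64.0, m=0.5):
    """Apply the ArcFace additive angular margin to one sample's cosine logits.

    cosines: cos(theta_j) between the normalized embedding and each class center
    target:  index of the ground-truth class
    """
    out = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            c = math.cos(math.acos(c) + m)  # add margin m (radians) to the target angle
        out.append(s * c)  # scale every logit by s
    return out
```

A larger m pushes the target logit down harder, so the network has to separate classes by a wider angle; s sets how peaked the softmax over the scaled cosines becomes.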
Why do I get such low results (identification is only 0.01270) on TrillionPairs of Glint? Maybe I did not generate the correct result. I use src/eval/gen_glint.py to get the bin file for submission. But maybe the code cannot be used directly, so I modified it as follows.
The original code in gen_glint.py:
    image_path, label, bbox, landmark, aligned = face_preprocess.parse_lst_line(line)
    buffer.append( (image_path, landmark) )
The original code in src/common/face_preprocess.py:
    def parse_lst_line(line):
      vec = line.strip().split("\t")
      assert len(vec)>=3
      aligned = int(vec[0])
      image_path = vec[1]
      label = int(vec[2])
      bbox = None
      landmark = None
      #print(vec)
      if len(vec)>3:
        bbox = np.zeros( (4,), dtype=np.int32)
        for i in xrange(3,7):
          bbox[i-3] = int(vec[i])
        landmark = None
        if len(vec)>7:
          _l = []
          for i in xrange(7,17):
            _l.append(float(vec[i]))
          landmark = np.array(_l).reshape( (2,5) ).T
      #print(aligned)
      return image_path, label, bbox, landmark, aligned
I modified gen_glint.py to:
    image_path, landmark = face_preprocess.parse_lst_line(line)
    image_path = "/to/my/path/TrillionPairs/testdata/"+line.split(" ")[0]
    buffer.append( (image_path, landmark) )
and modified src/common/face_preprocess.py to:
    def parse_lst_line(line):
      vec = line.strip().split(" ")
      assert len(vec)>=2
      image_path = vec[0]
      landmark = None
      #print(vec)
      if len(vec)>2:
        _l = []
        for i in xrange(1,11):
          _l.append(float(vec[i]))
        landmark = np.array(_l).reshape( (2,5) ).T
      #print(aligned)
      return image_path, landmark
My input is:
    --input='/to/my/path/TrillionPairs/testdata/testdata_lmk/testdata_lmk.txt'
because the format of the input testdata_lmk.txt is:
    testdata/00/00/00000d7e95948372025bdaca5a203832.jpg 153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6
    testdata/00/00/00000f9f87210c8eb9f5fb488b1171d7.jpg 156.1 180.0 243.9 180.0 207.4 229.2 159.8 262.9 237.4 263.0
    testdata/00/00/000010e4c136b77a07eeeea84d84d804.jpg 156.4 180.0 243.6 180.0 201.6 223.0 168.0 264.7 237.7 268.0
So I think my modification is right, and I got a bin file of about 1.8G.
I don't know what's wrong with it. Can someone spot my problem or provide working code directly?
Any help would be appreciated! @nttstar
You should convert testdata_lmk.txt as @goodpp said (because the author changed the format of the landmarks).
If you don't, the aligned images will be wrong; you can save one and check.
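Judging only from the sample lines quoted above, the ten numbers look interleaved as x1 y1 x2 y2 ... x5 y5: the 2nd and 4th values are both 180.0 in every line, which fits two eye points at the same height. That reading is an assumption, not something confirmed in this thread. Under it, a line could be parsed like this (`parse_testdata_lmk_line` is a hypothetical name):

```python
def parse_testdata_lmk_line(line):
    """Parse '<path> x1 y1 x2 y2 ... x5 y5' into (path, [(x, y), ...])."""
    fields = line.strip().split(" ")
    assert len(fields) == 11, "expected a path followed by 10 coordinates"
    path = fields[0]
    coords = [float(v) for v in fields[1:]]
    # Interleaved layout (an assumption read off the sample lines):
    # each consecutive pair is the (x, y) of one landmark.
    landmarks = list(zip(coords[0::2], coords[1::2]))
    return path, landmarks
```

In NumPy terms this corresponds to `np.array(coords).reshape((5, 2))`. By contrast, `reshape((2, 5)).T` pairs value i with value i+5, so on the first sample line the first landmark would come out as (153.4, 215.8) instead of (153.4, 180.0). Saving a few aligned crops, as suggested above, is the quickest way to confirm which layout is right.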
@nttstar
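@Edwardmark When I fine-tuned r50 on the whole dataset, the training acc was 50% and the glint test result was only 16%. Many people have run into this problem, which is strange. I am currently training and testing on ms1m and celebrity separately.
@Edwardmark @yhw-yhw Have you solved the problem of very low results on the official glint test? My megaface test gives 0.984092, but the official glint test gives only 0.43088, which doesn't seem normal.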
@nttstar I just noticed that IBM has released a very impressive facial image dataset: https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/#highlights
Will you try it?
Or does anyone else want to give it a try?
Hello @nttstar, thanks for the great job.
I want to merge emore with glint asia. Should we follow the same procedure (i.e. blindly merge the two datasets by not setting the model during the dataset_merge invocation)?
Thanks.
Hi @mlourencoeb,
Have you managed to merge these two datasets?
We are running:
python dataset_merge.py --include /home/ti/Downloads/DATASETS/faces_emore,/home/ti/Downloads/DATASETS/faces_glint --output /home/ti/Downloads/DATASETS/merge --model /home/ti/Downloads/insightface/models/model-r100-ii/model,0
But at the end of the merging process we get the same property, .idx and .rec files as faces_emore (same size and content). What could be the problem?
Hello @Talgin.
I wrote a merging script myself since I wanted to manually review some cases. There is a huge overlap between glint asia and emore.
I also found lots of repeated identities in emore. I am cleaning those as we speak.
Hello @mlourencoeb,
Thank you for the fast reply. I'm confused about the datasets... In their paper (@nttstar) they say: "DeepGlint-Face(including MS1M-DeepGlint and Asian-DeepGlint)". So, my questions:
Thank you!
Hello @Talgin
emore is based on MSCELEB, just like the non-Asian component of faces_glint. I would merge emore with the Asian part only, but I could be wrong.
@mlourencoeb,
Thank you!
I'm not sure but maybe faces_glint is combination of emore and asian dataset? :) But I'll try to merge them :)
@zhouwei5113 have you solved your problem? I also got a really low score on trillionpairs.
@nttstar
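Hello author, I want to try a new structure that modifies the SE part, and I'm a bit confused: with an mxnet symbol I can't directly get the b, c, h, w values.
This is for SGE, taken from a PyTorch implementation. If I make the change where the SE code sits in the model you provided, I can't directly get the b, c, h, w of the output after bn3 in each layer of the symbol. I need mxnet to implement these statements: b, c, h, w = x.size(), x = x.reshape(b * self.groups, -1, h, w). I'm not that familiar with mxnet; do you know a good way to implement this reshape?
The place I modified in fresnet.py:
bn3 = mx.sym.BatchNorm(data=conv2, fix_gamma=False, eps=2e-5, momentum=bn_mom, name=name + '_bn3')
#if use_se:
if usr_sge:
    get the b, c, h, w of bn3
    then reshape
Below is the corresponding PyTorch implementation: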
import torch
import torch.nn as nn
from torch.nn import Parameter

class SpatialGroupEnhance(nn.Module):  # 3 2 1 hw is half, 311 is same size
    def __init__(self, groups=64):
        super(SpatialGroupEnhance, self).__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.weight = Parameter(torch.zeros(1, groups, 1, 1))
        self.bias = Parameter(torch.ones(1, groups, 1, 1))
        self.sig = nn.Sigmoid()

    def forward(self, x):  # x: (b, c, h, w)
        b, c, h, w = x.size()
        x = x.view(b * self.groups, -1, h, w)  # reshape to (b*groups, c/groups, h, w)
        xn = x * self.avg_pool(x)  # multiply by the global average pool (h, w reduced to 1)
        xn = xn.sum(dim=1, keepdim=True)  # (b*groups, 1, h, w)
        t = xn.view(b * self.groups, -1)
        t = t - t.mean(dim=1, keepdim=True)
        std = t.std(dim=1, keepdim=True) + 1e-5
        t = t / std  # normalize: (t - mean) / std
        t = t.view(b, self.groups, h, w)
        t = t * self.weight + self.bias
        t = t.view(b * self.groups, 1, h, w)
        x = x * self.sig(t)  # sigmoid gives a per-group spatial factor in (0, 1)
        x = x.view(b, c, h, w)  # restore the original dimensions
        return x
@nttstar
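I added the SGE module to the original ResNet-50 IR structure, starting from the author's pretrained resnet50 and using the glint data.
The training/test results are as follows; there is not much change: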
testing verification..
(12000, 512)
infer time 7.123213
[lfw][8000]XNorm: 22.401950
[lfw][8000]Accuracy-Flip: 0.99800+-0.00287
testing verification..
(14000, 512)
infer time 8.335358
[cfp_fp][8000]XNorm: 21.203882
[cfp_fp][8000]Accuracy-Flip: 0.95300+-0.01448
testing verification..
(12000, 512)
infer time 7.040614
[agedb_30][8000]XNorm: 23.488769
[agedb_30][8000]Accuracy-Flip: 0.98000+-0.00749
@mlourencoeb,
Thank you!
I'm not sure but maybe faces_glint is combination of emore and asian dataset? :) But I'll try to merge them :)
Any conclusion about the dataset? Is face_glint = emore + asian_celeb?
I have the same issue in #789
Hi @nttstar ,
We are training on faces_glint + our_custom_dataset. It has now been almost 10 days, and what I want to ask is why our accuracy is not changing: it stays at acc=~0.30-0.31. At the beginning the loss value started from ~46.6-9 and after 2 days decreased to ~7.2-7.5; acc was 0.0000 and then began to rise, but after the 20th epoch it stopped, as you can see in the picture below. It is now the 45th epoch, but nothing has changed.
Our parameters are:
Loss: arcface
default.end_epoch = 1000
default.lr = 0.001
default.wd = 0.0005
default.mom = 0.9
default.per_batch_size: 64
default.ckpt = 3
network = r100
We are using 4 Tesla P100 GPUs.
You can see the progress in the screenshot below:

@nttstar could you tell us what is the problem? We have merged the datasets according to your instructions with dataset_merge.py and no error happened :)
Hi @SueeH ,
Sorry for the late reply. I think this info is noted in their paper:

They say that face_glint (DeepGlint-Face) includes MS1M-DeepGlint and Asian-DeepGlint. As far as I know, and from reading this (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8698884), MS1M-DeepGlint is a refined version of MS1M (provided by DeepGlint Corp.), and on http://trillionpairs.deepglint.com/overview they say:
- MS-Celeb-1M-v1c with 86,876 ids/3,923,399 aligned images cleaned from MS-Celeb-1M dataset. This dataset has been excluded from both LFW and Asian-Celeb.
- Asian-Celeb 93,979 ids/2,830,146 aligned images. This dataset has been excluded from both LFW and MS-Celeb-1M-v1c.
So, I think that emore (MS1MV2) is another refined version of the MS1M data that is included in the faces_glint dataset (because MS1M-DeepGlint has 2K more ids than MS1MV2, but fewer images (3.9M vs. 5.8M)).
- Download dataset from http://trillionpairs.deepglint.com/data (after signup). msra is a cleaned subset of MS1M from glint while celebrity is the asian dataset.
- Generate lst file by calling src/data/glint2lst.py. For example:
  python glint2lst.py /data/glint_data msra,celebrity > glint.lst
  or generate the asian dataset only by:
  python glint2lst.py /data/glint_data celebrity > glint_cn.lst
- Call face2rec2.py to generate .rec file.
- Merge the dataset with existing one by calling src/data/dataset_merge.py without setting param _model_ which will combine all IDs from those two datasets. Finally you will get a dataset contains about 180K IDs.
Use src/eval/gen_glint.py to prepare test feature file by using pretrained insightface model. You can also post your private testing results here.
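The "combine all IDs" merge described in the steps above can be pictured as shifting the labels of each dataset so the ranges never collide. The sketch below is a generic illustration, not what dataset_merge.py actually does internally; `blind_merge` and the (image_path, label) tuples are made up for the example, and each dataset is assumed to be non-empty with labels 0..N-1.

```python
def blind_merge(datasets):
    """Concatenate datasets of (image_path, label) pairs with disjoint label ranges.

    No identity matching is done, so a person who appears in two datasets
    ends up with two different labels.
    """
    merged = []
    offset = 0
    for samples in datasets:
        num_ids = max(label for _, label in samples) + 1
        # Shift this dataset's labels past everything merged so far.
        merged.extend((path, label + offset) for path, label in samples)
        offset += num_ids
    return merged, offset  # offset is now the total number of IDs
```

With the ID counts quoted in this thread, 86,876 + 93,979 = 180,855, which matches "about 180K IDs" if that figure refers to msra plus celebrity. The drawback of merging blindly is the one raised earlier: identities shared between the datasets are counted twice.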
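Hey, I'm in Shanghai too. Training MobileFaceNet + arcloss on the webface dataset or face-ms1m always ends in NaN. Have you tried it? Even with lr set to 0.0001, it goes NaN after 20-odd epochs (at epoch 24).
Can anyone share a training configuration for Asian faces? Thanks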
we use casia
I followed the steps one by one but got a key error for the images.
My configuration: CUDA_VISIBLE_DEVICES='0,1' python3 -u src/train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 32 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002
The key error I get for the asian dataset:

@Edwardmark I have met the same problem as you. Did you get good results on deepglint in the end?
@maywander no, I didn't. In the end, I used the emore data instead.
So the models trained on emore perform better on the trillionpairs test platform? @Edwardmark
@maywander yes, and I don't know why.
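I can generate the glint.lst file normally, but calling face2rec.py always fails. Does anyone know how to set the parameters? Thanks
It feels like there is a problem with the code: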
No such file or directory: '..../insightface/src/data/property'
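The message suggests the script expects a file named property in its working directory. As far as I remember, the insightface dataset folders carry a one-line property file of the form num_classes,height,width, but treat that layout as an assumption and compare it with the property file of a dataset you already have (faces_emore ships one). A minimal sketch for writing such a file, with a hypothetical helper name:

```python
import os

def write_property(data_dir, num_classes, height=112, width=112):
    """Write '<num_classes>,<height>,<width>' to <data_dir>/property."""
    path = os.path.join(data_dir, "property")
    with open(path, "w") as f:
        # Assumed layout: a single comma-separated line.
        f.write("%d,%d,%d\n" % (num_classes, height, width))
    return path
```

`num_classes` should be the number of distinct labels in your .lst file, and the image size should match how the faces were cropped.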
@nttstar I used the glint dataset to train the model but only got 77% acc in the glint test. Could you share the training log of the run that gets 86% acc?
How many iterations does it take to train on this combined dataset from scratch, using any of the provided models, until it converges?
Thanks for the valuable discussion. Has anyone seen an improvement on MegaFace and IJB-C when working with the merged dataset? Thanks
@nttstar Thanks for the great work.
Could you please share train.lst for ms1mv2?
@mlourencoeb
Could you please share the intersection list between emore and asian glint?
Thanks in advance.