Thanks for Sharing

aaaaaaaak on 15 Jun 2018

@nttstar will you train new models with these data?

cysin on 17 Jun 2018

A real credit to the industry.

406747925 on 19 Jun 2018

👍6

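Is anyone from DeepGlint here? I'm from Bitmain, one floor below you. The download is too slow; could I just come upstairs and copy the data directly?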

meanmee on 19 Jun 2018

😄75 👍42 😕4 👎2 🚀1

Do any identities appear in both the msra and celebrity datasets?

lmmcc on 19 Jun 2018

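A few days ago I attended DeepGlint's talk, where they released this dataset. I have just finished downloading it (several hundred GB) and did not expect it to show up here so quickly. Thanks.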

aa12356jm on 19 Jun 2018

After testing, this dataset is pretty clean, but it still contains 0.3%~0.8% noise.
Also, we found that their ms1m and Asian parts still have about 15-30 overlaps, though I guess it doesn't matter when the scale is already so large.
Another finding is that this dataset suffers badly from a long tail. Take the Asian part for example: only 18K identities out of roughly 94K have over 25 images per class, and only a few thousand identities have over 60 images.
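
A quick way to check the long tail yourself is to count images per identity. This is only a minimal sketch: `count_ids_over` and the toy labels are made up for illustration, and it assumes you have one integer label per image (for example, read from your .lst file).

```python
from collections import Counter

def count_ids_over(labels, threshold):
    """Return how many identities have more than `threshold` images."""
    per_id = Counter(labels)
    return sum(1 for n in per_id.values() if n > threshold)

# Toy labels: identity 0 has 3 images, identity 1 has 1, identity 2 has 2.
toy_labels = [0, 0, 0, 1, 2, 2]
print(count_ids_over(toy_labels, 1))  # identities with more than 1 image
```

On the real data you would call it with thresholds such as 25 and 60 to reproduce the kind of counts quoted above.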

JianbangZ on 19 Jun 2018

👍29

@aa12356jm could you share it on BaiduYun?

meanmee on 20 Jun 2018

@nttstar I downloaded the dataset from Glint. It looks like the faces are similarity-transformed and resized to 400x400. For ArcFace, how should I crop/resize them to 112x112?
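
For reference, the usual idea is to estimate a similarity transform from the five landmarks to a fixed 112x112 template and warp with it. The sketch below is not the repository's script: `estimate_similarity` is a plain NumPy Umeyama fit written for illustration, and the template values are an assumption typed from memory, so check them against face_preprocess.py before relying on them.

```python
import numpy as np

# Five-point template (left eye, right eye, nose, left and right mouth corners)
# for a 112x112 crop. ASSUMPTION: values recalled from ArcFace-style alignment
# code; verify against face_preprocess.py.
TEMPLATE_112 = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
])

def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src points onto dst.

    Returns a 2x3 matrix M with dst ~= M[:, :2] @ p + M[:, 2] for each point p in src.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0  # avoid returning a reflection
    R = U @ np.diag(d) @ Vt
    var_s = (sc ** 2).sum() / len(src)
    scale = (S * d).sum() / var_s
    t = mu_d - scale * (R @ mu_s)
    return np.hstack([scale * R, t[:, None]])

# Landmarks of one 400x400 test image from this thread, as (x, y) rows.
landmarks = np.array([[153.4, 180.0], [246.6, 180.0], [196.8, 215.8],
                      [158.5, 278.7], [230.6, 277.6]])
M = estimate_similarity(landmarks, TEMPLATE_112)
# With OpenCV installed, the crop would be: cv2.warpAffine(img, M, (112, 112))
```

The matrix maps source pixel coordinates to crop coordinates, which is the direction `cv2.warpAffine` expects by default.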

zhenglaizhang on 20 Jun 2018

@zhenglaizhang I already provided the scripts.

nttstar on 20 Jun 2018

@JianbangZ do you have some idea to solve these problems?

HaoLiuHust on 21 Jun 2018

awesome !

starimpact on 21 Jun 2018

👍4 😕1

Thanks DeepGlint!

devymex on 21 Jun 2018

👍6

This is meaningful.

xxllp on 22 Jun 2018

@nttstar the download link is down.

vzvzx on 23 Jun 2018

@nttstar @JianbangZ how did you download the Glint Asian face dataset? I cannot find how to register and sign up.

libohit on 24 Jun 2018

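@nttstar @JianbangZ Why does the Asian face dataset I downloaded only extract to 1.7 GB of faces with 2000+ IDs? How should I handle this 90+ GB .tar.gz file? Could you give some guidance? Many thanks.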

aaaaaaaak on 25 Jun 2018

There are no lmk files in the dataset:
"lmk_file = os.path.join(input_dir, "%s_lmk.txt"%(ds))"
is it correct?

anguoyang on 25 Jun 2018

Same problem as @libohit: I can't sign in at http://trillionpairs.deepglint.com/data; the "sign in" button is greyed out!

Wisgon on 25 Jun 2018

@Wisgon maybe you need to use another browser

anguoyang on 26 Jun 2018

Can anyone share a copy of the lmk files? Their official site seems to be under maintenance. I could not download anything.

wangchust on 26 Jun 2018

I can't download it right now. What's going on? @_@

goodpp on 26 Jun 2018

I can't download the dataset. When I click the Download button, the following error appears:
This XML file does not appear to have any style information associated with it. The document tree is shown below. <Error><Code>InvalidAccessKeyId</Code><Message>The OSS Access Key Id you provided is disabled.</Message><RequestId>5B31AE6FF68A5D785875635D</RequestId><HostId>dgplaygroundopen.oss-cn-qingdao.aliyuncs.com</HostId><OSSAccessKeyId>LTAIKdTReMdV71Zi</OSSAccessKeyId></Error>

Wisgon on 26 Jun 2018

I can't sign in at http://trillionpairs.deepglint.com/data; the "sign in" button is greyed out!

shineway14 on 26 Jun 2018

@shineway14 You can use http://trillionpairs.deepglint.com/login to sign in. When you finish filling in the blanks, press Enter instead of clicking the 'log in' button.
BTW, use http://trillionpairs.deepglint.com/register to register.

Wisgon on 26 Jun 2018

👍5

@nttstar what is the exact script to merge msra and celeb?

meanmee on 28 Jun 2018

👍1

goodpp on 28 Jun 2018

👍4

Hi all,
I have already done step 1, which produces the glint_cn file; however, I got an error while trying step 2. The error is below. Please help me fix this issue. Thanks.

OpenCV Error: Assertion failed (src.cols > 0 && src.rows > 0) in warpAffine, file /build/buildd/opencv-2.4.8+dfsg1/modules/imgproc/src/imgwarp.cpp, line 3445
Traceback (most recent call last):
  File "face2rec2.py", line 256, in <module>
    image_encode(args, i, item, q_out)
  File "face2rec2.py", line 99, in image_encode
    img = face_preprocess.preprocess(img, bbox = item.bbox, landmark=item.landmark, image_size='%d,%d'%(args.image_h, args.image_w))
  File "../common/face_preprocess.py", line 107, in preprocess
    warped = cv2.warpAffine(img,M,(image_size[1],image_size[0]), borderValue = 0.0)
cv2.error: /build/buildd/opencv-2.4.8+dfsg1/modules/imgproc/src/imgwarp.cpp:3445: error: (-215) src.cols > 0 && src.rows > 0 in function warpAffine

jackytu256 on 2 Jul 2018

@jackytu256 Does the package you downloaded contain the lmk file?
lmk_file = os.path.join(input_dir, "%s_lmk"%(ds))

anguoyang on 2 Jul 2018

@anguoyang
Yes, I got a file called celebrity_lmk

jackytu256 on 2 Jul 2018

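Hello, thank you for open-sourcing the code. I have run into a problem: I want to merge the msra and celebrity data, but the Microsoft celebrity dataset does not seem to have a property file. Looking forward to your answer, thanks.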

YunYang1994 on 2 Jul 2018

Write the property file yourself. Format: <total number of identities in the dataset>,112,112
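
A minimal sketch of writing such a file, assuming it is a plain text file named `property` inside the dataset directory. `write_property` is a hypothetical helper, not part of the repository, and the demo writes into a temporary directory.

```python
import os
import tempfile

def write_property(dataset_dir, num_identities, height=112, width=112):
    """Write '<num_identities>,<height>,<width>' to <dataset_dir>/property."""
    path = os.path.join(dataset_dir, "property")
    with open(path, "w") as f:
        f.write("%d,%d,%d\n" % (num_identities, height, width))
    return path

# Demo: 180855 is the num_classes value that appears in a training log in this thread.
demo_dir = tempfile.mkdtemp()
property_path = write_property(demo_dir, 180855)
print(open(property_path).read().strip())
```

Replace the identity count with the number of IDs in your own merged dataset.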

nttstar on 2 Jul 2018

👍8

Has anyone successfully uploaded results? I have uploaded several times and never got a result.

wangchust on 2 Jul 2018

@wangchust Try a different browser? It works for me in Chrome.

TopcoderX on 2 Jul 2018

@TopcoderX Do I only need to upload my own bin file? And where can I see the results? :)

wangchust on 2 Jul 2018

@wangchust Yes, just upload the bin file; the results show up on the results page.

TopcoderX on 2 Jul 2018

@TopcoderX Thanks!

wangchust on 3 Jul 2018

How can I improve my training accuracy?
Has anyone shared how their training went? Here is mine:

dataset: msra + celebrity
network backbone: r34 ( output=E, emb_size=512, prelu )
loss function: arcface(m=0.5)
training pipeline: batch_size=384, pre_batch_size=96(4GPUx12G), verbose=2000
Highest LFW: 99.767%; Highest CFP_FP: 93.829%; Highest AgeDB30: 97.567% (epoch=14) megaface: 96.1564%

Test accuracy is good, but training accuracy has plateaued near 60%, as shown below:
INFO:root:Epoch[14] Batch [16000] Speed: 499.24 samples/sec acc=0.583464
INFO:root:Epoch[18] Batch [8380] Speed: 499.95 samples/sec acc=0.596615

INFO:root:Epoch[11] Train-acc=0.500434
INFO:root:Epoch[12] Train-acc=0.570312
INFO:root:Epoch[13] Train-acc=0.575087
INFO:root:Epoch[14] Train-acc=0.578993
INFO:root:Epoch[15] Train-acc=0.584201
INFO:root:Epoch[16] Train-acc=0.582465
INFO:root:Epoch[17] Train-acc=0.595920

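1. Initial lr=0.1. I did not set lr_step, but the log shows lr_steps [133333, 186666, 213333]. The learning rate has now dropped to 0.0001, so as I understand it, training accuracy will not improve with further training. Is that right?
2. In this situation, how can I raise training accuracy? Continue with a smaller learning rate? Do I need to switch to TripletLoss?
3. Is it necessary to push training accuracy any higher at all?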
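
The arithmetic behind question 1 can be sketched as a step schedule. This assumes the learning rate is multiplied by 0.1 at each boundary, which is an assumption but matches 0.1 ending at 0.0001 after three steps. `lr_at` is a made-up helper, not the training script's code.

```python
def lr_at(batch, base_lr=0.1, steps=(133333, 186666, 213333), factor=0.1):
    """Learning rate in effect at a given global batch index under a step schedule."""
    passed = sum(1 for s in steps if batch >= s)
    return base_lr * factor ** passed

for b in (0, 133333, 186666, 213333):
    print(b, lr_at(b))
```

Once the last boundary has passed, the rate stays at its smallest value, so further gains in training accuracy are expected to be small.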

goodpp on 5 Jul 2018

👍8

For those who cannot log in to the website: try switching to Chrome.

mengzhibin on 6 Jul 2018

Can face_emore and celebrity be trained together? Do the two datasets overlap (the same person under different IDs)?

zhaowwenzhong on 6 Jul 2018

@zhaowwenzhong Yes, you can simply merge them directly.

nttstar on 6 Jul 2018

When fine-tuning with triplet loss, I see "call reset" in the output log. Is this normal?
call reset()
eval 4200 images.. 12600
triplet time stat [0.00022899999999999998, 27.907889, 5.54027, 0.0, 0.0, 0.0]
found triplets 1873
seq len 5550
INFO:root:Epoch[0] Batch [30] Speed: 124.86 samples/sec lossvalue=0.185821
INFO:root:Epoch[0] Batch [32] Speed: 931.34 samples/sec lossvalue=0.058671
INFO:root:Epoch[0] Batch [34] Speed: 648.38 samples/sec lossvalue=0.066276
INFO:root:Epoch[0] Batch [36] Speed: 638.66 samples/sec lossvalue=0.058656
INFO:root:Epoch[0] Batch [38] Speed: 636.64 samples/sec lossvalue=0.067309
call reset()
eval 4200 images.. 16800
triplet time stat [0.00036899999999999997, 34.373402, 7.256107, 0.0, 0.0, 0.0]
found triplets 1731
seq len 5100
INFO:root:Epoch[0] Batch [40] Speed: 124.52 samples/sec lossvalue=0.188634
INFO:root:Epoch[0] Batch [42] Speed: 673.45 samples/sec lossvalue=0.078769
INFO:root:Epoch[0] Batch [44] Speed: 645.02 samples/sec lossvalue=0.061025
INFO:root:Epoch[0] Batch [46] Speed: 628.74 samples/sec lossvalue=0.063693
call reset()
eval 4200 images.. 21000
triplet time stat [0.000468, 41.096862, 9.061286, 0.0, 0.0, 0.0]
found triplets 1864
seq len 5550
INFO:root:Epoch[0] Batch [48] Speed: 118.35 samples/sec lossvalue=0.202036
INFO:root:Epoch[0] Batch [50] Speed: 681.18 samples/sec lossvalue=0.073124
INFO:root:Epoch[0] Batch [52] Speed: 642.58 samples/sec lossvalue=0.074811
INFO:root:Epoch[0] Batch [54] Speed: 640.49 samples/sec lossvalue=0.067620
call reset()

zhaowwenzhong on 6 Jul 2018

That output is just the data iterator resetting; you can ignore it.

nttstar on 7 Jul 2018

Downloading is too slow. Next time, please don't use a jar file.

mengzhibin on 9 Jul 2018

@goodpp can you share the model?

anguoyang on 11 Jul 2018

How many identities should I get after merging the msra and celebrity datasets? My merge yielded fewer than 100,000.
celebrity: 93979 IDs
msra: 85164 IDs
python /insightface/src/data/dataset_merge.py --include ~/data/celebrity/ , ~/data/msrc/ --output ~/data/combined/

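I see the code has a deduplication step, so I would like to ask: given the threshold you set, is the size of my merged (deduplicated) dataset correct?

Looking forward to your answer. Many thanks!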

YunYang1994 on 11 Jul 2018

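Didn't the author say "you can simply merge them directly"?!
My understanding of a direct merge is to put the data from both sets together and just distinguish them by ID.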

zhaowwenzhong on 11 Jul 2018

@nttstar
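@zhaowwenzhong Doesn't a direct merge use the /insightface/src/data/dataset_merge.py script? When I merged celebrity and msra, I found that fewer than 1000 IDs were added on top of celebrity. I used the default threshold for the merge.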

YunYang1994 on 11 Jul 2018

Leave the model parameter empty to merge directly.

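nttstar on 11 Jul 2018

Can a "direct merge" be done like this?
Dataset: celebrity IDs are 86876->180854 (see celebrity_lmk)
Dataset: msra IDs are 0->86875
Put the two datasets together and distinguish each person by ID. (This is what I am doing now, without dataset_merge.py. I don't know whether this is correct; I am still training on this data and don't know the outcome yet.)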
@nttstar
@YunYang1994
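
The ID layout described above amounts to shifting one dataset's labels past the other's. A toy sketch, with `merge_label_lists` as a hypothetical helper; note that it does no deduplication, so a person present in both sets keeps two IDs.

```python
def merge_label_lists(labels_a, labels_b):
    """Concatenate two label lists, shifting the second past the first's largest id."""
    offset = max(labels_a) + 1
    merged = list(labels_a) + [label + offset for label in labels_b]
    return merged, offset

# Toy data: dataset A uses ids 0..2, dataset B uses ids 0..1.
merged, offset = merge_label_lists([0, 1, 2, 2], [0, 0, 1])
print(merged, offset)
```

With msra labels running 0..86875, the offset would be 86876, which matches the celebrity range 86876..180854 quoted above.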

zhaowwenzhong on 11 Jul 2018

@zhaowwenzhong
Merge with the dataset_merge.py script and set model=''.

YunYang1994 on 11 Jul 2018

I tried to fine tune with triplet:

CUDA_VISIBLE_DEVICES='0,1,2' python -u train.py --network r50 --loss-type 12 --lr 0.005 --mom 0.0 --per-batch-size 96 --data-dir /data/glint_train/ --pretrained /data1/models/model-r50,1 --prefix /data2/models/model-m1-triplet

but got following error:

gpu num: 3
num_layers 50
image_size [112, 112]
num_classes 180855
Called with argument: Namespace(batch_size=288, beta=1000.0, beta_freeze=0, beta_min=5.0, c2c_mode=-10, c2c_threshold=0.0, center_alpha=0.5, center_scale=0.003, ckpt=1, coco_scale=9.052722677456407, ctx_num=3, cutoff=0, data_dir='/data/glint_train/', easy_margin=0, emb_size=512, end_epoch=100000, gamma=0.12, image_channel=3, image_h=112, image_w=112, images_per_identity=5, incay=0.0, logits_verbose=0, loss_type=12, lr=0.005, lr_steps='', margin=4, margin_a=0.0, margin_b=0.0, margin_m=0.5, margin_s=64.0, margin_verbose=0, max_steps=0, mom=0.0, network='r50', noise_sgd=0.0, num_classes=180855, num_layers=50, output_c2c=0, patch='0_0_96_112_0', per_batch_size=96, per_identities=19, power=1.0, prefix='/data2/models/model-m1-triplet', pretrained='/data1/models/model-r50,1', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='lfw,cfp_fp,agedb_30', train_limit=0, triplet_alpha=0.3, triplet_bag_size=3600, triplet_max_ap=0.0, use_deformable=0, use_val=False, verbose=2000, version_act='prelu', version_input=1, version_output='E', version_se=0, version_unit=3, wd=0.0005)
loading ['/data1/models/model-r50', '1']
[19:17:40] src/engine/engine.cc:55: MXNet start using engine: ThreadedEnginePerDevice
init resnet 50
0 1 E 3 prelu
INFO:root:loading recordio /data/glint_train/train.rec...
header0 label [6753546. 6934401.]
id2range 180855
0 0 6753545
c2c_stat [0, 180855]
6753545
rand_mirror 1
5 19 3
(288,)
oseq 822654
lr_steps [71111, 106666, 142222]
/usr/lib/python2.7/site-packages/mxnet/module/base_module.py:490: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (0.333333333333 vs. 0.00347222222222). Is this intended?
  optimizer_params=optimizer_params)
call reset()
eval 3600 images.. 0
triplet time stat [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Traceback (most recent call last):
  File "train.py", line 1062, in <module>
    main()
  File "train.py", line 1059, in main
    train_net(args)
  File "train.py", line 1053, in train_net
    epoch_end_callback = epoch_cb )
  File "/usr/lib/python2.7/site-packages/mxnet/module/base_module.py", line 506, in fit
    next_data_batch = next(data_iter)
  File "/root/work/insightface/src/data.py", line 1010, in next
    ret = self.cur_iter.next()
  File "/root/work/insightface/src/data.py", line 860, in next
    self.reset()
  File "/root/work/insightface/src/data.py", line 726, in reset
    self.triplet_reset()
  File "/root/work/insightface/src/data.py", line 575, in triplet_reset
    self.select_triplets()
  File "/root/work/insightface/src/data.py", line 528, in select_triplets
    label[i-ba][:] = header.label
  File "/usr/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 444, in __setitem__
    self._set_nd_basic_indexing(key, value)
  File "/usr/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 706, in _set_nd_basic_indexing
    value = np.broadcast_to(value, shape)
  File "/usr/lib64/python2.7/site-packages/numpy/lib/stride_tricks.py", line 173, in broadcast_to
    return _broadcast_to(array, shape, subok=subok, readonly=True)
  File "/usr/lib64/python2.7/site-packages/numpy/lib/stride_tricks.py", line 128, in _broadcast_to
    op_flags=[op_flag], itershape=shape, order='C').itviews[0]
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,) and requested shape (1,)

Any idea about this? Thanks

cysin on 11 Jul 2018

@cysin The rec generation tool needs to be modified. (Generating rec from lst, #265)

zhaowwenzhong on 12 Jul 2018

| Epoch[0] | Epoch[1] | Epoch[2] | Epoch[3] | Epoch[4] | Epoch[5] | Epoch[6] | Epoch[7]
-- | -- | -- | -- | -- | -- | -- | -- | --
agedb_30 | 65.62+3.64 | 86.88+2.26 | 76.05+1.56 | 50.00+0.00 | 50.00+0.00 | 79.88+2.00 | |
cfp_ff | 81.24+2.36 | 97.14+0.74 | 94.29+1.63 | 50.11+0.12 | 50.09+0.15 | 87.94+2.24 | |
cfp_fp | 65.69+1.44 | 77.03+2.68 | 71.76+0.62 | 50.07+0.26 | 50.00+0.00 | 66.54+1.92 | |
lfw | 81.63+1.86 | 97.67+1.01 | 94.42+1.09 | 50.18+0.30 | 50.00+0.00 | 90.53+1.10 | |
Train-acc | 0.028951 | 0.050758 | 0.061312 | 0.066338 | 0.073008 | 0.078673 | |

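From the test results above, test accuracy drops as the number of training epochs increases, e.g. lfw: 81.63->97.67->94.42->50.18->50.00->90.53, while training accuracy keeps rising. Is this overfitting, or is something wrong? Which parameters should I try adjusting?
The training data is mainly msra+celebrity (at least 3 photos per person, about 180,000 people).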

zhaowwenzhong on 12 Jul 2018

@zhaowwenzhong Did you mean the rec format used for triplet training is different from the one used for softmax training?

cysin on 12 Jul 2018

I fine-tune on asian celebrity dataset, using the command below:

#!/usr/bin/env bash

export MXNET_CPU_WORKER_NTHREADS=24
export MXNET_CUDNN_AUTOTUNE_DEFAULT=0
export MXNET_ENGINE_TYPE=ThreadedEnginePerDevice

NETWORK=r50
JOB=asian
MODELDIR="../model-$NETWORK-$JOB"
mkdir -p "$MODELDIR"
PREFIX="$MODELDIR/model-asian"
LOGFILE="$MODELDIR/log"

CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py \
--network "$NETWORK" \
--loss-type 0 \
--lr 0.005 \
--per-batch-size 64 \
--data-dir ../datasets/faces_asian_112x112 \
--pretrained ../models/model-r50-am-lfw/model,0000 \
--prefix "$PREFIX"
but I get the following warning:
UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (0.5 vs. 0.0078125). Is this intended?
optimizer_params=optimizer_params)

Do you know what it means?@nttstar
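
The two numbers in the warning line up with 1/(number of GPUs) and 1/(total batch size). The sketch below only reproduces that arithmetic. That the training script sets rescale_grad to 1/num_gpus on purpose is my assumption, in which case the warning would be harmless. `rescale_values` is a made-up helper.

```python
def rescale_values(num_gpus, per_batch_size):
    """Return (1/num_gpus, 1/total_batch_size), the two values the warning compares."""
    batch_size = num_gpus * per_batch_size
    return 1.0 / num_gpus, 1.0 / batch_size

print(rescale_values(2, 64))  # 2 GPUs, --per-batch-size 64
print(rescale_values(3, 96))  # 3 GPUs, --per-batch-size 96, as in the triplet run above
```

The first call reproduces "0.5 vs. 0.0078125", and the second matches the "0.333333333333 vs. 0.00347222222222" warning from the earlier triplet log.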

Edwardmark on 16 Jul 2018

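@nttstar Hello. I merged the two datasets and got about 180,000 IDs. I then used your trained model to extract a center feature vector for each ID, computed the pairwise cosine similarity between the vectors, and pulled out the IDs scoring above 0.85. This gave 6417 ID pairs, i.e. 6417 pairs of IDs whose cosine similarity exceeds 0.85. I spot-checked them by eye and with Baidu image search and found they are indeed the same person. This means some IDs in the merged dataset may be duplicated; I currently count 6417 duplicated ID pairs.

I am not sure whether my approach is wrong or the dataset itself is not very clean.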

@meanmee @406747925 @zhaowwenzhong
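
A minimal sketch of the center-vector check described above, on toy 2-D embeddings. `identity_centers` and `similar_pairs` are hypothetical helpers, and the real embeddings would come from a trained model. It assumes a positive threshold.

```python
import numpy as np

def identity_centers(embeddings, labels):
    """Mean embedding per identity, L2-normalized so dot products are cosines."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centers = np.stack([embeddings[labels == i].mean(axis=0) for i in ids])
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    return ids, centers

def similar_pairs(ids, centers, threshold=0.85):
    """Return (id_a, id_b, cosine) for every pair of centers above the threshold."""
    sim = centers @ centers.T
    rows, cols = np.where(np.triu(sim, k=1) > threshold)  # upper triangle: each pair once
    return [(int(ids[r]), int(ids[c]), float(sim[r, c])) for r, c in zip(rows, cols)]

# Toy data: identities 0 and 2 point the same way, identity 1 is orthogonal.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]
lab = [0, 0, 1, 2]
ids, centers = identity_centers(emb, lab)
pairs = similar_pairs(ids, centers)
print(pairs)
```

For roughly 180,000 identities the full similarity matrix will not fit in memory, so the comparison would need to be done in row blocks.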

YunYang1994 on 16 Jul 2018

👍5

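@YunYang1994 It may well not be clean. You could compare training after deduplication against training on the direct merge. I think the difference should be almost negligible.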

nttstar on 16 Jul 2018

@nttstar Hello, why do I only get 30 samples/sec with 4x 1080 Ti, and 60 samples/sec at most? Others seem to get at least 300 to 1000 samples/sec. What could make training this slow? Thanks.

Edwardmark on 16 Jul 2018

@YunYang1994 How do you obtain the center feature vector for each ID?

zchflyer on 17 Jul 2018

@nttstar The description says the provided images are aligned. Is there a non-aligned image set? Thanks!

tornadomeet on 19 Jul 2018

@zchflyer The source code is in there; just compute the center vector for each ID.

YunYang1994 on 23 Jul 2018

Why do I get such low results (identification is only 0.01270) on Glint's TrillionPairs? Maybe I did not generate the result correctly. I used src/eval/gen_glint.py to produce the bin file for submission. Since the code may not be usable as-is, I modified it as follows:
The original code in gen_glint.py:

image_path, label, bbox, landmark, aligned = face_preprocess.parse_lst_line(line)
buffer.append( (image_path, landmark) )

The original code in src/common/face_preprocess.py:

def parse_lst_line(line):
  vec = line.strip().split("\t")
  assert len(vec)>=3
  aligned = int(vec[0])
  image_path = vec[1]
  label = int(vec[2])
  bbox = None
  landmark = None
  #print(vec)
  if len(vec)>3:
    bbox = np.zeros( (4,), dtype=np.int32)
    for i in xrange(3,7):
      bbox[i-3] = int(vec[i])
    landmark = None
    if len(vec)>7:
      _l = []
      for i in xrange(7,17):
        _l.append(float(vec[i]))
      landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, label, bbox, landmark, aligned

I modify the gen_glint.py to:

    image_path, landmark = face_preprocess.parse_lst_line(line)  
    image_path = "/to/my/path/TrillionPairs/testdata/"+line.split(" ")[0]
    buffer.append( (image_path, landmark) )

and modify the src/common/face_preprocess.py to:

def parse_lst_line(line):
  vec = line.strip().split(" ")
  assert len(vec)>=2
  image_path = vec[0]
  landmark = None
  #print(vec)
  if len(vec)>2:
    _l = []
    for i in xrange(1,11):
      _l.append(float(vec[i]))
    landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, landmark

My input is:

--input='/to/my/path/TrillionPairs/testdata/testdata_lmk/testdata_lmk.txt'

Because the input testdata_lmk.txt format is:

testdata/00/00/00000d7e95948372025bdaca5a203832.jpg 153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6
testdata/00/00/00000f9f87210c8eb9f5fb488b1171d7.jpg 156.1 180.0 243.9 180.0 207.4 229.2 159.8 262.9 237.4 263.0
testdata/00/00/000010e4c136b77a07eeeea84d84d804.jpg 156.4 180.0 243.6 180.0 201.6 223.0 168.0 264.7 237.7 268.0

So I think my modification is right, and the resulting bin file is about 1.8G.

I don't know what's wrong. Can someone spot my problem or provide working code directly?

Any help would be appreciated! @nttstar
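
One thing that may be worth checking (a guess, not confirmed anywhere in this thread): the sample lines of testdata_lmk.txt look like interleaved x1 y1 x2 y2 ... values, since both eye points share 180.0. In that case `reshape((5, 2))` gives the (x, y) rows, while `reshape((2, 5)).T` pairs up the wrong numbers. A small sketch, with `parse_lmk_line` as a hypothetical helper:

```python
import numpy as np

def parse_lmk_line(line):
    """Parse '<path> x1 y1 x2 y2 x3 y3 x4 y4 x5 y5' into (path, 5x2 array of (x, y))."""
    vec = line.strip().split(" ")
    assert len(vec) >= 11
    image_path = vec[0]
    values = np.array([float(v) for v in vec[1:11]])
    # ASSUMPTION: values are interleaved x, y pairs, so each row is one point.
    return image_path, values.reshape((5, 2))

sample = ("testdata/00/00/00000d7e95948372025bdaca5a203832.jpg "
          "153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6")
image_path, landmark = parse_lmk_line(sample)
print(landmark)

# For comparison, the layout used in the modified parse_lst_line:
other = landmark.reshape(-1).reshape((2, 5)).T
print(other[0])  # first point under the (2, 5) layout
```

If the interleaved reading is right, the first point should be the left eye at (153.4, 180.0); the other layout yields (153.4, 215.8) instead, which would misalign every face.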

becauseofAI on 23 Jul 2018

👍3

@nttstar When I run src/eval/gen_glint.py, memory usage keeps increasing, which is weird. Is that normal? Another question: what does the following line mean? https://github.com/deepinsight/insightface/blob/master/src/eval/gen_glint.py#L131
When I run the code, I got following error:
sh: 1: bypy: not found
Please help me out, thank you very much.

Edwardmark on 24 Jul 2018

@becauseofAI I have the same problem as you. My network's training accuracy reaches 0.82 using the whole DeepGlint dataset, but when I submit my result I get an identification result of 0.016. @nttstar

yhw-yhw on 24 Jul 2018

@yhw-yhw How did you get the .bin result file? If you also used src/eval/gen_glint.py, did you modify it anywhere? And do you know what the file Trillion Pairs/testdata/feature_tools/matio.py, downloaded with the dataset, is for?

AaronYKing on 24 Jul 2018

@nttstar I used the same modification as @becauseofAI to generate the result with the LResNet100E-IR|Emore model from the Model-Zoo, but only got an identification result of 0.00178. Can you test it and share your result and code with us?

AaronYKing on 24 Jul 2018

@becauseofAI @yhw-yhw @AaronYKing Can you give us a complete, correct way to generate the submission file? I'm sorry that I have had no time to test it recently. Thanks~

nttstar on 24 Jul 2018

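Has anyone got test results? I uploaded the bin to the DeepGlint site, but after the upload the page froze with no response, and nothing shows on the results page. Please help: what should I do?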

Edwardmark on 24 Jul 2018

@nttstar @becauseofAI @yhw-yhw I have the same problem: I submitted my result and got an identification result of 0.01465.
I have not modified the code. My steps are:

src/data/glint2lst.py /xxx/glint testdata > /home/xxx/glint_test.lst
src/eval/gen_glint.py --input /home/xxx/glint_test.lst --output my_result.bin {...other param}

goodpp on 24 Jul 2018

@Edwardmark I ran into that too. Logging out and back in fixed it. If there is still nothing, the data probably did not upload successfully.

yhw-yhw on 24 Jul 2018

@nttstar @yhw-yhw @AaronYKing I have uploaded the code that generates the submission file to Google Drive. Put it in the insightface/src/eval/ directory, and you can use the LResNet100E-IR|Emore model from the Model-Zoo to generate the submission file. But something may be wrong with the code: it only gets an identification result of 0.00178.
I would be grateful to anyone who can check the code and solve the problem!

becauseofAI on 24 Jul 2018

@nttstar could glint provide the original non-aligned face data set?

tornadomeet on 25 Jul 2018

👍1

I tested the result of fine-tuning on the celebrity dataset; see below. The test should be correct, but the performance is very poor...
verification@1e-9： res1
identification@1e-3: res2
The code is the one provided by @nttstar, used without any modification.

Edwardmark on 25 Jul 2018

@yhw-yhw Thanks a lot. Have you run into the following problem? What does this line of code mean? https://github.com/deepinsight/insightface/blob/master/src/eval/gen_glint.py#L131
When I run the code, I got following error:
sh: 1: bypy: not found
Please help me out, thank you very much.

Edwardmark on 25 Jul 2018

@cysin @zhaowwenzhong Does the rec format need to change when generating rec files for triplet training? I hit the following problem too:
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,) and requested shape (1,)

Edwardmark on 25 Jul 2018

@Edwardmark bypy is the name of a Baidu Cloud upload script; you can ignore it.
Also, I hit the same problem when training on glint data with triplet loss. It happens because the two datasets generate rec files in different formats.
Fix:
Change lines 528 and 529 of data.py:
https://github.com/deepinsight/insightface/blob/master/src/data.py#L528
label[i-ba][:] = header.label
tag.append( ( int(header.label), _idx) )
to
label[i-ba][:] = header.label[0]
tag.append( ( int(header.label[0]), _idx) )
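
The shape mismatch can be reproduced with plain NumPy, which may help to see why taking element 0 works. The two-element label below is hypothetical; the real header.label contents depend on how the rec file was generated.

```python
import numpy as np

label = np.zeros((4, 1), dtype=np.float32)  # one scalar label slot per image
header_label = np.array([12345.0, 1.0])     # hypothetical two-element record label

try:
    label[0][:] = header_label              # a (2,) value cannot fill a (1,) slot
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

label[0][:] = header_label[0]               # take only the identity id
print(label[0])
```

This mirrors the "(2,) and requested shape (1,)" error in the traceback above: the rec stores more than one number per label, while the triplet code expects a single id.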

yhw-yhw on 25 Jul 2018

@yhw-yhw @nttstar OK. One more question. If I run
python glint2lst.py /data/glint_data msra,celebrity > glint.lst to generate a merged list of the two datasets, and then run
python src/data/face2rec2.py ${path-to-glint-data-and-glint-lst} (the path containing glint.lst), hasn't the rec file already been generated? Do I still need to run merge.py? Why run merge.py at all?

Edwardmark on 25 Jul 2018

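@yhw-yhw Thank you very much. I still don't understand why merge.py is needed to merge the data. Doesn't running python glint2lst.py /data/glint_data msra,celebrity > glint.lst already produce the list? Can't I generate the rec directly from that list? Why merge as well?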

Edwardmark on 26 Jul 2018

That is for merging other datasets.

nttstar on 26 Jul 2018

@nttstar Thank you very much for your patient reply.

Edwardmark on 26 Jul 2018

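@nttstar @yhw-yhw A question: for classification over 180,000 classes like this, is it hard to get good results by training or fine-tuning with softmax loss and its variants? Should one simply fine-tune on the glint data with triplet loss instead? I fine-tuned the r50 model on glint data with the arcface loss for a day; training accuracy stayed near 0 and test accuracy dropped a lot. I would like to discuss good approaches for multi-class tasks with this many classes.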

Edwardmark on 26 Jul 2018

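@Edwardmark I am fine-tuning the author's released r50 model on the glint dataset with arcface loss. I have trained 100K iterations so far with lr 0.0001, and acc is currently 0.55. This lr still needs tuning. In my experience the choice of lr matters a lot. When training on datasets with many IDs, I generally train 100K iterations at 0.01, then 200K iterations at 0.001, then 100K iterations at 0.0001, which usually gets a very good result. Finally, after training at 0.00001 for a while, acc stops changing. My batch size is 128.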
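
The staged schedule above can be turned into cumulative step boundaries of the kind passed to `--lr-steps` elsewhere in this thread. A small sketch, assuming the script multiplies the rate by 0.1 at each boundary (an assumption) and starts from lr 0.01; the variable names are made up.

```python
from itertools import accumulate

# (learning rate, iterations at that rate), as described in the comment above.
stages = [(0.01, 100000), (0.001, 200000), (0.0001, 100000)]

# Each boundary is the iteration at which the next, smaller rate takes over.
boundaries = list(accumulate(n for _, n in stages))
lr_steps_arg = ",".join(str(b) for b in boundaries)
print(lr_steps_arg)
```

After the last boundary the run would continue at 0.00001 until accuracy stops changing.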

yhw-yhw on 26 Jul 2018

👍1

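@yhw-yhw Thanks. I previously fine-tuned the author's r50 on celebrity only; acc was 58%, but uploading to the glint site gave only 18% accuracy. Have you uploaded your trained model for the glint test? How did it do?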

Edwardmark on 26 Jul 2018

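@Edwardmark When I fine-tune r50 on the whole dataset, training acc is 50% and the glint test result is only 16%. Many people have hit this problem, which is strange. I am now training and testing with ms1m and celebrity separately.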

yhw-yhw on 26 Jul 2018

@yhw-yhw Hello. When fine-tuning with triplet loss, I set the batch to 120 and trained on 4 GPUs, and got the error below. Have you seen it?
Traceback (most recent call last):
  File "train.py", line 1062, in <module>
    main()
  File "train.py", line 1059, in main
    train_net(args)
  File "train.py", line 1053, in train_net
    epoch_end_callback = epoch_cb )
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/base_module.py", line 506, in fit
    next_data_batch = next(data_iter)
  File "/mntML/dongbin/insightface/src/data.py", line 1011, in next
    ret = self.cur_iter.next()
  File "/mntML/dongbin/insightface/src/data.py", line 861, in next
    self.reset()
  File "/mntML/dongbin/insightface/src/data.py", line 727, in reset
    self.triplet_reset()
  File "/mntML/dongbin/insightface/src/data.py", line 576, in triplet_reset
    self.select_triplets()
  File "/mntML/dongbin/insightface/src/data.py", line 545, in select_triplets
    embeddings[ba:bb,:] = net_out
ValueError: could not broadcast input array from shape (480,512) into shape (240,512)

Edwardmark on 27 Jul 2018

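Has anyone got good submission results after training on glint? However I train, results on the test sets are good, but results after uploading are poor, and I don't know why. @nttstar @becauseofAI @yhw-yhw Today, after switching to triplet loss for fine-tuning, the result rose from 18% to 20%, but that is still worse than the r50 model without fine-tuning, which scored about 48% accuracy when I uploaded it to the official site.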

Edwardmark on 27 Jul 2018

We will try to ask Glint to check recent test results soon

nttstar on 27 Jul 2018

Is this dataset better than the one provided by @nttstar?

xmuszq on 27 Jul 2018

@nttstar Have you reported this to Glint? Is it a code problem, or is the training result really that poor?

Edwardmark on 30 Jul 2018

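Has anyone got good submission results after training on glint? However I train, results on the test sets are good, but results after uploading are poor, and I don't know why. @becauseofAI @yhw-yhw Today, after switching to triplet loss for fine-tuning, the result rose from 18% to 20%, but that is still worse than the r50 model without fine-tuning, which scored about 48% accuracy when I uploaded it to the official site.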

Edwardmark on 30 Jul 2018

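@Edwardmark Was the model that got 48% accuracy the LResNet50E-IR provided here? Was the bin also generated with gen_glint.py, or did you make other changes?
Features extracted with the LResNet50E-IR model only gave me 31% accuracy on upload.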

wenjie710 on 31 Jul 2018

@xsr-ai You still have 150K after balancing? I looked at ac_glint and the long tail is severe; most identities have only a few to about 20 images.

JianbangZ on 31 Jul 2018

@xsr-ai How do you do the data balancing?

HaoLiuHust on 1 Aug 2018

@nttstar Hello, roughly what accuracy do you reach on the training set? I find that my training-set accuracy is low, but accuracy on lfw is high.

YunYang1994 on 1 Aug 2018

@nttstar @yhw-yhw Has anyone trained with triplet loss? Why does it seem not to converge at all?

Edwardmark on 1 Aug 2018

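Has anyone trained with triplet loss? Why does it seem not to converge at all? Even with online hard negative mining, there should be some overall trend. The loss just doesn't go down. Any good ideas?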

Edwardmark on 1 Aug 2018

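@Edwardmark @yhw-yhw Have you solved the problem of models you trained on glint getting poor test results?
I have tried many times and my own model just doesn't work... The model I trained on glint has no problem on other tests, including megaface.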

|Name | TPR@FPR=1e-3 | metric |
| -------------- |:------------------:| ------: |
| Baseline demo | 0.43883 | cos |
| Pretrained r34 | 0.49736 | cos |
| Pretrained r50 | 0.49473 | cos |
| My glint_r34 | 0.01465 | cos |
| My MS1M_r34 | 0.50138 | cos |

The steps I use to generate the test file are the same every time; only the model differs.

src/data/glint2lst.py /xxx/glint testdata > /home/xxx/glint_test.lst
src/eval/gen_glint.py --input /home/xxx/glint_test.lst --output my_result.bin {...other param}
Update: today I tried the r34 model I previously trained on refined MS1M to reproduce the results. It works fine and performs well: iden.=0.50138, veri.=0.53127...

goodpp on 3 Aug 2018

👍1

@goodpp I solved it by using triplet loss. After fine-tuning, my test result is 48%, somewhat lower than before but still reasonable.

Edwardmark on 3 Aug 2018

@Edwardmark Thanks, I'll try that too.

goodpp on 3 Aug 2018

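Help: acc is only about 0.24
I am fine-tuning a previously trained model on celebrity. Acc was about 0.53 in the earlier training, but only 0.22 during fine-tuning. Results on the three test sets (LFW, AgeDB, CF-P) are close to before. What should I do?
Command: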
CUDA_VISIBLE_DEVICES='1,0' python -u train_softmax.py --network y1 --ckpt 2 --loss-type 4 --lr 0.001 --lr-steps 55000,85000,100000,110000 --wd 0.00004 --fc7-wd-mult 10 --emb-size 128 --per-batch-size 128 --margin-s 128 --data-dir ../datasets/faces_glint_112x112 --pretrained ../models/MobileFaceNet_glint/model-y1-arcface_V2,0042 --prefix ../models/MobileFaceNet_glint/model-y1-arcface_V2

BUAA-21Li on 3 Aug 2018

Help: LFW accuracy is 98.9%, which is a bit low.
I trained from scratch on the celebrity dataset, with mobilenetv2 as the network and arcface as the loss function.
Command:
LRSTEPS='32000,48000,56000'
CUDA_VISIBLE_DEVICES='4,6' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 256 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 > "$LOGFILE" 2>&1 &

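@nttstar
Is the low LFW accuracy related to the small, unevenly distributed dataset and the learning-rate settings? How should these hyperparameters be set sensibly?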

lixiaohui2020 on 10 Aug 2018

Use the normal wd; the mobilenetv2 implementation may have a problem.

nttstar on 10 Aug 2018

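@nttstar
I didn't quite follow; could I trouble you again?
What is the difference between the normal wd you mentioned and wd = 0.00004?
Which part do you mean by the mobilenetv2 implementation possibly having a problem? I previously trained with the same parameters on the MS dataset (the rec file you provide on GitHub) and reached 99.25% accuracy.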

lixiaohui2020 on 10 Aug 2018

Does anybody know how to use a data loader to improve GPU utilization?

Edwardmark on 10 Aug 2018

Where can I download the msra dataset?

bigbao9494 on 14 Aug 2018

license?

twmht on 22 Aug 2018

@nttstar Which script is used to resize the images in the Asian celebrity dataset to 112 by 112? It seems that face2rec2.py already processes them with face_preprocess.preprocess(img,bbox,...). So we don't need to resize these images separately?

tinggh on 26 Sep 2018

GT of glint-challenge was updated. See http://trillionpairs.deepglint.com/results

nttstar on 27 Sep 2018

do you have Face Alignment models?

cuppersd on 29 Sep 2018

👍1

@JianbangZ
Thanks for reporting these overlaps.
Would you please share how you found the overlapping and noisy images between the MS1M and Asian datasets? Did you do it manually or automatically?

test4fest on 19 Oct 2018

@test4fest Automatically plus manually. We calculated the embedding cluster center of each identity in each dataset, then computed center-to-center similarity/distance. You can then set a threshold to find some overlaps automatically, and use a higher threshold and manually check the uncertain ones.
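
One way to read the two-threshold idea, sketched on toy unit vectors: pairs at or above a high similarity count as overlaps automatically, and pairs in a middle band go to manual review. This is my interpretation rather than the exact procedure used; `triage_overlaps` and both threshold values are hypothetical.

```python
import numpy as np

def triage_overlaps(centers_a, centers_b, auto_thresh=0.85, review_thresh=0.6):
    """Split cross-dataset center pairs into automatic overlaps and manual-review candidates.

    Both inputs are assumed to be L2-normalized, so the dot product is the cosine.
    """
    sim = np.asarray(centers_a) @ np.asarray(centers_b).T
    auto = [(int(i), int(j)) for i, j in zip(*np.where(sim >= auto_thresh))]
    band = (sim >= review_thresh) & (sim < auto_thresh)
    review = [(int(i), int(j)) for i, j in zip(*np.where(band))]
    return auto, review

# Toy unit-length centers for two small datasets.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, -1.0]])
auto, review = triage_overlaps(a, b)
print(auto, review)
```

The index pairs refer to rows of the two center matrices, so they still need to be mapped back to identity IDs in each dataset.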

JianbangZ on 19 Oct 2018

👍1

@JianbangZ
Is it possible that I use a pre-trained network output to calculate the embedding?
Or I have to train a new model based on these combined datasets (MS1M and Asian)?

test4fest on 22 Oct 2018

Is there a mapping from the folders in MS1M-refine-v2 to person names or mids? For example, folder 0 corresponding to m.09zyss.

huohuai on 25 Oct 2018

the dataset doesn't contain face coordinates(left, top, right, bottom)?

jetsmith on 31 Oct 2018

Is there any overlap between MS1M and VGGface2 ?

hustzeyu on 2 Nov 2018

Has anyone successfully trained Mobilefacenet from scratch with the DeepGlint dataset? What are the training hyperparameters? Thank you.

jiankang1991 on 7 Nov 2018

Hi all
I would like to try to train mobilefacenet from scratch on DeepGlint dataset. Here is my log example:

INFO:root:Epoch[5] Batch [20]   Speed: 590.55 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [40]   Speed: 565.22 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [60]   Speed: 513.27 samples/sec       acc=0.000000
INFO:root:Saved checkpoint to "./models/model_y1_softmax3_glint/model-0044.params"
INFO:root:Epoch[5] Batch [80]   Speed: 82.49 samples/sec        acc=0.000000
INFO:root:Epoch[5] Batch [100]  Speed: 504.97 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [120]  Speed: 522.76 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [140]  Speed: 558.57 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [160]  Speed: 503.59 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [180]  Speed: 545.58 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [200]  Speed: 563.97 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [220]  Speed: 537.71 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [240]  Speed: 561.69 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [260]  Speed: 551.65 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [280]  Speed: 541.85 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [300]  Speed: 513.12 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [320]  Speed: 535.86 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [340]  Speed: 542.13 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [360]  Speed: 525.81 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [380]  Speed: 536.12 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [400]  Speed: 517.77 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [420]  Speed: 512.70 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [440]  Speed: 554.69 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [460]  Speed: 541.19 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [480]  Speed: 499.54 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [500]  Speed: 565.82 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [520]  Speed: 490.50 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [540]  Speed: 517.75 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [560]  Speed: 512.61 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [580]  Speed: 532.84 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [600]  Speed: 547.83 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [620]  Speed: 541.03 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [640]  Speed: 523.97 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [660]  Speed: 566.80 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [680]  Speed: 562.19 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [700]  Speed: 516.88 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [720]  Speed: 544.09 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [740]  Speed: 555.72 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [760]  Speed: 534.52 samples/sec       acc=0.000000


INFO:root:Saved checkpoint to "./models/model_y1_softmax3_glint/model-0049.params"
INFO:root:Epoch[5] Batch [10080]        Speed: 84.99 samples/sec        acc=0.000000
INFO:root:Epoch[5] Batch [10100]        Speed: 522.19 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10120]        Speed: 509.01 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10140]        Speed: 540.22 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10160]        Speed: 520.44 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10180]        Speed: 529.27 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10200]        Speed: 540.42 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10220]        Speed: 559.25 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10240]        Speed: 538.98 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10260]        Speed: 507.30 samples/sec       acc=0.065755
INFO:root:Epoch[5] Batch [10280]        Speed: 548.35 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10300]        Speed: 531.99 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10320]        Speed: 565.28 samples/sec       acc=0.001563
INFO:root:Epoch[5] Batch [10340]        Speed: 522.87 samples/sec       acc=0.000651
INFO:root:Epoch[5] Batch [10360]        Speed: 561.39 samples/sec       acc=0.079557
INFO:root:Epoch[5] Batch [10380]        Speed: 558.66 samples/sec       acc=0.000911
INFO:root:Epoch[5] Batch [10400]        Speed: 567.39 samples/sec       acc=0.053125
INFO:root:Epoch[5] Batch [10420]        Speed: 525.81 samples/sec       acc=0.007552
INFO:root:Epoch[5] Batch [10440]        Speed: 556.13 samples/sec       acc=0.039453
INFO:root:Epoch[5] Batch [10460]        Speed: 539.47 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10480]        Speed: 530.92 samples/sec       acc=0.047786
INFO:root:Epoch[5] Batch [10500]        Speed: 543.45 samples/sec       acc=0.000130
INFO:root:Epoch[5] Batch [10520]        Speed: 551.35 samples/sec       acc=0.001172
INFO:root:Epoch[5] Batch [10540]        Speed: 545.21 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10560]        Speed: 570.32 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10580]        Speed: 552.34 samples/sec       acc=0.012109
INFO:root:Epoch[5] Batch [10600]        Speed: 551.80 samples/sec       acc=0.004297
INFO:root:Epoch[5] Batch [10620]        Speed: 528.08 samples/sec       acc=0.000130
INFO:root:Epoch[5] Batch [10640]        Speed: 544.59 samples/sec       acc=0.150521
INFO:root:Epoch[5] Batch [10660]        Speed: 527.51 samples/sec       acc=0.029948
INFO:root:Epoch[5] Batch [10680]        Speed: 543.34 samples/sec       acc=0.038932
INFO:root:Epoch[5] Batch [10700]        Speed: 527.42 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10720]        Speed: 561.84 samples/sec       acc=0.050651
INFO:root:Epoch[5] Batch [10740]        Speed: 543.47 samples/sec       acc=0.007422
INFO:root:Epoch[5] Batch [10760]        Speed: 559.44 samples/sec       acc=0.000000
INFO:root:Epoch[5] Batch [10780]        Speed: 536.10 samples/sec       acc=0.100391
INFO:root:Epoch[5] Batch [10800]        Speed: 570.65 samples/sec       acc=0.000130
INFO:root:Epoch[5] Batch [10820]        Speed: 561.06 samples/sec       acc=0.073828
INFO:root:Epoch[5] Batch [10840]        Speed: 567.89 samples/sec       acc=0.053646
INFO:root:Epoch[5] Batch [10860]        Speed: 565.93 samples/sec       acc=0.110937
INFO:root:Epoch[5] Batch [10880]        Speed: 529.71 samples/sec       acc=0.012500
INFO:root:Epoch[5] Batch [10900]        Speed: 499.38 samples/sec       acc=0.001823
INFO:root:Epoch[5] Batch [10920]        Speed: 517.37 samples/sec       acc=0.108464
INFO:root:Epoch[5] Batch [10940]        Speed: 563.39 samples/sec       acc=0.056901
INFO:root:Epoch[5] Batch [10960]        Speed: 546.41 samples/sec       acc=0.103385
INFO:root:Epoch[5] Batch [10980]        Speed: 558.78 samples/sec       acc=0.110286

Before batch 10280 the acc is always 0, but from around batch 10280 onwards it takes nonzero values. This is strange. Has anyone met this problem before?
Thank you.

jiankang1991 on 16 Nov 2018

@karlTUM Training from scratch~~~ Obviously this means your model finally managed to figure out and learn something. Don't worry, be happy.

oukohou on 19 Nov 2018

@goodpp Hi, could you please share your BT torrent or the downloaded dataset with me? My downloaded file cannot be parsed or unzipped successfully. I would appreciate your help.

Thanks!
sophia

sophiazy on 22 Nov 2018

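@all
Why can I only extract 4.1 GB from the Asian face dataset I downloaded? How should this 90+ GB .tar.gz file be handled?
I got the following errors during extraction:

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Could you give me some guidance? Thanks. At this point I can't tell whether the officially provided data is correct or whether the file was corrupted during my download.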
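
The gzip and tar messages above usually point to a truncated or corrupted file, so it can help to check the archive before extracting it. A minimal sketch in Python using only the standard library; the function name `check_tar_gz` and the example path are hypothetical and not part of InsightFace or the Trillion Pairs tooling:

```python
import tarfile
import zlib


def check_tar_gz(path):
    """Walk every member of a .tar.gz file.

    Returns (ok, members_read, error_message). Iterating the members
    forces the gzip stream to be decompressed up to each header, so a
    truncated or corrupted download should fail here.
    """
    count = 0
    try:
        with tarfile.open(path, mode="r:gz") as archive:
            for _member in archive:
                count += 1
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        return False, count, str(exc)
    return True, count, ""


# Example (hypothetical path):
# ok, members, error = check_tar_gz("/data/glint/asian_celeb.tar.gz")
```

If the check fails, comparing the file size (or a checksum, if one is available) with the official value and downloading again is probably the next step.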

sophiazy on 23 Nov 2018

@karlTUM I also tried to train mobilefacenet from scratch on the DeepGlint dataset, but the acc is only about 0.2. Can you help me?

songjd on 28 Nov 2018

Hi,

Where can I get the ELFW dataset (only the ELFW)?
The downloaded test dataset already mixes the ELFW and other Flickr images together. I want the pure ELFW dataset that DeepGlint mentioned:

1.ELFW: Face images of celebrities in LFW name list. There are 274k images from 5.7k ids.

xmuszq on 29 Nov 2018

Are margin-s (64) and margin-m (0.5) suitable for the glint dataset (18k ids)? @nttstar

felixfuu on 3 Dec 2018

Why do I get such low results (identification is only 0.01270) on TrillionPairs of Glint? Maybe I did not generate the result file correctly. I used src/eval/gen_glint.py to get the bin file for submission, but the code could not be used directly, so I modified it as follows.
The original code in gen_glint.py:
image_path, label, bbox, landmark, aligned = face_preprocess.parse_lst_line(line)
buffer.append( (image_path, landmark) )
The original code in src/common/face_preprocess.py:
def parse_lst_line(line):
  vec = line.strip().split("\t")
  assert len(vec)>=3
  aligned = int(vec[0])
  image_path = vec[1]
  label = int(vec[2])
  bbox = None
  landmark = None
  #print(vec)
  if len(vec)>3:
    bbox = np.zeros( (4,), dtype=np.int32)
    for i in xrange(3,7):
      bbox[i-3] = int(vec[i])
    landmark = None
    if len(vec)>7:
      _l = []
      for i in xrange(7,17):
        _l.append(float(vec[i]))
      landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, label, bbox, landmark, aligned
I modified gen_glint.py to:
    image_path, landmark = face_preprocess.parse_lst_line(line)  
    image_path = "/to/my/path/TrillionPairs/testdata/"+line.split(" ")[0]
    buffer.append( (image_path, landmark) ) 
and modified src/common/face_preprocess.py to:
def parse_lst_line(line):
  vec = line.strip().split(" ")
  assert len(vec)>=2
  image_path = vec[0]
  landmark = None
  #print(vec)
  if len(vec)>2:
    _l = []
    for i in xrange(1,11):
      _l.append(float(vec[i]))
    landmark = np.array(_l).reshape( (2,5) ).T
  #print(aligned)
  return image_path, landmark
My input is:
--input='/to/my/path/TrillionPairs/testdata/testdata_lmk/testdata_lmk.txt'
Because the input testdata_lmk.txt format is:
testdata/00/00/00000d7e95948372025bdaca5a203832.jpg 153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6
testdata/00/00/00000f9f87210c8eb9f5fb488b1171d7.jpg 156.1 180.0 243.9 180.0 207.4 229.2 159.8 262.9 237.4 263.0
testdata/00/00/000010e4c136b77a07eeeea84d84d804.jpg 156.4 180.0 243.6 180.0 201.6 223.0 168.0 264.7 237.7 268.0
So I think my modification is right, and the resulting bin file is about 1.8 GB.

I don't know what's wrong with it. Can someone spot my problem or provide working code directly?

Any help would be appreciated! @nttstar

You should convert testdata_lmk.txt as @goodpp said (because the author changed the format of the landmarks).
If you don't do that, the aligned images will be wrong; you can save one and check.
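
Judging from the sample lines quoted above, each line of testdata_lmk.txt appears to store the five landmarks as interleaved pairs (x1 y1 x2 y2 ... x5 y5): the second and fourth numbers are both 180.0, which looks like the y coordinate of the two eyes. The `reshape( (2,5) ).T` in parse_lst_line instead assumes all x values come first, then all y values. A minimal sketch of the conversion under that assumption; the function names are hypothetical and this is not @goodpp's script:

```python
def parse_testdata_lmk_line(line):
    """Parse 'path x1 y1 x2 y2 ... x5 y5' into (path, [(x, y), ...]).

    Assumes the landmark values are interleaved x, y pairs.
    """
    fields = line.strip().split(" ")
    if len(fields) != 11:
        raise ValueError("expected a path followed by 10 numbers")
    path = fields[0]
    values = [float(v) for v in fields[1:]]
    # Pair neighbouring values: (x1, y1), (x2, y2), ...
    points = list(zip(values[0::2], values[1::2]))
    return path, points


def to_grouped_order(points):
    """Flatten points to x1..x5 y1..y5, the order reshape((2, 5)).T assumes."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return xs + ys


sample = ("testdata/00/00/00000d7e95948372025bdaca5a203832.jpg "
          "153.4 180.0 246.6 180.0 196.8 215.8 158.5 278.7 230.6 277.6")
path, points = parse_testdata_lmk_line(sample)
print(points[0], points[1])  # (153.4, 180.0) (246.6, 180.0)
```

With NumPy, the same pairing would be `np.array(values).reshape((5, 2))`.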

ckybit on 21 Mar 2019

@nttstar

Is the DeepGlint dataset introduced in https://github.com/deepinsight/insightface/wiki/Dataset-Zoo already a merged set of the msra and celebrity sets mentioned in the Trillion Pairs test?
I got 0.984092 on megaface but only 0.43088 on the trillion pairs test (both top-1 identification metrics). The training data I used is DeepGlint. When I changed the training data to emore, I could easily get a 0.80+ result on the trillion pairs test. Now I am confused about the low score on the trillion pairs test when using DeepGlint as training data. Can anyone help me?
I called face2rec2.py to regenerate the glint.rec file based on steps 1, 2, 3 above. Then I ran into this problem when training: "s = self.imgrec.read_idx(0) KeyError: 0". What causes such an error?

zhouwei5113 on 8 Apr 2019

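@Edwardmark When I train on the whole dataset, fine-tuning r50, the training acc is 50% and the glint test result is only 16%. Many people have run into this problem, which is strange. I am currently training and testing with ms1m and celebrity separately.

@Edwardmark @yhw-yhw Have you solved the problem of very low results on the official glint test? I get 0.984092 on megaface but only 0.43088 on the official glint test, which does not seem normal.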

zhouwei5113 on 9 Apr 2019

@nttstar I just noticed that IBM had released a very impressive facial image dataset: https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/#highlights
Will you try it?

Or does anyone else want to give it a try?

meanmee on 9 Apr 2019

Hello @nttstar, thanks for the great job.

I want to merge emore with glint asia. Should we follow this same procedure (i.e. blindly merge the two datasets by not setting the model during the dataset_merge invocation)?

Thanks.

mlourencoeb on 30 Apr 2019

Hello @nttstar, thanks for the great job.

I want to merge emore with glint asia. Should we follow this same procedure (i.e. blindly merge the two datasets by not setting the model during the dataset_merge invocation)?

Thanks.

Hi @mlourencoeb,
Have you managed to merge these two datasets?
We are running:
python dataset_merge.py --include /home/ti/Downloads/DATASETS/faces_emore,/home/ti/Downloads/DATASETS/faces_glint --output /home/ti/Downloads/DATASETS/merge --model /home/ti/Downloads/insightface/models/model-r100-ii/model,0

But at the end of the merging process we get the same property, .idx and .rec files as faces_emore (same size and content). What could be the problem?

Talgin on 4 Jul 2019

Hello @Talgin.

I wrote a script myself for the merging since I wanted to manually review some cases. There is a huge overlap between glint asia and emore.

I also found lots of repeated identities in emore. I am cleaning those as we speak.

mlourencoeb on 4 Jul 2019

Hello @mlourencoeb,
Thank you for the fast reply. I'm confused about the datasets... In their paper (@nttstar) they say: "DeepGlint-Face (including MS1M-DeepGlint and Asian-DeepGlint)". So, my questions:

1. Is MS1M-DeepGlint the same dataset as MS1MV2, or is it an "only Asians" version of the celebrity dataset?
2. If I'm not mistaken, faces_glint is the merged dataset, which includes MS1M-DeepGlint and Asian-DeepGlint.
3. Analysing the above, I think faces_glint is the combination of the original (semi-automatically refined) version of MS-Celeb-1M and the Asian-Celeb dataset (87K + 94K = 181K). If this is true, then I think there is no need to merge faces_emore and faces_glint. Am I right, @nttstar?
(I just read this explanation: http://trillionpairs.deepglint.com/overview)
If all of the above is true, then which dataset can I try to merge with faces_glint? :))

Thank you!

Talgin on 5 Jul 2019

Hello @Talgin

emore is based on MSCELEB, just like the non-Asian component of faces_glint. I would merge emore with the asia part only, but I could be wrong.

mlourencoeb on 5 Jul 2019

@mlourencoeb,
Thank you!
I'm not sure, but maybe faces_glint is a combination of emore and the asian dataset? :) But I'll try to merge them :)

Talgin on 5 Jul 2019

@zhouwei5113 have you solved your problem? I also got a really low score on trillionpairs.

HuanJiML on 9 Jul 2019

@nttstar
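Hello author, I want to try a new structure by changing the SE block, and I am a bit confused: with an mxnet symbol I cannot directly get the b, c, h, w values.
The PyTorch SGE implementation contains a statement that I need to reproduce. If I modify the model you provide at the position of the SE code, I cannot directly get the b, c, h, w of the output after bn3 in each layer. What I need to implement in mxnet is: b, c, h, w = x.size(), x = x.reshape(b * self.groups, -1, h, w). I am not very familiar with mxnet. Do you know a good way to implement this reshape?
This is where I made the change in frestnet.py:
bn3 = mx.sym.BatchNorm(data=conv2, fix_gamma=False, eps=2e-5, momentum=bn_mom, name=name + '_bn3')
#if use_se:
if usr_sge:
    # get the b, c, h, w of bn3
    # then reshape

Below is the corresponding PyTorch implementation: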

import torch
import torch.nn as nn
from torch.nn import Parameter


class SpatialGroupEnhance(nn.Module):  # 3 2 1 hw is half, 311 is same size
    def __init__(self, groups=64):
        super(SpatialGroupEnhance, self).__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.weight = Parameter(torch.zeros(1, groups, 1, 1))
        self.bias = Parameter(torch.ones(1, groups, 1, 1))
        self.sig = nn.Sigmoid()

    def forward(self, x):  # (b, c, h, w)
        b, c, h, w = x.size()
        x = x.view(b * self.groups, -1, h, w)  # reshape to (b*groups, c/groups, h, w)
        xn = x * self.avg_pool(x)  # x * global average pooling (h, w reduced to 1)
        xn = xn.sum(dim=1, keepdim=True)  # (b*groups, 1, h, w)
        t = xn.view(b * self.groups, -1)
        t = t - t.mean(dim=1, keepdim=True)
        std = t.std(dim=1, keepdim=True) + 1e-5
        t = t / std  # normalize: (t - mean) / std
        t = t.view(b, self.groups, h, w)
        t = t * self.weight + self.bias
        t = t.view(b * self.groups, 1, h, w)
        x = x * self.sig(t)  # sigmoid gives a per-group factor in (0, 1)
        x = x.view(b, c, h, w)  # restore the original dimensions
        return x
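
On the MXNet side, one possible way to express `x.view(b * self.groups, -1, h, w)` without knowing b, c, h, w is to use the special shape codes of `mx.sym.reshape` (0 copies a dimension, -1 infers one, -2 copies the remaining dimensions, -3 merges two consecutive dimensions, -4 splits one dimension in two). The sketch below only checks the shape arithmetic in plain Python; the two `mx.sym.reshape` calls in the docstring are an untested suggestion, and `sge_group_shapes` is a hypothetical helper name.

```python
def sge_group_shapes(shape, groups):
    """Return the intermediate and final shapes of the SGE grouping reshape.

    Suggested MXNet equivalent (untested here, assumes bn3 is (b, c, h, w)):
        split  = mx.sym.reshape(data=bn3, shape=(0, -4, groups, -1, -2))
        merged = mx.sym.reshape(data=split, shape=(-3, -2))
    """
    b, c, h, w = shape
    if c % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # Split the channel axis into (groups, c / groups).
    split = (b, groups, c // groups, h, w)
    # Merge batch and groups into one leading axis.
    merged = (b * groups, c // groups, h, w)
    return split, merged


split, merged = sge_group_shapes((8, 256, 14, 14), groups=64)
print(split)   # (8, 64, 4, 14, 14)
print(merged)  # (512, 4, 14, 14)
```

The reverse step, `x.view(b, c, h, w)`, should be expressible the same way, for example with shape=(-4, -1, groups, -2) followed by shape=(0, -3, -2), but that is also an assumption to verify against the MXNet reshape documentation.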

shiyuanyin on 18 Jul 2019

@nttstar
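I added the SGE module to the ResNet-50 IR structure itself, using the author's pretrained resnet50 model and the glint data.
The training and test results are as follows; there is not much change: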

testing verification..
(12000, 512)
infer time 7.123213
[lfw][8000]XNorm: 22.401950
[lfw][8000]Accuracy-Flip: 0.99800+-0.00287
testing verification..
(14000, 512)
infer time 8.335358
[cfp_fp][8000]XNorm: 21.203882
[cfp_fp][8000]Accuracy-Flip: 0.95300+-0.01448
testing verification..
(12000, 512)
infer time 7.040614
[agedb_30][8000]XNorm: 23.488769
[agedb_30][8000]Accuracy-Flip: 0.98000+-0.00749

shiyuanyin on 29 Jul 2019

@mlourencoeb,
Thank you!
I'm not sure, but maybe faces_glint is a combination of emore and the asian dataset? :) But I'll try to merge them :)

Any conclusion about the dataset? Is face_glint = emore + asian_celeb?
I have the same issue in #789

SueeH on 31 Jul 2019

Hi @nttstar ,
We are training on faces_glint + our_custom_dataset. It has now been almost 10 days, and what I want to ask is why our accuracy is not changing; it stays at acc=~0.30-0.31. At the beginning the loss value started from ~46.6-9 and after 2 days decreased to ~7.2-7.5; acc was 0.0000 and began to rise, but after the 20th epoch it stopped, as you can see from the picture below. It is now the 45th epoch, but nothing has changed.
Our parameters are:
Loss: arcface
default.end_epoch = 1000
default.lr = 0.001
default.wd = 0.0005
default.mom = 0.9
default.per_batch_size: 64
default.ckpt = 3

network = r100

We are using 4 Tesla P100 GPUs.
You can see the progress in the screenshot below:
Screenshot from 2019-08-02 16-37-06

@nttstar could you tell us what is the problem? We have merged the datasets according to your instructions with dataset_merge.py and no error happened :)

Talgin on 5 Aug 2019

Hi @SueeH ,
Sorry for the late reply. I think this info is noted in their paper:
Screenshot from 2019-08-05 11-27-03

They say that face_glint (DeepGlint-Face) includes MS1M-DeepGlint and Asian-DeepGlint. As far as I know, and from reading this (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8698884), MS1M-DeepGlint is a refined version of MS1M (provided by DeepGlint Corp.), and on http://trillionpairs.deepglint.com/overview they say:

MS-Celeb-1M-v1c with 86,876 ids/3,923,399 aligned images cleaned from MS-Celeb-1M dataset. This dataset has been excluded from both LFW and Asian-Celeb.

Asian-Celeb 93,979 ids/2,830,146 aligned images. This dataset has been excluded from both LFW and MS-Celeb-1M-v1c.

So, I think that emore (MS1MV2) is another refined version of the MS1M data that is included in the faces_glint dataset (because MS1M-DeepGlint has 2K more ids than MS1MV2, but fewer images (3.9M vs. 5.8M)).
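
Since the overview page describes the two subsets as mutually exclusive, the size of the merged faces_glint data can be estimated by simple addition. A quick check of the figures quoted above (the variable names are only for illustration):

```python
# Figures quoted from the Trillion Pairs overview page above.
ms1m_v1c = {"ids": 86876, "images": 3923399}
asian_celeb = {"ids": 93979, "images": 2830146}

# The subsets are stated to exclude each other, so the totals simply add up.
total_ids = ms1m_v1c["ids"] + asian_celeb["ids"]
total_images = ms1m_v1c["images"] + asian_celeb["images"]

print(total_ids)     # 180855
print(total_images)  # 6753545
```

This is consistent with the "about 180K IDs" mentioned in the original merge instructions.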

Talgin on 5 Aug 2019

Download dataset from http://trillionpairs.deepglint.com/data (after signup). msra is a cleaned subset of MS1M from glint while celebrity is the asian dataset.

Generate lst file by calling src/data/glint2lst.py. For example:
python glint2lst.py /data/glint_data msra,celebrity > glint.lst
or generate the asian dataset only by:
python glint2lst.py /data/glint_data celebrity > glint_cn.lst
Call face2rec2.py to generate .rec file.

Merge the dataset with existing one by calling src/data/dataset_merge.py without setting param _model_ which will combine all IDs from those two datasets.

Finally you will get a dataset that contains about 180K IDs.

Use src/eval/gen_glint.py to prepare test feature file by using pretrained insightface model.

You can also post your private testing results here.

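Bro, I'm in Shanghai too. Training MobileFaceNet + arcloss on the webface dataset or face-ms1m always ends in NaN. Have you tried it? Even with lr set to 0.0001, it becomes NaN after 20-odd epochs (at epoch 24).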

EdwardVincentMa on 13 Nov 2019

Can anyone share a configuration for training on Asian faces? Thanks.

pake2070 on 23 Nov 2019

we use casia


lennonxu0101 on 23 Nov 2019

I followed the steps one by one but get an error about the image key:
My configuration: CUDA_VISIBLE_DEVICES='0,1' python3 -u src/train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 32 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002

but I get a key error for the asian dataset:

pake2070 on 24 Nov 2019

@Edwardmark I ran into the same problem as you. Did you get good results on deepglint in the end?

maywander on 11 Dec 2019

@maywander no, I didn't. In the end, I used the emore data instead.

Edwardmark on 12 Dec 2019

So the models trained on emore perform better on the trillionpairs test platform? @Edwardmark

maywander on 13 Dec 2019

@maywander yes, and I don't know why.

Edwardmark on 17 Dec 2019

I can generate the glint.lst file normally, but calling face2rec.py always fails. Does anyone know how to set the parameters? Thanks.

anguoyang on 10 Jan 2020

I feel there is something wrong with the code.

anguoyang on 10 Jan 2020

No such file or directory: '..../insightface/src/data/property'
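
This error suggests the script expects a file named property in the dataset directory. As far as I can tell from the prepackaged InsightFace datasets, it is a one-line text file of the form num_classes,height,width. A minimal sketch for creating one, assuming that format; `write_property` is a hypothetical helper, and the class count must match your own .lst file:

```python
import os


def write_property(data_dir, num_classes, image_size=(112, 112)):
    """Write a one-line 'num_classes,height,width' file named 'property'.

    The file format is an assumption based on the prepackaged datasets.
    """
    height, width = image_size
    path = os.path.join(data_dir, "property")
    with open(path, "w") as handle:
        handle.write("%d,%d,%d\n" % (num_classes, height, width))
    return path


# Example (hypothetical directory and class count):
# write_property("/data/faces_glint", 180855)
```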

anguoyang on 10 Jan 2020

@nttstar I used the glint dataset to train the model but only get 77% acc in the glint test. Could you share the training log of the run that reaches 86% acc?

zhouyongxiu on 13 Jan 2020

How many iterations does it take to train on this combined dataset from scratch, using any of the provided models, until it converges?

cocoza4 on 12 Apr 2020

Thanks for the valuable discussion. Has anyone seen an improvement on Megaface and IJBC when training on the merged dataset? Thanks

John1231983 on 17 Jun 2020

@nttstar Thanks for the great work.
Could you please share train.lst for ms1mv2?

@mlourencoeb
Could you please share the intersection list between emore and asian glint?

Thanks in advance.

aravinthmuthu on 14 Aug 2020

Insightface: Asian training dataset(from glint) discussion.
