Really appreciate the brilliant work on insightface.
But I've run into a problem while training: it does not print the loss value.
Here are the arguments I called it with:
Called with argument: Namespace(batch_size=1024, beta=1000.0, beta_freeze=0, beta_min=5.0, bn_mom=0.9, ce_loss=False, ckpt=2, color=0, ctx_num=4, cutoff=0, data_dir='./train_res', easy_margin=0, emb_size=128, end_epoch=100000, fc7_lr_mult=1.0, fc7_no_bias=False, fc7_wd_mult=10.0, gamma=0.12, image_channel=3, image_h=112, image_size='112,112', image_w=112, images_filter=0, loss_type=4, lr=0.001, lr_steps='20000,30000,40000', margin=4, margin_a=1.0, margin_b=0.0, margin_m=0.5, margin_s=128.0, max_steps=100000, mom=0.9, network='y1', num_classes=40000, num_layers=1, per_batch_size=256, power=1.0, prefix='models/id_arcface/id_arcface', pretrained='/models/th_arcface_id/th_arcface_id,0007', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='val', use_deformable=0, verbose=2000, version_act='prelu', version_input=1, version_multiplier=1.0, version_output='E', version_se=0, version_unit=3, wd=4e-05)
The key arguments are:
batch_size=1024, ce_loss=False, loss_type=4, network='y1'
I can't see why it doesn't print the loss value.
Here is part of the training log:
call reset()
INFO:root:Epoch[10] Batch [0-20] Speed: 441.49 samples/sec acc=0.938709
INFO:root:Epoch[10] Batch [20-40] Speed: 437.72 samples/sec acc=0.933838
INFO:root:Epoch[10] Batch [40-60] Speed: 440.15 samples/sec acc=0.931934
INFO:root:Epoch[10] Train-acc=0.932805
INFO:root:Epoch[10] Time cost=180.892
call reset()
INFO:root:Epoch[11] Batch [0-20] Speed: 444.41 samples/sec acc=0.937360
INFO:root:Epoch[11] Batch [20-40] Speed: 441.98 samples/sec acc=0.937793
INFO:root:Epoch[11] Batch [40-60] Speed: 441.07 samples/sec acc=0.931982
INFO:root:Epoch[11] Train-acc=0.934733
INFO:root:Epoch[11] Time cost=180.304
I checked a related issue.
His arguments include --version-output GDC.
What does this argument mean?
When I switch from my original --version-output E to --version-output GDC, I get this error:
mxnet.base.MXNetError: [04:51:33] /home/travis/build/dmlc/mxnet-distro/mxnet-build/3rdparty/mshadow/../../src/operator/tensor/../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)) Incompatible attr in node at 0-th output: expected [128,25088], got [128,512]
But when it's --version-output E, everything goes fine.
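For anyone else wondering what --version-output GDC means: as far as I understand insightface's naming, GDC is a global depthwise convolution head (the MobileFaceNet-style embedding layer), while E flattens the final feature map before the fully connected embedding layer. The sketch below is a plain NumPy illustration, not insightface code. It assumes the backbone ends in a 512×7×7 feature map for a 112×112 input, which fits the error message because 512 × 7 × 7 = 25088. The helper names head_e and head_gdc are hypothetical.

```python
import numpy as np

# Assumption: the backbone's last feature map is 512 channels x 7 x 7
# for a 112x112 input; 512 * 7 * 7 = 25088 matches the error message.
C, H, W = 512, 7, 7
EMB_SIZE = 128  # emb_size from the Namespace above

rng = np.random.default_rng(0)
feat = rng.standard_normal((C, H, W)).astype(np.float32)


def head_e(x):
    # Hypothetical "E"-style head: flatten the whole map before the FC layer.
    return x.reshape(-1)


def head_gdc(x, dw_kernel):
    # Global depthwise conv: one HxW filter per channel, no padding,
    # so each channel collapses to a single value.
    return (x * dw_kernel).sum(axis=(1, 2))


dw = rng.standard_normal((C, H, W)).astype(np.float32)
for name, vec in [("E", head_e(feat)), ("GDC", head_gdc(feat, dw))]:
    # The FC (fc1) weight must be (emb_size, input features of the head).
    print(name, "fc1 weight shape:", (EMB_SIZE, vec.shape[0]))
# E fc1 weight shape: (128, 25088)
# GDC fc1 weight shape: (128, 512)
```

Under that assumption, switching heads changes the fc1 weight from (128, 25088) to (128, 512), so parameters saved with one head can't be loaded into a network built with the other. That would explain why E works here: the checkpoint passed via --pretrained was probably trained with E, and using GDC would presumably mean training without it.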
So how can I print the loss value? Any help would be appreciated.
Set ce-loss to true, i.e. add --ce-loss to the command line.
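This works because of how argparse turns command-line flags into the Namespace printed at the top (ce_loss=False). A minimal sketch, assuming train.py declares --ce-loss as a boolean store_true flag; that is consistent with the printed Namespace but not confirmed in this thread, and the help text is made up.

```python
import argparse

# Hypothetical reconstruction: assumes --ce-loss is a store_true flag,
# which matches ce_loss=False in the Namespace printed above.
parser = argparse.ArgumentParser()
parser.add_argument("--ce-loss", default=False, action="store_true",
                    help="also compute and print the cross-entropy loss")
parser.add_argument("--loss-type", type=int, default=4)

# argparse converts "--ce-loss" to the attribute name "ce_loss".
args_off = parser.parse_args(["--loss-type", "4"])
args_on = parser.parse_args(["--loss-type", "4", "--ce-loss"])
print(args_off.ce_loss, args_on.ce_loss)  # False True
```

If it is a store_true flag, it is passed on its own; writing --ce-loss=true literally would make argparse exit with an "ignored explicit argument" error.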

Thanks, @IMCGU! But now the printed loss value is nan, haha.
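The thread doesn't explain the nan. One generic way a printed cross-entropy becomes nan with a large logit scale (margin_s=128.0 above) is computing it as sum(one_hot * log(softmax)) in float32: non-target probabilities underflow to 0, log(0) is -inf, and 0 * -inf is nan. This is a NumPy illustration with made-up cosine values, not a claim about how insightface computes ce_loss; naive_ce and logsumexp_ce are hypothetical names.

```python
import numpy as np


def stable_softmax(z):
    # Subtract the max so exp() cannot overflow; small terms may still underflow to 0.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def naive_ce(logits, label):
    # Hypothetical naive CE: mask log(softmax) with a one-hot vector.
    p = stable_softmax(logits)
    one_hot = np.zeros_like(logits)
    one_hot[label] = 1
    with np.errstate(divide="ignore", invalid="ignore"):
        # log(0) = -inf, and 0 * -inf = nan, which poisons the sum.
        return -(one_hot * np.log(p)).sum()


def logsumexp_ce(logits, label):
    # CE = logsumexp(logits) - logits[label]; never multiplies by -inf.
    m = logits.max()
    lse = m + np.log(np.exp(logits - m).sum())
    return lse - logits[label]


# Made-up cosines scaled by s = 128 (margin_s above), in float32.
cos = np.array([0.9, -0.5, -0.3], dtype=np.float32)
logits = np.float32(128.0) * cos
print(float(naive_ce(logits, 0)), float(logsumexp_ce(logits, 0)))  # nan 0.0
```

The log-sum-exp form only reads the target logit directly, so no 0 * -inf term can appear.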
