Models: How to fine-tune the NASNet-A model for a different dataset?

Created on 30 Oct 2017 · 40 Comments · Source: tensorflow/models

I have downloaded the ImageNet-trained NASNet-A mobile model from https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/README.md. But there is no .meta file there, which results in no variables to restore.

How can I fine-tune the NASNet-A mobile model on a different dataset using the pre-trained ImageNet model? Could anyone please help? Thanks in advance.

All 40 comments

I am not sure why you would need a .meta file. Could you post the command line you used and other details of how you ran the model?

Thanks

@BarretZoph, this is the command I'm using to fine-tune the NASNet model.

python /data/nasnet_shop_540/models-master/research/slim/train_image_classifier.py --train_dir=/data/nasnet_shop_540/train_model_540 --dataset_dir=/data/mobilenet_merge/data_tfrecord_540 --dataset_name=shopping --dataset_split=train --model_name=nasnet_mobile --checkpoint_path=/data/nasnet_shop_540/pretrain_nasnet/model.ckpt --checkpoint_exclude_scopes=NasNet-A/Logits --max_number_of_steps=1000000 --batch_size=128 --learning_rate=0.0001 --learning_rate_decay_type=exponential --save_interval_secs=240 --save_summaries_secs=240 --log_every_n_steps=100 --optimizer=rmsprop --weight_decay=0.00004 --ignore_missing_vars=True

When I ran this command I got the following warnings:
`WARNING:tensorflow:Variable cell_1/beginning_bn/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_5/comb_iter_0/right/separable_3x3_2/depthwise_weights missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_10/comb_iter_1/right/bn_sep_3x3_1/gamma missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable reduction_cell_1/comb_iter_2/right/separable_5x5_2/depthwise_weights missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_9/comb_iter_0/right/bn_sep_3x3_1/beta missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_7/comb_iter_1/left/separable_5x5_1/depthwise_weights missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_9/comb_iter_1/right/bn_sep_3x3_2/gamma missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable reduction_cell_0/comb_iter_4/left/bn_sep_3x3_1/beta missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_9/comb_iter_4/left/bn_sep_3x3_2/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_10/comb_iter_0/left/bn_sep_5x5_2/moving_mean missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_3/1x1/weights missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_5/comb_iter_4/left/bn_sep_3x3_1/beta missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_1/prev_bn/gamma missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_stem_0/comb_iter_2/right/bn_sep_5x5_1/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_stem_1/comb_iter_0/left/bn_sep_5x5_1/gamma missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_7/comb_iter_1/left/bn_sep_5x5_1/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_7/comb_iter_0/left/bn_sep_5x5_2/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_6/comb_iter_0/left/bn_sep_5x5_1/moving_mean missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_10/comb_iter_1/right/separable_3x3_2/depthwise_weights missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_6/comb_iter_4/left/bn_sep_3x3_2/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_5/comb_iter_1/left/bn_sep_5x5_1/gamma missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_11/comb_iter_0/left/bn_sep_5x5_2/beta missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_4/comb_iter_4/left/separable_3x3_2/pointwise_weights missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_4/comb_iter_0/left/bn_sep_5x5_1/gamma missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_0/comb_iter_4/left/bn_sep_3x3_1/beta missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_11/comb_iter_1/left/bn_sep_5x5_2/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_8/comb_iter_0/right/bn_sep_3x3_1/beta missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable reduction_cell_1/comb_iter_1/right/separable_7x7_1/depthwise_weights missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_4/comb_iter_0/right/bn_sep_3x3_1/gamma missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_10/comb_iter_1/left/bn_sep_5x5_1/moving_variance missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt
WARNING:tensorflow:Variable cell_6/comb_iter_4/left/bn_sep_3x3_1/moving_mean missing in checkpoint /data/nasnet_shop_540/pretrain_nasnet/model.ckpt

WARNING:tensorflow:No Variables to restore
`

Is it because of some other reason? I didn't get this kind of warning when I used Inception or MobileNet before.

I hit the same issue when I try to fine-tune the NASNet model.

INFO:tensorflow:Restoring parameters from net_model/model.ckpt
2017-11-01 19:44:34.074748: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn1/moving_mean not found in checkpoint
2017-11-01 19:44:34.074748: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/FC/biases not found in checkpoint
2017-11-01 19:44:34.074857: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/Conv/weights not found in checkpoint
2017-11-01 19:44:34.074957: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key cell_0/beginning_bn/beta not found in checkpoint
2017-11-01 19:44:34.075063: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn1/gamma not found in checkpoint
2017-11-01 19:44:34.075153: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn1/beta not found in checkpoint
......

My command is:
$ python train_image_classifier.py --train_dir=result/ai/nas_i/train --dataset_name=place --dataset_split_name=train --dataset_dir=./place_data --model_name=nasnet_large --checkpoint_path=net_model/model.ckpt --checkpoint_exclude_scopes=final_layer --trainable_scopes=final_layer --max_number_of_steps=30000 --batch_size=64 --num_clones=2 --learning_rate=0.01 --weight_decay=0.0001

New checkpoints have been uploaded that fix this issue.

@BarretZoph I downloaded the checkpoints, but they cannot be read.

Caused by op u'save/RestoreV2_6', defined at:
File "train_image_classifier.py", line 574, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 564, in main
init_fn=_get_init_fn(),
File "train_image_classifier.py", line 361, in _get_init_fn
ignore_missing_vars=FLAGS.ignore_missing_vars)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 659, in assign_from_checkpoint_fn
saver = tf_saver.Saver(var_list, reshape=reshape_variables)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1233, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1242, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1278, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 759, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 268, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3042, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1521, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

DataLossError (see above for traceback): Unable to read file (net_model/model.ckpt.index). Perhaps the file is corrupt or was produced by a newer version of TensorFlow with format changes (failed to seek to header entry): corrupted compressed block contents

I downloaded them and they worked. @yoobright What version of TF are you using?

I get the same error as yoobright on tensorflow 1.4.0-rc1, python 2.7

DataLossError (see above for traceback): Unable to read file (pretrained/nasnet-a_mobile_04_10_2017/model.ckpt.index). Perhaps the file is corrupt or was produced by a newer version of TensorFlow with format changes (failed to seek to header entry): corrupted compressed block contents
[[Node: load_pretrained/RestoreV2_4 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_load_pretrained/Const_0_0, load_pretrained/RestoreV2_4/tensor_names, load_pretrained/RestoreV2_4/shape_and_slices)]]

@PeterLeeCuida @yoobright I just reuploaded the checkpoints, can you give it a shot again?

@vrv I re-downloaded the model and loaded it; now I get a "key not found in checkpoint" error:

INFO:tensorflow:Restoring parameters from net_model/model.ckpt
2017-11-02 15:50:27.244249: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn1/moving_variance not found in checkpoint
2017-11-02 15:50:27.244514: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn0/gamma not found in checkpoint
2017-11-02 15:50:27.245038: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn1/beta not found in checkpoint
2017-11-02 15:50:27.246488: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn1/gamma not found in checkpoint
2017-11-02 15:50:27.247144: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn0/beta not found in checkpoint
2017-11-02 15:50:27.249581: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn0/moving_variance not found in checkpoint
2017-11-02 15:50:27.250574: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn0/moving_mean not found in checkpoint
2017-11-02 15:50:27.251454: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key aux_11/aux_logits/aux_bn1/moving_mean not found in checkpoint
INFO:tensorflow:Error reported to Coordinator: , Key aux_11/aux_logits/aux_bn1/moving_variance not found in checkpoint
[[Node: save/RestoreV2_10 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_10/tensor_names, save/RestoreV2_10/shape_and_slices)]]
Caused by op u'save/RestoreV2_10', defined at:
File "train_image_classifier.py", line 574, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 564, in main
init_fn=_get_init_fn(),
File "train_image_classifier.py", line 361, in _get_init_fn
ignore_missing_vars=FLAGS.ignore_missing_vars)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 659, in assign_from_checkpoint_fn
saver = tf_saver.Saver(var_list, reshape=reshape_variables)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1233, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1242, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1278, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 759, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 268, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3042, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1521, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key aux_11/aux_logits/aux_bn1/moving_variance not found in checkpoint
[[Node: save/RestoreV2_10 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_10/tensor_names, save/RestoreV2_10/shape_and_slices)]]

The tensor names under the aux_11/aux_logits scope in the checkpoint look like this:

aux_11/aux_logits/Conv/BatchNorm/beta (DT_FLOAT) [768]
aux_11/aux_logits/Conv/BatchNorm/beta/ExponentialMovingAverage (DT_FLOAT) [768]
aux_11/aux_logits/Conv/BatchNorm/gamma (DT_FLOAT) [768]
aux_11/aux_logits/Conv/BatchNorm/gamma/ExponentialMovingAverage (DT_FLOAT) [768]
aux_11/aux_logits/Conv/BatchNorm/moving_mean (DT_FLOAT) [768]
aux_11/aux_logits/Conv/BatchNorm/moving_mean/ExponentialMovingAverage (DT_FLOAT) [768]
aux_11/aux_logits/Conv/BatchNorm/moving_variance (DT_FLOAT) [768]
aux_11/aux_logits/Conv/BatchNorm/moving_variance/ExponentialMovingAverage (DT_FLOAT) [768]
aux_11/aux_logits/Conv/weights (DT_FLOAT) [6,6,128,768]
aux_11/aux_logits/Conv/weights/ExponentialMovingAverage (DT_FLOAT) [6,6,128,768]
aux_11/aux_logits/FC/biases (DT_FLOAT) [1001]
aux_11/aux_logits/FC/biases/ExponentialMovingAverage (DT_FLOAT) [1001]
aux_11/aux_logits/FC/weights (DT_FLOAT) [768,1001]
aux_11/aux_logits/FC/weights/ExponentialMovingAverage (DT_FLOAT) [768,1001]
aux_11/aux_logits/proj/BatchNorm/beta (DT_FLOAT) [128]
aux_11/aux_logits/proj/BatchNorm/beta/ExponentialMovingAverage (DT_FLOAT) [128]
aux_11/aux_logits/proj/BatchNorm/gamma (DT_FLOAT) [128]
aux_11/aux_logits/proj/BatchNorm/gamma/ExponentialMovingAverage (DT_FLOAT) [128]
aux_11/aux_logits/proj/BatchNorm/moving_mean (DT_FLOAT) [128]
aux_11/aux_logits/proj/BatchNorm/moving_mean/ExponentialMovingAverage (DT_FLOAT) [128]
aux_11/aux_logits/proj/BatchNorm/moving_variance (DT_FLOAT) [128]
aux_11/aux_logits/proj/BatchNorm/moving_variance/ExponentialMovingAverage (DT_FLOAT) [128]
aux_11/aux_logits/proj/weights (DT_FLOAT) [1,1,2016,128]
aux_11/aux_logits/proj/weights/ExponentialMovingAverage (DT_FLOAT) [1,1,2016,128]

These tensor names in the checkpoint do not match the names defined in the _build_aux_head function.
I use TensorFlow 1.4.0-rc1, Python 2.7.

@vrv: There are still issues with loading the network (NotFoundError):

NotFoundError (see above for traceback): Tensor name "cell_17/comb_iter_1/right/separable_3x3_2/pointwise_weights" not found in checkpoint files /netscratch/siddiqui/ClassificationCNN-TF/model.ckpt.index
[[Node: save/RestoreV2_684 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_684/tensor_names, save/RestoreV2_684/shape_and_slices)]]
[[Node: save/RestoreV2_24/_2741 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5836_save/RestoreV2_24", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

@yoobright looking at this now

@shoaibahmed what model are you using and what checkpoint are you loading?
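One rough way to answer that question from the variable names alone is to look at the highest `cell_N` index and the `aux_N` head scope. In this thread, names such as `cell_17/...` and `aux_11/...` show up with the large model, while the working mobile commands use `aux_7/...`. The sketch below is only a heuristic built on those observations; `summarize_names` is a hypothetical helper, not part of slim.

```python
import re

def summarize_names(names):
    """Return (highest cell index, sorted aux head scopes) found in variable names."""
    cell_ids = []
    aux_scopes = set()
    for name in names:
        # Only plain "cell_<N>/" scopes count; "cell_stem_0/" and
        # "reduction_cell_1/" are deliberately not matched.
        m = re.match(r"cell_(\d+)/", name)
        if m:
            cell_ids.append(int(m.group(1)))
        m = re.match(r"(aux_\d+)/", name)
        if m:
            aux_scopes.add(m.group(1))
    return (max(cell_ids) if cell_ids else None, sorted(aux_scopes))

# Names taken from this thread (large model).
large_like = [
    "cell_17/comb_iter_1/right/separable_3x3_2/pointwise_weights",
    "aux_11/aux_logits/FC/weights",
]
print(summarize_names(large_like))  # (17, ['aux_11'])
```

If the summary of the checkpoint names and the summary of the model's variable names disagree, the model and checkpoint variants are probably mixed up.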

@shoaibahmed I can see "cell_17/comb_iter_1/right/separable_3x3_2/pointwise_weights" in the large checkpoint file. Perhaps you are trying to restore the large model with the mobile checkpoint?

@yoobright we did some checkpoint tensor name surgery for the aux head variables, so the names should match now. Can you try again? Thanks!

@vrv Thanks, the latest checkpoints appear to load for me now without warnings. Here's my command for re-training:

python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=cifar10 \
--dataset_split_name=train \
--model_name=nasnet_mobile \
--checkpoint_path=${CHECKPOINT_PATH} \
--preprocessing_name=inception \
--clone_on_cpu=True \
--moving_average_decay=0.999 \
--checkpoint_exclude_scopes=aux_7/aux_logits/FC,final_layer/FC \
--trainable_scopes=aux_7/aux_logits/FC,final_layer/FC
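
To see why the scope names in `--checkpoint_exclude_scopes` matter, here is a minimal sketch of prefix-based filtering. It assumes, as far as I understand the slim training script of that time, that a variable is skipped when its name starts with one of the excluded scopes. The helper names are hypothetical and the variable names are examples from this thread.

```python
def split_scopes(flag_value):
    """Split a comma-separated scope flag into a clean list."""
    return [s.strip() for s in flag_value.split(",") if s.strip()]

def variables_to_restore(model_var_names, exclude_flag):
    """Keep names that do not start with any excluded scope prefix."""
    exclusions = split_scopes(exclude_flag)
    return [n for n in model_var_names
            if not any(n.startswith(e) for e in exclusions)]

names = [
    "cell_0/comb_iter_4/left/bn_sep_3x3_1/beta",
    "aux_7/aux_logits/FC/weights",
    "final_layer/FC/biases",
]
kept = variables_to_restore(names, "aux_7/aux_logits/FC,final_layer/FC")
print(kept)  # ['cell_0/comb_iter_4/left/bn_sep_3x3_1/beta']

# A prefix that matches no variable name (for example "NasNet-A/Logits")
# excludes nothing, so the classifier layers would still be restored.
print(len(variables_to_restore(names, "NasNet-A/Logits")))  # 3
```

Under that assumption, an exclude scope only has an effect if it is a literal prefix of the variable names the model actually creates.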

@vrv: I am still getting an error. I downloaded the latest version.

NotFoundError (see above for traceback): Tensor name "cell_stem_0/comb_iter_0/left/bn_sep_5x5_2/gamma" not found in checkpoint files /netscratch/siddiqui/ClassificationCNN-TF/model.ckpt.index
[[Node: save/RestoreV2_1273 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1273/tensor_names, save/RestoreV2_1273/shape_and_slices)]]
[[Node: save/RestoreV2_475/_445 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3538_save/RestoreV2_475", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

@vrv: I am using NASNet-A_large

@shoaibahmed Can you post the command you used to run?

@BarretZoph: It's custom code:
arg_scope = nasnet.nasnet_large_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = nasnet.build_nasnet_large(scaledInputBatchImages, is_training=options.trainModel, is_batchnorm_training=options.trainModel, num_classes=numClasses)

I have tried with training both on and off, but I still encounter errors while restoring. The script works fine when reloading Inception ResNet v2.

@shoaibahmed Can you do the following?

Use tensorflow/python/tools/inspect_checkpoint from the tensorflow.git codebase to inspect the checkpoint file you are loading. You should be able to see the tensor_name "cell_stem_0/comb_iter_0/left/bn_sep_5x5_2/gamma" in there.

If not, you are probably using the wrong checkpoint for the model. If it's there, we'll need some way to reproduce the problem to debug further.
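
The same check can be sketched in Python. This assumes a TensorFlow version that provides `tf.train.list_variables` (some older 1.x releases exposed a similar function under `tf.contrib.framework`); the two helper names are hypothetical. TensorFlow is imported lazily so the comparison helper can be tried without it.

```python
def list_checkpoint_names(checkpoint_prefix):
    """Return the variable names stored in a checkpoint (needs TensorFlow)."""
    import tensorflow as tf  # lazy import: only needed for this function
    # list_variables yields (name, shape) pairs for a prefix such as ".../model.ckpt"
    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]

def missing_from_checkpoint(model_var_names, checkpoint_names):
    """Names the model expects that the checkpoint does not contain."""
    available = set(checkpoint_names)
    return sorted(n for n in model_var_names if n not in available)

# Example with names from this thread; no TensorFlow needed for this part.
ckpt = ["aux_11/aux_logits/FC/weights", "aux_11/aux_logits/FC/biases"]
model = ["aux_11/aux_logits/fully_connected/weights", "aux_11/aux_logits/FC/biases"]
print(missing_from_checkpoint(model, ckpt))
# ['aux_11/aux_logits/fully_connected/weights']
```

An empty result suggests the names line up; a non-empty one points at either the wrong checkpoint or a model built with different scopes.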

@vrv: I checked the tensor list and there was some problem with the downloaded model. I re-downloaded it and everything seems to be fine in that regard. I also ran inspect_checkpoint and am now able to find all of these tensors. Thanks!

@vrv the new model works fine now, thanks for your great work.

@vrv I am having issues loading these models. In the mobile version:

Key cell_stem_0/comb_iter_4/left/separable_3x3_2/biases not found in checkpoint

In the large version:

Key aux_11/aux_logits/fully_connected/weights not found in checkpoint

I ran checkpoint_utils.list_variables and, as expected, those variables are not listed. I am not sure whether this is due to the change made to the base code 20 days ago.

Nothing has changed about the model or the checkpoints, so it's probably that you're somehow not using the right set of arguments, models, and checkpoints together.
Maybe https://github.com/tensorflow/models/issues/2720 is related

Thanks for the fast response @vrv.

I load other models without issues, so #2720 is not related. I am using the default functions in nasnet.py to build the model: def build_nasnet_mobile(images, num_classes, is_training=True, final_endpoint=None). So I don't think the arguments are the issue, at least not the arguments in my code. The error could be in that function.

Either the model or the checkpoints are wrong. I am using the latest commit in this repo and downloading the checkpoints from the README links (https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/README.md). The code I use to load the model is exactly the same code I use to load checkpoints for other models (such as Inception). So it is unlikely that the bug is in my code, as it works for other models.

I see, I'm not entirely sure how others were able to finetune if there is a mismatch, but let's dig deeper.

For the mobile case: @BarretZoph it looks like biases are being added with separable_conv2d using tf.contrib.slim; should we be setting the bias_initializer to None to prevent biases from being added to the graph?

For the large case: it looks like the name of the variable in the checkpoint is

aux_11/aux_logits/FC/weights

but the model is expecting

aux_11/aux_logits/fully_connected/weights

I think there's some scope name mismatch in our checkpoint, and that we have to rewrite the checkpoint's variable names to match the expectation of 'FC'. I have a small tool to do this using the TensorBundle C++ API, so I'll try to get to it later.

(Comment removed, information is wrong).

@BarretZoph pointed out to me that arg scopes are setting the right thing in both cases:

https://github.com/tensorflow/models/blob/3f5dbba947a5a0755b93ee8af9b36b464df5d49e/research/slim/nets/nasnet/nasnet.py#L111

'FC' is specified in scope for fully connected, and biases_initializer is set for separable conv2d.

So it sounds like you need to set or pass the arg scopes.

See https://github.com/tensorflow/models/blob/57014e4c7a8a5cd8bdcb836587a094c082c991fc/research/slim/nets/nets_factory.py#L132 for the function you should probably call, or an example of how you would write it yourself.

Thanks @vrv I saw the scopes in the build functions and supposed those were the only scopes for the network.

@vrv I think you changed the variable names in the checkpoints, and now I cannot find the correct variable: it is now aux_11/aux_logits/fully_connected/weights (the name I could not find before) and it should be aux_11/aux_logits/FC/weights. Can you roll back the changes?

@jorgemf I believe I did; can you try redownloading the checkpoint? I re-uploaded them about 12 hours ago.

I redownloaded it myself, ran inspect_checkpoint and I see the 'FC' scope again.

@vrv I tested it now and it seems to load. I only have an issue with the number of outputs, because my model has a different number of classes than ImageNet:

Assign requires shapes of both tensors to match. lhs shape= [117] rhs shape= [1001]
         [[Node: save_1/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@aux_11/aux_logits/FC/biases"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/device:CPU:0"](aux_11/aux_logits/FC/biases, save_1/RestoreV2_1)]]

I am just letting you know in case the checkpoint has to change. I can solve the issue on my side easily.

@jorgemf I believe the slim finetuning infrastructure should show you how to handle a different number of classes. It sounds like everything is working as intended.
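
As a rough illustration of handling a different number of classes, the sketch below compares model and checkpoint shapes and reports the scopes that would need to be left out of the restore (for example via `--checkpoint_exclude_scopes`). The helpers are hypothetical. The bias shapes `[117]` and `[1001]` and the checkpoint shapes come from the messages in this thread; the model weight shape `[768, 117]` is an assumption for the example.

```python
def shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for names whose shapes differ."""
    diffs = {}
    for name, shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and list(ckpt_shape) != list(shape):
            diffs[name] = (list(shape), list(ckpt_shape))
    return diffs

def enclosing_scopes(names):
    """Drop the last path component (weights/biases) to get candidate scopes."""
    return sorted({n.rsplit("/", 1)[0] for n in names})

model = {
    "aux_11/aux_logits/FC/biases": [117],
    "aux_11/aux_logits/FC/weights": [768, 117],  # assumed shape for 117 classes
    "aux_11/aux_logits/Conv/weights": [6, 6, 128, 768],
}
ckpt = {
    "aux_11/aux_logits/FC/biases": [1001],
    "aux_11/aux_logits/FC/weights": [768, 1001],
    "aux_11/aux_logits/Conv/weights": [6, 6, 128, 768],
}
print(enclosing_scopes(shape_mismatches(model, ckpt)))  # ['aux_11/aux_logits/FC']
```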

@vrv sorry for bothering you again. The checkpoint might be corrupted: Checksum does not match: stored 939082234 vs. calculated on the restored bytes 2293605473

I'm trying to fine-tune nasnet_mobile on the flowers dataset and I get this error:

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/checkpoints/nasnet_mobile/model.cpkt [[Node: save/RestoreV2_7 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"] (_arg_save/Const_0_0, save/RestoreV2_7/tensor_names, save/RestoreV2_7/shape_and_slices)]]

The command I used is the following:
python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=flowers \
--dataset_split_name=train \
--model_name=nasnet_mobile \
--checkpoint_path=${CHECKPOINT_PATH}/model.cpkt \
--preprocessing_name=inception \
--clone_on_cpu=True \
--moving_average_decay=0.999 \
--checkpoint_exclude_scopes=aux_11/aux_logits,final_layer/FC \
--trainable_scopes=aux_11/aux_logits/FC,final_layer/FC

I can't find the mistake.

@MikelBa The error says "Failed to find any matching files for /tmp/checkpoints/nasnet_mobile/model.cpkt": it cannot find the checkpoint file.
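
A quick way to rule out a path problem is to check that files exist for the exact prefix being passed. The sketch below assumes the usual V2 checkpoint layout of `<prefix>.index` plus `<prefix>.data-*` files (the thread mentions `model.ckpt.index`); `checkpoint_files` is a hypothetical helper and the demo uses empty stand-in files in a temporary directory.

```python
import glob
import os
import tempfile

def checkpoint_files(prefix):
    """Return files belonging to a checkpoint prefix such as .../model.ckpt."""
    safe = glob.escape(prefix)  # treat the prefix literally, not as a pattern
    return sorted(glob.glob(safe + ".index") + glob.glob(safe + ".data-*"))

# Demo with empty stand-in files.
with tempfile.TemporaryDirectory() as d:
    for suffix in (".index", ".data-00000-of-00001"):
        open(os.path.join(d, "model.ckpt" + suffix), "w").close()
    found = [os.path.basename(p)
             for p in checkpoint_files(os.path.join(d, "model.ckpt"))]
    typo = checkpoint_files(os.path.join(d, "model.cpkt"))

print(found)  # ['model.ckpt.data-00000-of-00001', 'model.ckpt.index']
print(typo)   # []
```

An empty list for the prefix given to `--checkpoint_path` would be consistent with the "Failed to find any matching files" message.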

I see, but I don't get why it happens. If I execute

python tensorflow/python/tools/inspect_checkpoint.py --file_name=${CHECKPOINT_PATH}/model.cpkt \
--tensor_name=cell_0/comb_iter_1/left/separable_5x5_1/depthwise_weights

I get back this:

[[[[ -1.26436604e-02] [ 9.43558384e-03] [ 5.74761443e-02] ...,

So in this example the file model.ckpt is found, but when fine-tuning it is not.

I solved my problem. I was choosing the wrong checkpoints for the nasnet_mobile model. I used the checkpoints from below and everything went fine.

--checkpoint_exclude_scopes=aux_7/aux_logits/FC,final_layer/FC \
--trainable_scopes=.*/aux_logits/FC,final_layer/FC \
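
The `.*` in `--trainable_scopes` above can work because, as far as I understand, slim collects trainable variables per scope with `tf.get_collection`, which filters names with `re.match`. The sketch below mimics that matching in plain Python; `match_trainable` is a hypothetical helper and the variable names are examples from this thread.

```python
import re

def match_trainable(var_names, trainable_flag):
    """Select names matched by any scope pattern, using re.match (anchored at the start)."""
    patterns = [s.strip() for s in trainable_flag.split(",") if s.strip()]
    return [n for n in var_names if any(re.match(p, n) for p in patterns)]

names = [
    "aux_7/aux_logits/FC/weights",
    "final_layer/FC/biases",
    "cell_0/1x1/weights",
]
print(match_trainable(names, ".*/aux_logits/FC,final_layer/FC"))
# ['aux_7/aux_logits/FC/weights', 'final_layer/FC/biases']
```

Under that assumption, a pattern like `.*/aux_logits/FC` selects the aux head regardless of whether the scope is `aux_7` or `aux_11`.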

Hi there,
I created a repo that documents the slim version of NASNet in detail. You can find the repo here.
Following the markdown, you should be able to fine-tune NASNet or train it from scratch on your own dataset.
Many thanks to Google for open-sourcing such a good architecture!
Forks and comments are highly welcome!

Yours Yeephycho

Go to this page. It may be helpful to you.
