Models: Can't restore the given checkpoint for inception v3 in research/slim

Created on 7 Dec 2017 · 14 comments · Source: tensorflow/models

I am trying to use the pretrained Inception V3 model provided in research/slim, but the weights don't match:

import tensorflow as tf
from nets.inception_v3 import inception_v3_base

tf_images = tf.placeholder(tf.float32, [None, 299, 299, 3])
inception, inception_layers = inception_v3_base(tf_images)
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'models/weights/inception_v3.ckpt')
NotFoundError: Tensor name "InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/biases" not found in checkpoint files models/weights/inception_v3.ckpt
     [[Node: save/RestoreV2_30 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_30/tensor_names, save/RestoreV2_30/shape_and_slices)]]
     [[Node: save/RestoreV2_184/_5 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_384_save/RestoreV2_184", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Looking at the variables, they indeed don't match:

from tensorflow.python import pywrap_tensorflow
reader = pywrap_tensorflow.NewCheckpointReader('models/weights/inception_v3.ckpt')
var_to_shape_map = reader.get_variable_to_shape_map()
print(sorted(var_to_shape_map))
print([var.name for var in saver._var_list])

A sample of the checkpoint variables:

'InceptionV3/Conv2d_1a_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_1a_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_1a_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_1a_3x3/weights',
 'InceptionV3/Conv2d_2a_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_2a_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_2a_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_2a_3x3/weights',
 'InceptionV3/Conv2d_2b_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_2b_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_2b_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_2b_3x3/weights',
 'InceptionV3/Conv2d_3b_1x1/BatchNorm/beta',
 'InceptionV3/Conv2d_3b_1x1/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_3b_1x1/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_3b_1x1/weights',
 'InceptionV3/Conv2d_4a_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_4a_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_4a_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_4a_3x3/weights',

And of the saver's variables:

'InceptionV3/Conv2d_1a_3x3/weights:0',
 'InceptionV3/Conv2d_1a_3x3/biases:0',
 'InceptionV3/Conv2d_2a_3x3/weights:0',
 'InceptionV3/Conv2d_2a_3x3/biases:0',
 'InceptionV3/Conv2d_2b_3x3/weights:0',
 'InceptionV3/Conv2d_2b_3x3/biases:0',
 'InceptionV3/Conv2d_3b_1x1/weights:0',
 'InceptionV3/Conv2d_3b_1x1/biases:0',
 'InceptionV3/Conv2d_4a_3x3/weights:0',
 'InceptionV3/Conv2d_4a_3x3/biases:0',

Looks like a mismatch with the biases and the batch norms: the graph I build has biases but no batch-norm variables, while the checkpoint has batch-norm variables but no biases. I am using TensorFlow version 1.4.0 and models commit 69cf6fca2106c41946a3c395126bdd6994d36e6b, if that is relevant. The checkpoint file is taken from inception_v3_2016_08_28.tar.gz.
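This kind of restore failure can be diagnosed before calling `saver.restore()` by diffing the two name lists printed above. A minimal pure-Python sketch (the variable names below are the ones from the listings; the helper `diff_variable_names` is hypothetical, not part of TensorFlow):

```python
def diff_variable_names(graph_vars, ckpt_vars):
    """Compare graph variable names (which carry a ':0' suffix) against
    checkpoint tensor names (which do not)."""
    graph_names = {name.split(':')[0] for name in graph_vars}  # strip ':0'
    ckpt_names = set(ckpt_vars)
    # Names the graph wants but the checkpoint lacks, and vice versa.
    return sorted(graph_names - ckpt_names), sorted(ckpt_names - graph_names)

graph = ['InceptionV3/Conv2d_1a_3x3/weights:0',
         'InceptionV3/Conv2d_1a_3x3/biases:0']
ckpt = ['InceptionV3/Conv2d_1a_3x3/weights',
        'InceptionV3/Conv2d_1a_3x3/BatchNorm/beta']

missing_in_ckpt, missing_in_graph = diff_variable_names(graph, ckpt)
print(missing_in_ckpt)   # ['InceptionV3/Conv2d_1a_3x3/biases']
print(missing_in_graph)  # ['InceptionV3/Conv2d_1a_3x3/BatchNorm/beta']
```

An empty diff in both directions means every graph variable has a matching checkpoint tensor, so the restore should succeed (shape mismatches aside).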

awaiting model gardener

Most helpful comment

A better solution than the one proposed by @jlamypoirier is probably to use inception_v3_arg_scope() which can be found in slim.nets.inception:

    with slim.arg_scope(slim.nets.inception.inception_v3_arg_scope()):
        logits, endpoints = slim.nets.inception.inception_v3(input,
                                                             num_classes=1001,
                                                             is_training=False)

@v-shmyhlo inception_v1_arg_scope() should also work.

All 14 comments

@sguada can you resolve this?

+1

You can fix it using a slim arg scope:

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training,
                                       'updates_collections': None}):
    inception, inception_layers = inception_v3_base(tf_images)

Still, this deserves a fix, or at least some documentation.

thank you @jlamypoirier
indeed, this deserves some documentation

+1 have the same problem on InceptionV1

A better solution than the one proposed by @jlamypoirier is probably to use inception_v3_arg_scope() which can be found in slim.nets.inception:

    with slim.arg_scope(slim.nets.inception.inception_v3_arg_scope()):
        logits, endpoints = slim.nets.inception.inception_v3(input,
                                                             num_classes=1001,
                                                             is_training=False)

@v-shmyhlo inception_v1_arg_scope() should also work.

+1 same problem in resnet_v2

Fixed it with `with slim.arg_scope(resnet_v2.resnet_arg_scope()):`, but I agree it's worth mentioning somewhere in the docs.

@jlamypoirier @ndrplz @Syzygy2048 @v-shmyhlo @reedwm @Steven-N-Hart I've been having the same problem for six days, but I don't know which file to edit. Could you help me, guys?

I want to train one of the pretrained models "mobilenet_v1_0.5_128" or "mobilenet_v1_1.0_224", but the config files from the official TensorFlow GitHub repository don't seem to match the mentioned models. Could you give me a link to the right config file, guys? Any help is welcome.

Thanks in advance,

Hey @Elites2017 , have you tried using args_scope with your model's specific argument as mentioned in https://github.com/tensorflow/models/issues/2977#issuecomment-355882150?

I read your comment above, but I don't know which file / field to edit. I'm using mobilenet_v1_1.0_224 with the config file: models/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config.

Thanks in advance for your help

@jlamypoirier @ndrplz @Syzygy2048 @v-shmyhlo @reedwm @Steven-N-Hart

What's the code to import model (.pb) files into TensorBoard, so that I can see the right parameters to convert my .pb model to .tflite?

Thanks,

I think I solved it: the slim code records the number of the last checkpoint in the train directory, so when you train other models there that number gets overwritten. If you want to train from the original model again, you should point to the specific checkpoint file rather than the directory.

Closing as the issue is resolved

Thanks a lot. I was wondering how to redefine the model to match the checkpoint's num_classes; your answer solved my problem after days of struggling.
