Models: Can't restore the given checkpoint for inception v3 in research/slim

Created on 7 Dec 2017 · 14 comments · Source: tensorflow/models

I am trying to use the pretrained Inception V3 model provided in research/slim, but the weights don't match:

import tensorflow as tf
from nets.inception_v3 import inception_v3_base

tf_images = tf.placeholder(tf.float32, [None, 299, 299, 3])
inception, inception_layers = inception_v3_base(tf_images)
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'models/weights/inception_v3.ckpt')
NotFoundError: Tensor name "InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/biases" not found in checkpoint files models/weights/inception_v3.ckpt
     [[Node: save/RestoreV2_30 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_30/tensor_names, save/RestoreV2_30/shape_and_slices)]]
     [[Node: save/RestoreV2_184/_5 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_384_save/RestoreV2_184", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Looking at the variables, they indeed don't match:

from tensorflow.python import pywrap_tensorflow
reader = pywrap_tensorflow.NewCheckpointReader('models/weights/inception_v3.ckpt')
var_to_shape_map = reader.get_variable_to_shape_map()
print(sorted(var_to_shape_map))
print([var.name for var in saver._var_list])

A sample of the checkpoint variables:

'InceptionV3/Conv2d_1a_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_1a_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_1a_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_1a_3x3/weights',
 'InceptionV3/Conv2d_2a_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_2a_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_2a_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_2a_3x3/weights',
 'InceptionV3/Conv2d_2b_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_2b_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_2b_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_2b_3x3/weights',
 'InceptionV3/Conv2d_3b_1x1/BatchNorm/beta',
 'InceptionV3/Conv2d_3b_1x1/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_3b_1x1/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_3b_1x1/weights',
 'InceptionV3/Conv2d_4a_3x3/BatchNorm/beta',
 'InceptionV3/Conv2d_4a_3x3/BatchNorm/moving_mean',
 'InceptionV3/Conv2d_4a_3x3/BatchNorm/moving_variance',
 'InceptionV3/Conv2d_4a_3x3/weights',

And of the saver's variables:

'InceptionV3/Conv2d_1a_3x3/weights:0',
 'InceptionV3/Conv2d_1a_3x3/biases:0',
 'InceptionV3/Conv2d_2a_3x3/weights:0',
 'InceptionV3/Conv2d_2a_3x3/biases:0',
 'InceptionV3/Conv2d_2b_3x3/weights:0',
 'InceptionV3/Conv2d_2b_3x3/biases:0',
 'InceptionV3/Conv2d_3b_1x1/weights:0',
 'InceptionV3/Conv2d_3b_1x1/biases:0',
 'InceptionV3/Conv2d_4a_3x3/weights:0',
 'InceptionV3/Conv2d_4a_3x3/biases:0',

Looks like a mismatch with the biases and the batch norms: the graph I build has biases but no batch-norm variables, while the checkpoint has batch-norm variables but no biases. I am using TensorFlow version 1.4.0 and models commit 69cf6fca2106c41946a3c395126bdd6994d36e6b, if that is relevant. The checkpoint file is taken from inception_v3_2016_08_28.tar.gz.
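This kind of restore failure can be diagnosed before calling `saver.restore()` by diffing the two name lists printed above. A minimal pure-Python sketch (the variable names below are the ones from the listings; the helper `diff_variable_names` is hypothetical, not part of TensorFlow):

```python
def diff_variable_names(graph_vars, ckpt_vars):
    """Compare graph variable names (which carry a ':0' suffix) against
    checkpoint tensor names (which do not)."""
    graph_names = {name.split(':')[0] for name in graph_vars}  # strip ':0'
    ckpt_names = set(ckpt_vars)
    # Names the graph wants but the checkpoint lacks, and vice versa.
    return sorted(graph_names - ckpt_names), sorted(ckpt_names - graph_names)

graph = ['InceptionV3/Conv2d_1a_3x3/weights:0',
         'InceptionV3/Conv2d_1a_3x3/biases:0']
ckpt = ['InceptionV3/Conv2d_1a_3x3/weights',
        'InceptionV3/Conv2d_1a_3x3/BatchNorm/beta']

missing_in_ckpt, missing_in_graph = diff_variable_names(graph, ckpt)
print(missing_in_ckpt)   # ['InceptionV3/Conv2d_1a_3x3/biases']
print(missing_in_graph)  # ['InceptionV3/Conv2d_1a_3x3/BatchNorm/beta']
```

An empty diff in both directions means every graph variable has a matching checkpoint tensor, so the restore should succeed (shape mismatches aside).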

awaiting model gardener

Most helpful comment

A better solution than the one proposed by @jlamypoirier is probably to use inception_v3_arg_scope() which can be found in slim.nets.inception:

    with slim.arg_scope(slim.nets.inception.inception_v3_arg_scope()):
        logits, endpoints = slim.nets.inception.inception_v3(input,
                                                             num_classes=1001,
                                                             is_training=False)

@v-shmyhlo inception_v1_arg_scope() should also work.

All 14 comments

@sguada can you resolve this?

+1

You can fix it using a slim arg scope:

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training,
                                       'updates_collections': None}):
    inception, inception_layers = inception_v3_base(tf_images)

Still, this deserves a fix, or at least some documentation.

thank you @jlamypoirier
indeed, this deserves some documentation

+1 have the same problem on InceptionV1

A better solution than the one proposed by @jlamypoirier is probably to use inception_v3_arg_scope() which can be found in slim.nets.inception:

    with slim.arg_scope(slim.nets.inception.inception_v3_arg_scope()):
        logits, endpoints = slim.nets.inception.inception_v3(input,
                                                             num_classes=1001,
                                                             is_training=False)

@v-shmyhlo inception_v1_arg_scope() should also work.

+1 same problem in resnet_v2

Fixed it with `with slim.arg_scope(resnet_v2.resnet_arg_scope()):`, but I agree it's worth mentioning somewhere in the docs.

@jlamypoirier @ndrplz @Syzygy2048 @v-shmyhlo @reedwm @Steven-N-Hart I've been having the same problem for six days, but I don't know which file to edit. Could you help me, guys?

I want to train one of the pretrained models "mobilenet_v1_0.5_128" or "mobilenet_v1_1.0_224", but the config files from the official TensorFlow GitHub repository don't seem to match the mentioned models. Could you give me a link to the right config file, guys? Any help is welcome.

Thanks in advance,

Hey @Elites2017 , have you tried using args_scope with your model's specific argument as mentioned in https://github.com/tensorflow/models/issues/2977#issuecomment-355882150?

I read your comment above, but I don't know which file / field to edit. I'm using mobilenet_v1_1.0_224 with the config file: models/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config.

Thanks in advance for your help

@jlamypoirier @ndrplz @Syzygy2048 @v-shmyhlo @reedwm @Steven-N-Hart

What's the code to import model (.pb) files into TensorBoard, so that I can see the right parameters to convert my .pb model to .tflite?

Thanks,

I think I solved it: the slim code records the number of the last checkpoint in the train directory, so when you train other models there that number gets overwritten. If you want to train from the original model again, you should point to the specific checkpoint file rather than the directory.

Closing as the issue is resolved

Thanks a lot. I was wondering how to redefine the model to match the checkpoint's num_classes; your answer solved my problem after days of struggling.
