Models: research/slim pre-trained weights incompatible with present code

Created on 30 Jul 2018 · 4 comments · Source: tensorflow/models

System information

  • What is the top-level directory of the model you are using: research/slim
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OSX
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.4.1
  • Bazel version (if compiling from source): n/a
  • CUDA/cuDNN version: n/a
  • GPU model and memory: n/a
  • Exact command to reproduce: (see below)

Describe the problem

It appears that the pre-trained weights for some models in research/slim are not compatible with inference graphs from the present source code.

I've been trying to turn all research/slim models into frozen graphs (for visualization research in tensorflow/lucid). For each model, I downloaded the checkpoint from the table. Then I exported an inference graph, for example:

python models/research/slim/export_inference_graph.py \
  --alsologtostderr \
  --model_name=resnet_v1_50 \
  --output_file=model.pb

Then I attempted to freeze the checkpoint weights into the GraphDef using TensorFlow's freeze_graph.py script, for example:

python freeze_graph.py \
  --input_graph=model.pb \
  --input_checkpoint=model.ckpt \
  --input_binary=true \
  --output_graph=frozen.pb \
  --output_node_names=resnet_v1_50/predictions/Reshape_1
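
For context (not part of the original report), here is a minimal TF 1.x sketch of how the resulting frozen.pb would then be consumed; the input placeholder name `input` is an assumption about export_inference_graph.py's default, and the output node is the one passed to freeze_graph.py above:

    import numpy as np
    import tensorflow as tf

    # Load the frozen GraphDef produced by freeze_graph.py.
    with tf.gfile.GFile('frozen.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')

    # Assumed node names: 'input' is the placeholder created by
    # export_inference_graph.py; the output node matches --output_node_names.
    inp = graph.get_tensor_by_name('input:0')
    out = graph.get_tensor_by_name('resnet_v1_50/predictions/Reshape_1:0')

    with tf.Session(graph=graph) as sess:
        probs = sess.run(out, feed_dict={inp: np.zeros((1, 224, 224, 3), np.float32)})
        print(probs.shape)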

This freezing step is where I found issues. While I had no problems downloading checkpoints or exporting inference graphs, it seems that the provided checkpoints may not be compatible with the inference graph at HEAD in some cases, due to shape errors:

  • ✅ inception_v1
  • ✅ inception_v2
  • ✅ inception_v3
  • ✅ inception_v4
  • ✅ inception_resnet_v2
  • ❌ resnet_v1_50

    • Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]

  • ❌ resnet_v1_101

    • Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]

  • ❌ resnet_v1_152

    • Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]

  • ✅ resnet_v2_50
  • ✅ resnet_v2_101
  • ✅ resnet_v2_152
  • ❌ vgg_16

    • Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]

  • ❌ vgg_19

    • Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]

The failing models all appear to have had their parameters converted from caffe-trained models back in 2016:

Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google.

Since all the cases involve a 1000 vs 1001 shape error, I suspect that there was a change in the convention around ImageNet classification output since they were ported from caffe.
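
For what it's worth, the mismatch can be confirmed directly from a downloaded checkpoint; here is a minimal TF 1.x sketch, where the checkpoint path and the exact logits variable name for slim's resnet_v1_50 are assumptions:

    import tensorflow as tf

    # Inspect the variable shapes stored in the downloaded checkpoint.
    reader = tf.train.NewCheckpointReader('model.ckpt')
    shapes = reader.get_variable_to_shape_map()

    # Assumed name of the final 1x1 conv weights in slim's resnet_v1_50.
    print(shapes.get('resnet_v1_50/logits/weights'))
    # The converted checkpoint stores [1, 1, 2048, 1000], while the graph
    # exported at HEAD expects [1, 1, 2048, 1001] (extra background class).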

Possible fixes:

  • Provide new checkpoints (this seems like a non-trivial ask on your end, but would likely be ideal).
  • Point users at a particular commit from which exported inference graphs remain compatible with the published checkpoints.
  • Note the incompatibility in the table.
  • Remove the incompatible models from the table.

Other Thoughts:

I realize this is research code and completely understand you may not wish to fix it. Either way, I'm very grateful that you provided both the models and the checkpoints! And I'm super excited to play around with the models I was able to export.

A couple other thoughts on how you could make pre-trained models easier for people to play around with:

  • Consider providing pre-trained models as frozen graphs so that users can't run into these issues.
  • Explicitly state what the input node's value range is. (I had to infer from here that it was (-1,1).)
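
For illustration, a minimal sketch of the (-1, 1) scaling mentioned above (Inception-style preprocessing); whether it is the right preprocessing for a given checkpoint is an assumption the reader should verify:

    import numpy as np

    def scale_to_unit_range(image_uint8):
        """Map uint8 pixel values in [0, 255] to the (-1, 1) range."""
        image = image_uint8.astype(np.float32)
        return image / 127.5 - 1.0

    # Example: a dummy batch of one 224x224 RGB image.
    x = scale_to_unit_range(np.zeros((1, 224, 224, 3), dtype=np.uint8))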

All 4 comments

@colah: in the Troubleshooting section of the slim README.md, you can see that there is a --labels_offset=1 flag.

D'oh, sorry for missing that! With the --labels_offset=1 flag, everything seems to work:

  • ✅ inception_v1
  • ✅ inception_v2
  • ✅ inception_v3
  • ✅ inception_v4
  • ✅ inception_resnet_v2
  • ✅ resnet_v1_50
  • ✅ resnet_v1_101
  • ✅ resnet_v1_152
  • ✅ resnet_v2_50
  • ✅ resnet_v2_101
  • ✅ resnet_v2_152
  • ✅ vgg_16
  • ✅ vgg_19
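
For reference, the corrected export step for one of the previously failing models looks like this; --labels_offset=1 drops the background class so the exported logits shape matches the converted checkpoints, and the rest of the pipeline is unchanged:

    python models/research/slim/export_inference_graph.py \
      --alsologtostderr \
      --model_name=resnet_v1_50 \
      --labels_offset=1 \
      --output_file=model.pb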

I ran into an error while working with Google Inception v4 by running inception_v4.py in an Anaconda environment. I already know that I need to set labels_offset=1, but I do not know where labels_offset=1 should go. Please see the error details below.

0. My weights

inception-v4_weights_tf_dim_ordering_tf_kernels.h5
inception-v4_weights_tf_dim_ordering_tf_kernels_notop.h5

1. If I set num_classes=1001, I get the correct model structure with model.summary()

model = inception_v4(num_classes=1001, dropout_prob=0.2, weights='imagenet', include_top=True)
model.summary()

2. If I set num_classes=1001, I get an error when running prediction

 model = inception_v4(num_classes=1001, dropout_prob=0.2, weights='imagenet', include_top=True)
 preds = model.predict(x)
 print('Predicted:', decode_predictions(preds))
ValueError: If using `weights` as imagenet with `include_top` as true, `classes` should be 1000

3. If I set num_classes=1000, I get an error when running prediction

 model = inception_v4(num_classes=1000, dropout_prob=0.2, weights='imagenet', include_top=True)
 preds = model.predict(x)
 print('Predicted:', decode_predictions(preds))
ValueError: Shapes (1536, 1000) and (1536, 1001) are incompatible

How can I get both model.summary() and model.predict(x) to work correctly? Please help me resolve this error.

Notes:

I cannot use the following approach to fix the error because I am not using the Bazel environment.

./bazel-bin/slim/train \
  --train_dir=${TRAIN_DIR} \
  --dataset_dir=${DATASET_DIR} \
  --dataset_name=imagenet \
  --dataset_split_name=train \
  --model_name=resnet_v1_50 \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --labels_offset=1

https://github.com/jonsafari/tensorflow-models/blob/master/slim/README.md

I have solved the issue. It is related to both the labels-offset concept and the class_names.txt file included in validation_utils.

  1. Comment out (or delete) the line that calls decode_predictions(preds).

Keras's decode_predictions() function is a problem here because it only handles 1000 classes; with 1001 classes (or any other number) it throws an error.

    # print('Predicted:', decode_predictions(preds))

  2. Add class_names.txt, which is needed to map class indices to names.

Users need to download the class_names.txt file.

    # Open class labels dictionary (human-readable label given an ID).
    classes = eval(open('/validation_utils/class_names.txt', 'r').read())

  3. Subtract 1 from the predicted index (mapping the 1001 outputs back to the 1000 real classes) and then run the prediction:

    preds = model.predict(x)
    print("Class is: " + classes[np.argmax(preds)-1])
    print("Certainty is: " + str(preds[0][np.argmax(preds)]))

In other words, we rely on the labels_offset=1 concept without actually passing the flag; the index shift is done with np.argmax(preds) - 1 instead.
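
Putting the steps together, a minimal sketch; the inception_v4 constructor, the preprocessed input x, and the class_names.txt path come from the comments above and are assumptions about that particular Keras setup:

    import numpy as np

    # Assumed: inception_v4() is the Keras implementation used above, built
    # with the 1001-class ImageNet weights (background class at index 0).
    model = inception_v4(num_classes=1001, dropout_prob=0.2,
                         weights='imagenet', include_top=True)

    # Human-readable labels for the 1000 real classes (assumed path/format).
    classes = eval(open('/validation_utils/class_names.txt', 'r').read())

    preds = model.predict(x)                    # x: preprocessed image batch
    idx = np.argmax(preds)                      # index into the 1001 outputs
    print("Class is: " + classes[idx - 1])      # shift past the background class
    print("Certainty is: " + str(preds[0][idx]))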

Cheers!
