Models: research/slim pre-trained weights incompatible with present code

Created on 30 Jul 2018 · 4 comments · Source: tensorflow/models

System information

  • What is the top-level directory of the model you are using: research/slim
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OSX
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.4.1
  • Bazel version (if compiling from source): n/a
  • CUDA/cuDNN version: n/a
  • GPU model and memory: n/a
  • Exact command to reproduce: (see below)

Describe the problem

It appears that the pre-trained weights for some models in research/slim are not compatible with inference graphs from the present source code.

I've been trying to turn all research/slim models into frozen graphs (for visualization research in tensorflow/lucid). For each model, I downloaded the checkpoint from the table. Then I exported an inference graph, for example:

python models/research/slim/export_inference_graph.py \
  --alsologtostderr \
  --model_name=resnet_v1_50 \
  --output_file=model.pb

Then I attempted to freeze the checkpoint weights into the GraphDef using TensorFlow's freeze_graph.py script, for example:

python freeze_graph.py \
  --input_graph=model.pb \
  --input_checkpoint=model.ckpt \
  --input_binary=true \
  --output_graph=frozen.pb \
  --output_node_names=resnet_v1_50/predictions/Reshape_1
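
For context (not part of the original report), here is a minimal TF 1.x sketch of how the resulting frozen.pb would then be consumed; the input placeholder name `input` is an assumption about export_inference_graph.py's default, and the output node is the one passed to freeze_graph.py above:

    import numpy as np
    import tensorflow as tf

    # Load the frozen GraphDef produced by freeze_graph.py.
    with tf.gfile.GFile('frozen.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')

    # Assumed node names: 'input' is the placeholder created by
    # export_inference_graph.py; the output node matches --output_node_names.
    inp = graph.get_tensor_by_name('input:0')
    out = graph.get_tensor_by_name('resnet_v1_50/predictions/Reshape_1:0')

    with tf.Session(graph=graph) as sess:
        probs = sess.run(out, feed_dict={inp: np.zeros((1, 224, 224, 3), np.float32)})
        print(probs.shape)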

This freezing step is where I found issues. While I had no problems downloading checkpoints or exporting inference graphs, it seems that the provided checkpoints may not be compatible with the inference graph at HEAD in some cases, due to shape errors:

  • ✅ inception_v1
  • ✅ inception_v2
  • ✅ inception_v3
  • ✅ inception_v4
  • ✅ inception_resnet_v2
  • ❌ resnet_v1_50

    • Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]

  • ❌ resnet_v1_101

    • Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]

  • ❌ resnet_v1_152

    • Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]

  • ✅ resnet_v2_50
  • ✅ resnet_v2_101
  • ✅ resnet_v2_152
  • ❌ vgg_16

    • Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]

  • ❌ vgg_19

    • Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]

The failing models all appear to have had their parameters converted from caffe-trained models back in 2016:

Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google.

Since all the cases involve a 1000 vs 1001 shape error, I suspect that there was a change in the convention around ImageNet classification output since they were ported from caffe.
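
For what it's worth, the mismatch can be confirmed directly from a downloaded checkpoint; here is a minimal TF 1.x sketch, where the checkpoint path and the exact logits variable name for slim's resnet_v1_50 are assumptions:

    import tensorflow as tf

    # Inspect the variable shapes stored in the downloaded checkpoint.
    reader = tf.train.NewCheckpointReader('model.ckpt')
    shapes = reader.get_variable_to_shape_map()

    # Assumed name of the final 1x1 conv weights in slim's resnet_v1_50.
    print(shapes.get('resnet_v1_50/logits/weights'))
    # The converted checkpoint stores [1, 1, 2048, 1000], while the graph
    # exported at HEAD expects [1, 1, 2048, 1001] (extra background class).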

Possible fixes:

  • Provide new checkpoints (this seems like a non-trivial ask on your end, but would likely be ideal).
  • Point users at a particular commit from which exported inference graphs remain compatible with the published checkpoints.
  • Note the incompatibility in the table.
  • Remove the incompatible models from the table.

Other Thoughts:

I realize this is research code and completely understand you may not wish to fix it. Either way, I'm very grateful that you provided both the models and the checkpoints! And I'm super excited to play around with the models I was able to export.

A couple other thoughts on how you could make pre-trained models easier for people to play around with:

  • Consider providing pre-trained models as frozen graphs so that users can't run into these issues.
  • Explicitly state what the input node's value range is. (I had to infer from here that it was (-1,1).)
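
For illustration, a minimal sketch of the (-1, 1) scaling mentioned above (Inception-style preprocessing); whether it is the right preprocessing for a given checkpoint is an assumption the reader should verify:

    import numpy as np

    def scale_to_unit_range(image_uint8):
        """Map uint8 pixel values in [0, 255] to the (-1, 1) range."""
        image = image_uint8.astype(np.float32)
        return image / 127.5 - 1.0

    # Example: a dummy batch of one 224x224 RGB image.
    x = scale_to_unit_range(np.zeros((1, 224, 224, 3), dtype=np.uint8))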

All 4 comments

@colah: in the Troubleshooting section of the slim README.md, you can see that there is a --labels_offset=1 flag.

D'oh, sorry for missing that! With the --labels_offset=1 flag, everything seems to work:

  • ✅ inception_v1
  • ✅ inception_v2
  • ✅ inception_v3
  • ✅ inception_v4
  • ✅ inception_resnet_v2
  • ✅ resnet_v1_50
  • ✅ resnet_v1_101
  • ✅ resnet_v1_152
  • ✅ resnet_v2_50
  • ✅ resnet_v2_101
  • ✅ resnet_v2_152
  • ✅ vgg_16
  • ✅ vgg_19
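
For reference, the corrected export step for one of the previously failing models looks like this; --labels_offset=1 drops the background class so the exported logits shape matches the converted checkpoints, and the rest of the pipeline is unchanged:

    python models/research/slim/export_inference_graph.py \
      --alsologtostderr \
      --model_name=resnet_v1_50 \
      --labels_offset=1 \
      --output_file=model.pb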

I ran into an error while working with Google Inception v4 by running inception_v4.py in an Anaconda environment. I already know that I need to set labels_offset=1, but I do not know where labels_offset=1 should go. Please see the error details below.

0. My weights

inception-v4_weights_tf_dim_ordering_tf_kernels.h5
inception-v4_weights_tf_dim_ordering_tf_kernels_notop.h5

1. If I set num_classes=1001, I get the correct model structure with model.summary()

model = inception_v4(num_classes=1001, dropout_prob=0.2, weights='imagenet', include_top=True)
model.summary()

2. If I set num_classes=1001, I get an error when running prediction

 model = inception_v4(num_classes=1001, dropout_prob=0.2, weights='imagenet', include_top=True)
 preds = model.predict(x)
 print('Predicted:', decode_predictions(preds))
ValueError: If using `weights` as imagenet with `include_top` as true, `classes` should be 1000

3. If I set num_classes=1000, I get an error when running prediction

 model = inception_v4(num_classes=1000, dropout_prob=0.2, weights='imagenet', include_top=True)
 preds = model.predict(x)
 print('Predicted:', decode_predictions(preds))
ValueError: Shapes (1536, 1000) and (1536, 1001) are incompatible

How can I get both model.summary() and model.predict(x) to work correctly? Please help me resolve this error.

Notes:

I cannot use the following approach to fix the error because I am not using the Bazel environment.

./bazel-bin/slim/train \
  --train_dir=${TRAIN_DIR} \
  --dataset_dir=${DATASET_DIR} \
  --dataset_name=imagenet \
  --dataset_split_name=train \
  --model_name=resnet_v1_50 \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --labels_offset=1

https://github.com/jonsafari/tensorflow-models/blob/master/slim/README.md

I have solved the issue. It is related to both the labels-offset concept and the class_names.txt file included in validation_utils.

  1. Comment out (or delete) the line that calls decode_predictions(preds).

Keras's decode_predictions() function is a problem here because it only handles 1000 classes; with 1001 classes (or any other number) it throws an error.

    # print('Predicted:', decode_predictions(preds))

  2. Add class_names.txt, which is needed to map class indices to names.

Users need to download the class_names.txt file.

    # Open class labels dictionary (human-readable label given an ID).
    classes = eval(open('/validation_utils/class_names.txt', 'r').read())

  3. Subtract 1 from the predicted index (mapping the 1001 outputs back to the 1000 real classes) and then run the prediction:

    preds = model.predict(x)
    print("Class is: " + classes[np.argmax(preds)-1])
    print("Certainty is: " + str(preds[0][np.argmax(preds)]))

In other words, we rely on the labels_offset=1 concept without actually passing the flag; the index shift is done with np.argmax(preds) - 1 instead.
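
Putting the steps together, a minimal sketch; the inception_v4 constructor, the preprocessed input x, and the class_names.txt path come from the comments above and are assumptions about that particular Keras setup:

    import numpy as np

    # Assumed: inception_v4() is the Keras implementation used above, built
    # with the 1001-class ImageNet weights (background class at index 0).
    model = inception_v4(num_classes=1001, dropout_prob=0.2,
                         weights='imagenet', include_top=True)

    # Human-readable labels for the 1000 real classes (assumed path/format).
    classes = eval(open('/validation_utils/class_names.txt', 'r').read())

    preds = model.predict(x)                    # x: preprocessed image batch
    idx = np.argmax(preds)                      # index into the 1001 outputs
    print("Class is: " + classes[idx - 1])      # shift past the background class
    print("Certainty is: " + str(preds[0][idx]))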

Cheers!
