It appears that the pre-trained weights for some models in research/slim are not compatible with inference graphs from the present source code.
I've been trying to turn all research/slim models into frozen graphs (for visualization research in tensorflow/lucid). For each model, I downloaded the checkpoint from the table. Then I exported an inference graph, for example:
python models/research/slim/export_inference_graph.py \
  --alsologtostderr \
  --model_name=resnet_v1_50 \
  --output_file=model.pb
And attempted to freeze weights from the checkpoint into the graphdef using the tensorflow freeze_graph.py script, for example:
python freeze_graph.py \
  --input_graph=model.pb \
  --input_checkpoint=model.ckpt \
  --input_binary=true \
  --output_graph=frozen.pb \
  --output_node_names=resnet_v1_50/predictions/Reshape_1
This is the step where I found some issues. While I had no problems downloading checkpoints or exporting inference graphs, it seems like the provided checkpoint may not be compatible with the inference graph at HEAD in some cases due to shape errors:
Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]
Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]
Assign requires shapes of both tensors to match. lhs shape= [1,1,2048,1001] rhs shape= [1,1,2048,1000]
Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]
Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]
The failing models all appear to have had their parameters converted from Caffe-trained models back in 2016:
Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google.
Since all the cases involve a 1000 vs 1001 shape error, I suspect that there was a change in the convention around ImageNet classification output since they were ported from caffe.
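The mismatch can be illustrated with a minimal sketch. It assumes that the slim scripts size the final layer as the dataset's class count minus labels_offset, and that slim's ImageNet dataset counts 1001 classes (1000 labels plus a background class at index 0). The helper names are hypothetical and not part of slim.

```python
# Hypothetical helpers; the names are illustrative, not part of slim.

def classifier_width(dataset_num_classes, labels_offset=0):
    """Width of the final logits layer, assuming width = classes - offset."""
    return dataset_num_classes - labels_offset


def shapes_match(graph_shape, checkpoint_shape):
    """True when a graph variable could be restored from the checkpoint."""
    return list(graph_shape) == list(checkpoint_shape)


# Assumption: 1000 ImageNet labels plus one background class.
IMAGENET_CLASSES_WITH_BACKGROUND = 1001

# Shape of the final 1x1 conv weights in the Caffe-converted checkpoint,
# taken from the "rhs shape" in the error messages.
checkpoint_shape = [1, 1, 2048, 1000]

for offset in (0, 1):
    width = classifier_width(IMAGENET_CLASSES_WITH_BACKGROUND, offset)
    graph_shape = [1, 1, 2048, width]
    print(offset, graph_shape, shapes_match(graph_shape, checkpoint_shape))
```

Under these assumptions, only an offset of 1 yields a graph whose last layer has the same shape as the checkpoint.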
Possible fixes:
Other Thoughts:
I realize this is research code and completely understand you may not wish to fix it. Either way, I'm very grateful that you provided both the models and the checkpoints! And I'm super excited to play around with the models I was able to export.
A couple other thoughts on how you could make pre-trained models easier for people to play around with:
@colah, in the Troubleshooting section of the slim README.md, you can see there is a --labels_offset=1 flag.
D'oh, sorry for missing that! With the --labels_offset=1 flag, everything seems to work:
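For reference, here is a rough sketch of how the export command could be assembled with that flag. It assumes that export_inference_graph.py accepts --labels_offset as described in the README's Troubleshooting section, and that only the Caffe-converted families (VGG and ResNet V1) need it. The helper name and the prefix list are hypothetical.

```python
import shlex

# Assumption: model families whose checkpoints were converted from Caffe
# and therefore have 1000 outputs instead of 1001.
CAFFE_CONVERTED_PREFIXES = ("vgg_", "resnet_v1_")


def build_export_command(model_name, output_file="model.pb"):
    """Return the argv list for exporting an inference graph (hypothetical helper)."""
    cmd = [
        "python",
        "models/research/slim/export_inference_graph.py",
        "--alsologtostderr",
        f"--model_name={model_name}",
        f"--output_file={output_file}",
    ]
    # Add the offset only for the Caffe-converted model families.
    if model_name.startswith(CAFFE_CONVERTED_PREFIXES):
        cmd.append("--labels_offset=1")
    return cmd


print(shlex.join(build_export_command("resnet_v1_50")))
print(shlex.join(build_export_command("inception_v4")))
```

The returned list can be passed to subprocess.run as-is; the freeze_graph.py step is unchanged.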
I get an error when running Google's Inception v4 via $ inception_v4.py in an Anaconda environment. I already know that I need to set "labels_offset=1", but I do not know where it should be set. The error details are as follows.
0. My Weights
inception-v4_weights_tf_dim_ordering_tf_kernels.h5
inception-v4_weights_tf_dim_ordering_tf_kernels_notop.h5
1. With num_classes=1001, I get the correct model structure from model.summary()
model = inception_v4(num_classes=1001, dropout_prob=0.2, weights='imagenet', include_top=True)
model.summary()
2. With num_classes=1001, I get an error when running prediction
model = inception_v4(num_classes=1001, dropout_prob=0.2, weights='imagenet', include_top=True)
preds = model.predict(x)
print('Predicted:', decode_predictions(preds))
raise ValueError('If using `weights` as imagenet with `include_top`'
ValueError: If using `weights` as imagenet with `include_top` as true, `classes` should be 1000
3. With num_classes=1000, I get an error when running prediction
model = inception_v4(num_classes=1000, dropout_prob=0.2, weights='imagenet', include_top=True)
preds = model.predict(x)
print('Predicted:', decode_predictions(preds))
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (1536, 1000) and (1536, 1001) are incompatible
How can I get both model.summary() and model.predict(x) to work correctly? Please help me resolve this error.
Notes:
I cannot use the following approach to deal with the error because I am not using a Bazel environment.
./bazel-bin/slim/train \
  --train_dir=${TRAIN_DIR} \
  --dataset_dir=${DATASET_DIR} \
  --dataset_name=imagenet \
  --dataset_split_name=train \
  --model_name=resnet_v1_50 \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --labels_offset=1
https://github.com/jonsafari/tensorflow-models/blob/master/slim/README.md
I have solved the issue. It is related to both the concept of the labels offset and the class_names.txt file included in validation_utils.
The decode_predictions() function in Keras is the main problem: it only accepts predictions over 1000 classes. If the number of classes is 1001 or anything else, it throws an error.
# -print('Predicted:', decode_predictions(preds))
Users need to download the class_names.txt file.
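import ast
import numpy as np
# Load the class-label dictionary (maps class ID to a human-readable label).
with open('/validation_utils/class_names.txt', 'r') as f:
    classes = ast.literal_eval(f.read())
preds = model.predict(x)
top = int(np.argmax(preds))  # index into the 1001-way output for a single image
print("Class is: " + classes[top - 1])  # subtract 1 to apply the labels offset
print("Certainty is: " + str(preds[0][top]))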
In other words, the "labels_offset=1" idea is applied by hand: instead of passing a flag, we subtract 1 from the index returned by np.argmax() when looking up the class name.
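The same index shift can be written as a small helper. This is only a sketch, assuming the model emits one extra leading output (such as a background class at index 0) and that the label table is keyed from 0 without that extra class. decode_top1 and the toy label table are hypothetical.

```python
def decode_top1(scores, class_names, labels_offset=1):
    """Return (label, score) for the highest-scoring class.

    scores: flat sequence of per-class scores from the model.
    class_names: mapping from class ID (0-based, no extra class) to label.
    labels_offset: number of extra leading outputs to skip.
    """
    # Pure-Python argmax over the score indices.
    best = max(range(len(scores)), key=lambda i: scores[i])
    class_id = best - labels_offset
    if class_id < 0:
        # The winner is one of the extra leading outputs (e.g. background).
        return "background", scores[best]
    return class_names[class_id], scores[best]


# Toy example: 3 real classes plus one extra leading output.
toy_names = {0: "tench", 1: "goldfish", 2: "great white shark"}
toy_scores = [0.05, 0.10, 0.70, 0.15]
print(decode_top1(toy_scores, toy_names))
```

With a Keras output of shape (1, 1001), preds[0] would be passed as scores. Handling the extra class explicitly avoids the silent wrap-around that classes[-1] would give with a list-based label table.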
Cheers!