slim/nets/inception_resnet_v2.py
Hi,
I am trying to build the inception_resnet_v2 model and restore the weights from the released checkpoint file (http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz). I am using the following code to build, restore, and classify with the model:
import tensorflow as tf
slim = tf.contrib.slim
from PIL import Image
from inception_resnet_v2 import *
import numpy as np

checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']

# Raw [0, 255] pixels are fed straight in -- no preprocessing
# (this turns out to be the bug; see below)
input_tensor = tf.placeholder(tf.float32, shape=(None, 299, 299, 3), name='input_image')

# Load the model
sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(input_tensor, is_training=False)
saver = tf.train.Saver()
saver.restore(sess, checkpoint_file)

for image in sample_images:
    im = Image.open(image).resize((299, 299))
    im = np.array(im)
    im = im.reshape(-1, 299, 299, 3)
    predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.max(predict_values), np.max(logit_values))
    print(np.argmax(predict_values), np.argmax(logit_values))
It classifies both the dog and the panda incorrectly as belonging to the same class (class number 918), with very high confidence (~100%).
Is it just these two examples, or are all classes messed up? Could it be that the model is simply not good at this specific class?
Figured out the issue: the image needs to be preprocessed in a certain way for the model to work. Interested folks can refer to http://stackoverflow.com/questions/39582703/using-pre-trained-inception-resnet-v2-with-tensorflow/39597537#39597537 for more details. Since this resolves the problem, this issue can be closed.
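For reference, the preprocessing in question boils down to mapping raw [0, 255] pixel values into the [-1, 1] range the network was trained on. A minimal numpy sketch (the in-graph TensorFlow version appears in the working code below):

    im = np.asarray(im, dtype=np.float32)
    im = (im / 255.0 - 0.5) * 2.0  # maps [0, 255] to [-1, 1]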
Thanks
@deepakrox could you share the complete code? I have a similar issue with the VGG model; I applied preprocessing but the results are still not right. Thanks!
@flyfj Working code attached
import tensorflow as tf
slim = tf.contrib.slim
from PIL import Image
from inception_resnet_v2 import *
import numpy as np

checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']

# Placeholder takes raw [0, 255] images; the graph rescales them to [-1, 1].
input_tensor = tf.placeholder(tf.float32, shape=(None, 299, 299, 3), name='input_image')
scaled_input_tensor = tf.scalar_mul((1.0 / 255), input_tensor)
scaled_input_tensor = tf.sub(scaled_input_tensor, 0.5)  # tf.subtract in TF >= 1.0
scaled_input_tensor = tf.mul(scaled_input_tensor, 2.0)  # tf.multiply in TF >= 1.0

# Load the model
sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(scaled_input_tensor, is_training=False)
saver = tf.train.Saver()
saver.restore(sess, checkpoint_file)

for image in sample_images:
    im = Image.open(image).resize((299, 299))
    im = np.array(im)
    im = im.reshape(-1, 299, 299, 3)
    predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.max(predict_values), np.max(logit_values))
    print(np.argmax(predict_values), np.argmax(logit_values))
@deepakrox Hi, I used your code to classify the "cropped_panda.jpg" from tutorials/image/imagenet/classify_image.py and got (0.83959323, 8.7706404) (389, 389).
Additionally, I changed the input to:
decode_jpeg_data = tf.placeholder(tf.string)
decode_jpeg = tf.image.decode_jpeg(decode_jpeg_data, channels=3)
if decode_jpeg.dtype != tf.float32:
    # convert_image_dtype also rescales uint8 [0, 255] to float [0, 1]
    decode_jpeg = tf.image.convert_image_dtype(decode_jpeg, dtype=tf.float32)
image = tf.expand_dims(decode_jpeg, 0)
image = tf.image.resize_bilinear(image, [299, 299], align_corners=False)
image = tf.sub(image, 0.5)  # remaining shift/scale from [0, 1] to [-1, 1]
image = tf.mul(image, 2.0)
The outputs are slightly different:
(0.78367269, 8.4608421) (389, 389)
I think the official inception_resnet_v2 model uses slim/preprocess* and tf.image.decode_jpeg, so it may be better to use tf.image.decode_jpeg rather than PIL.Image to stay consistent with the training process.
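For what it's worth, a sketch of that approach, assuming the inception_preprocessing module from slim's preprocessing directory is importable (the import path here reflects the TF models repo layout at the time and may differ in your checkout):

    import tensorflow as tf
    from preprocessing import inception_preprocessing  # from the slim directory of the models repo

    jpeg_data = tf.placeholder(tf.string)
    decoded = tf.image.decode_jpeg(jpeg_data, channels=3)
    # is_training=False takes the eval path: central crop, bilinear resize, scale to [-1, 1]
    processed = inception_preprocessing.preprocess_image(decoded, 299, 299, is_training=False)
    processed = tf.expand_dims(processed, 0)  # add the batch dimension

    # Because preprocess_image already outputs values in [-1, 1], this tensor should be
    # passed to inception_resnet_v2 directly, without the manual /255, -0.5, *2 rescaling.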
@flyfj did you figure out the issue with VGG, and did it work for you? Would it be possible to share the code, so that I can make the same changes in my feature-extraction code? Thank you for the help.
@deepakrox @D-X-Y
Can we directly use the following for preprocessing?

for image in sample_images:
    decoded_jpeg = tf.image.decode_jpeg(image)
    im = preprocess_for_train(decoded_jpeg, 299, 299)
    predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
    ..... #code as given by @deepakrox
@D-X-Y
Hi, I tried the same code with the panda image and also got 389, but that is not the correct label, is it? I think 169 should be the correct result...
I'm struggling to get a correct prediction with the slim pretrained models. I've tried different models and different preprocessing, but I always get 389, which I think is the wrong result. Does anyone have comments on this? Thanks.
EDIT:
It looks like there are two ground-truth labelings for ImageNet? I think "panda" = 389 is correct for the slim models. But for some other models I got 169 as the correct prediction, and "ILSVRC2012_validation_ground_truth.txt" also gives "panda" the label 169.
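One way to check which labeling a given slim checkpoint uses is to map the predicted index to a human-readable name with the create_readable_names_for_imagenet_labels() helper from slim's datasets/imagenet.py; a sketch, assuming the slim directory is on the Python path:

    from datasets import imagenet  # slim's dataset utilities

    # Slim models output 1001 classes: index 0 is 'background', so the usual
    # 1000-class ImageNet indices are shifted by one relative to other label files.
    names = imagenet.create_readable_names_for_imagenet_labels()
    print(names[389])  # should print the giant panda synset if 389 is the panda class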
I had a similar problem: the model gave the same result with high confidence for different images. But in my case, I had forgotten to add the biases to the "up" tensor in block35, block17 and block8:

up = tf.nn.bias_add(up, bias)

The problem disappeared after correcting this.
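For context, a sketch of where that bias belongs in a manual re-implementation of the residual blocks (the names W, bias, mixed, and scale are illustrative, not from the official inception_resnet_v2.py, which handles biases inside slim.conv2d):

    # 1x1 projection of the concatenated branch outputs back to the input depth
    up = tf.nn.conv2d(mixed, W, strides=[1, 1, 1, 1], padding='SAME')
    up = tf.nn.bias_add(up, bias)  # the step that was missing
    # scaled residual connection, as in block35 / block17 / block8
    net += scale * up
    net = tf.nn.relu(net)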