Models: ResNet pre-processing: VGG or Inception?

Created on 15 Aug 2017 · 10 comments · Source: tensorflow/models

I am working on a small project that extracts image features using pre-trained models, using the models/slim code as a guideline. My code works fine for the Inception and VGG models, but for ResNet (versions 1 and 2) I constantly get incorrect prediction results. As far as I can tell, this is caused by the pre-processing function I select via get_preprocessing(), defined in: https://github.com/tensorflow/models/blob/master/slim/preprocessing/preprocessing_factory.py

When I use get_preprocessing() for ResNet models, it returns the VGG pre-processing function, as defined in the mapping in the file above. This results in ImageNet classification results that are always incorrect. However, when I force Inception pre-processing for ResNet, the classification results appear to be correct, at least in line with the accuracy reported in the paper. This happens for both ResNet v1 and v2 models.
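For concreteness, this is roughly what my selection code looks like (a minimal sketch, assuming models/slim is on the Python path; the image path is a placeholder):

    import tensorflow as tf
    from preprocessing import preprocessing_factory

    image = tf.image.decode_jpeg(tf.read_file('/path/to/image.jpg'), channels=3)

    # What the factory returns for ResNet V2 is vgg_preprocessing, which
    # gives consistently wrong predictions on the released checkpoints:
    fn = preprocessing_factory.get_preprocessing('resnet_v2_50', is_training=False)

    # Forcing Inception pre-processing instead gives results in line with the paper:
    fn = preprocessing_factory.get_preprocessing('inception', is_training=False)
    processed = fn(image, 299, 299)  # Inception models expect 299x299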

The main page for models/slim (https://github.com/tensorflow/models/tree/master/slim) reports the following:

ResNet V2 models use Inception pre-processing and input image size of 299

However, this small sentence is easily overlooked and does not correspond to the values in the code. Also confusing is that resnet_v2.py reports an incorrect default image size:

resnet_v2.default_image_size = 224
...
resnet_v2_152.default_image_size = resnet_v2.default_image_size
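In other words, anyone who trusts the attribute gets 224 (a small sketch of the pitfall; the override to 299 is what the documentation prescribes):

    from nets import resnet_v2

    size = resnet_v2.resnet_v2_152.default_image_size  # -> 224, per the code
    size = 299  # what the docs say the released V2 checkpoints actually expect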

Could someone elaborate on the confusion caused by this mismatch between documentation and code?

awaiting response

Most helpful comment

You should use inception_preprocessing in this case, either just import it directly or just pass 'inception' to preprocessing_factory.

All 10 comments

According to https://github.com/tensorflow/models/blob/master/slim/preprocessing/preprocessing_factory.py

the factory still maps the ResNet models to VGG pre-processing:

  preprocessing_fn_map = {
      'cifarnet': cifarnet_preprocessing,
      'inception': inception_preprocessing,
      'inception_v1': inception_preprocessing,
      'inception_v2': inception_preprocessing,
      'inception_v3': inception_preprocessing,
      'inception_v4': inception_preprocessing,
      'inception_resnet_v2': inception_preprocessing,
      'lenet': lenet_preprocessing,
      'mobilenet_v1': inception_preprocessing,
      'resnet_v1_50': vgg_preprocessing,
      'resnet_v1_101': vgg_preprocessing,
      'resnet_v1_152': vgg_preprocessing,
      'resnet_v1_200': vgg_preprocessing,
      'resnet_v2_50': vgg_preprocessing,
      'resnet_v2_101': vgg_preprocessing,
      'resnet_v2_152': vgg_preprocessing,
      'resnet_v2_200': vgg_preprocessing,
      'vgg': vgg_preprocessing,
      'vgg_a': vgg_preprocessing,
      'vgg_16': vgg_preprocessing,
      'vgg_19': vgg_preprocessing,
  }

Yes, and that mapping is incorrect as far as I can tell. The documentation states the following:

ResNet V2 models use Inception pre-processing and input image size of 299 (use --preprocessing_name inception --eval_image_size 299 when using eval_image_classifier.py).

And the results you get are completely wrong when using VGG preprocessing for ResNets.

@nathansilberman @sguada can you comment or redirect? Thanks.

You should use inception_preprocessing in this case, either just import it directly or just pass 'inception' to preprocessing_factory.
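Concretely, either of these works (a sketch; the image path is a placeholder and models/slim is assumed to be on the Python path):

    import tensorflow as tf

    image = tf.image.decode_jpeg(tf.read_file('/path/to/image.jpg'), channels=3)

    # Option 1: import the pre-processing module directly.
    from preprocessing import inception_preprocessing
    processed = inception_preprocessing.preprocess_image(
        image, 299, 299, is_training=False)

    # Option 2: ask the factory for 'inception' by name.
    from preprocessing import preprocessing_factory
    fn = preprocessing_factory.get_preprocessing('inception', is_training=False)
    processed = fn(image, 299, 299)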

Is it also the case for training? Should inception_preprocessing be used for training ResNet?

I got the same problem when using the resnet_v2_152 model. Using inception_preprocessing seems to be correct in this case.

I also ran into this pre-processing problem with ResNet. With the ResNet v1 model I use vgg_preprocessing but get only around 71.5% top-1 accuracy. With the ResNet v2 model, when I use inception_preprocessing and resize the image to (299, 299), the predictions fail completely. Can someone help me with this problem?

    import sys
    import time

    import numpy as np
    import tensorflow as tf
    from tensorflow.contrib import slim

    sys.path.append("/path/to/slim/models/research/slim/")

    from nets.resnet_v1 import *
    from nets.resnet_v2 import *
    from preprocessing import vgg_preprocessing
    from preprocessing import inception_preprocessing

    n_images = 50000
    batch_size = 100
    n_top = 1
    model = "resnet_v2_50"

    checkpoint_file = "/path/to/%s.ckpt" % model  # path to the released checkpoint
    images_path = "/path/to/ILSVRC2012/val/"      # ImageNet validation images
    # PIE_TRUTH: 1-D array of ground-truth class ids for the validation set,
    # loaded elsewhere.

    with tf.Graph().as_default():
        with slim.arg_scope(resnet_arg_scope()):
            input_string = tf.placeholder(tf.string)
            input_images = tf.read_file(input_string)
            input_images = tf.image.decode_jpeg(input_images, channels=3)
            input_images = tf.cast(input_images, tf.float32)

            if model == "resnet_v1_50":
                processed_images = vgg_preprocessing.preprocess_image(
                    input_images, 224, 224, is_training=False)
                processed_images = tf.expand_dims(processed_images, 0)
                logits, _ = resnet_v1_50(processed_images,
                                         num_classes=1000,
                                         is_training=False)
            elif model == "resnet_v2_50":
                processed_images = inception_preprocessing.preprocess_image(
                    input_images, 299, 299, is_training=False)
                processed_images = tf.expand_dims(processed_images, 0)
                logits, _ = resnet_v2_50(processed_images,
                                         num_classes=1001,
                                         is_training=False)

            probabilities = tf.nn.softmax(logits)

            init_fn = slim.assign_from_checkpoint_fn(
                checkpoint_file, slim.get_model_variables(model))

            config = tf.ConfigProto()
            config.gpu_options.allow_growth = True
            with tf.Session(config=config) as sess:
                init_fn(sess)

                success_search = 0
                start_time = time.time()
                for b in range(int(np.ceil(n_images / np.float(batch_size)))):
                    start = b * batch_size + 1
                    stop = np.minimum(n_images + 1, start + batch_size)
                    for i in range(start, stop, 1):
                        img_path = images_path + "ILSVRC2012_val_%08d.JPEG" % (i)
                        pred = sess.run(probabilities, feed_dict={input_string: img_path})
                        x_gen = sess.run(processed_images, feed_dict={input_string: img_path})
                        if i == start:
                            preds = pred
                        else:
                            preds = np.concatenate([preds, pred], axis=0)
                    # Sanity check: value range of the pre-processed input.
                    print(np.max(x_gen), np.min(x_gen))
                    labels = np.argsort(preds, axis=1)[:, ::-1]
                    labels_ntop = labels[:, :n_top]
                    for idx in range(start, stop, 1):
                        if PIE_TRUTH[idx - 1] in labels_ntop[idx - start, :]:
                            success_search += 1
                    print("Processed %d of %d images, %d correct (%.2f%%), elapsed %.1fs" %
                          (stop - 1, n_images, success_search,
                           100 * success_search / np.float(stop - 1),
                           time.time() - start_time))

@tomrunia @sguada Thank you for the info. Just to clarify: for ResNet V1 models, should the pre-processing also be Inception pre-processing (input image size 299)? The original ResNet paper used an input size of 224. Can you please confirm the right method for V1 ResNet models? Thanks!

Correct, inception preprocessing would be the better choice. Feel free to reopen if any issue comes up.

@tuananhbui89, your pre-processing looks wrong. Change
input_images = tf.cast(input_images, tf.float32)
to
input_images = tf.image.convert_image_dtype(input_images, tf.float32)
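The distinction matters because inception_preprocessing assumes a float input is already scaled to [0, 1]; tf.cast leaves the decoded values in [0, 255], so the final (x - 0.5) * 2.0 rescaling inside preprocess_image produces values far outside the expected [-1, 1] range. A sketch of the corrected input pipeline:

    import tensorflow as tf
    from preprocessing import inception_preprocessing

    input_string = tf.placeholder(tf.string)
    raw = tf.read_file(input_string)
    image = tf.image.decode_jpeg(raw, channels=3)            # uint8 in [0, 255]
    image = tf.image.convert_image_dtype(image, tf.float32)  # float32 in [0, 1]
    processed = inception_preprocessing.preprocess_image(
        image, 299, 299, is_training=False)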
