Models: ResNet pre-processing: VGG or Inception?

Created on 15 Aug 2017 · 10 comments · Source: tensorflow/models

I am working on a small project that extracts image features using pre-trained models, using the models/slim code as a guideline. My code works fine for the Inception and VGG models, but for ResNet (versions 1 and 2) I constantly get incorrect prediction results. As far as I can tell, this is caused by the pre-processing function I select via get_preprocessing(), defined in: https://github.com/tensorflow/models/blob/master/slim/preprocessing/preprocessing_factory.py

When I use get_preprocessing() for ResNet models, it returns the VGG pre-processing function, as defined in the mapping in the file above. This results in ImageNet classification results that are always incorrect. However, when I force Inception pre-processing for ResNet, the classification results appear to be correct, at least in line with the accuracy reported in the paper. This happens for both ResNet v1 and v2 models.
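For concreteness, this is roughly what my selection code looks like (a minimal sketch, assuming models/slim is on the Python path; the image path is a placeholder):

    import tensorflow as tf
    from preprocessing import preprocessing_factory

    image = tf.image.decode_jpeg(tf.read_file('/path/to/image.jpg'), channels=3)

    # What the factory returns for ResNet V2 is vgg_preprocessing, which
    # gives consistently wrong predictions on the released checkpoints:
    fn = preprocessing_factory.get_preprocessing('resnet_v2_50', is_training=False)

    # Forcing Inception pre-processing instead gives results in line with the paper:
    fn = preprocessing_factory.get_preprocessing('inception', is_training=False)
    processed = fn(image, 299, 299)  # Inception models expect 299x299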

The main page for models/slim (https://github.com/tensorflow/models/tree/master/slim) reports the following:

ResNet V2 models use Inception pre-processing and input image size of 299

However, this small sentence is easily overlooked and does not correspond to the values in the code. Also confusing is that resnet_v2.py reports an incorrect default image size:

resnet_v2.default_image_size = 224
...
resnet_v2_152.default_image_size = resnet_v2.default_image_size
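In other words, anyone who trusts the attribute gets 224 (a small sketch of the pitfall; the override to 299 is what the documentation prescribes):

    from nets import resnet_v2

    size = resnet_v2.resnet_v2_152.default_image_size  # -> 224, per the code
    size = 299  # what the docs say the released V2 checkpoints actually expect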

Could someone elaborate on the confusion caused by this mismatch between documentation and code?

awaiting response

Most helpful comment

You should use inception_preprocessing in this case, either just import it directly or just pass 'inception' to preprocessing_factory.

All 10 comments

According to https://github.com/tensorflow/models/blob/master/slim/preprocessing/preprocessing_factory.py

the factory still maps the ResNet models to VGG pre-processing:

  preprocessing_fn_map = {
      'cifarnet': cifarnet_preprocessing,
      'inception': inception_preprocessing,
      'inception_v1': inception_preprocessing,
      'inception_v2': inception_preprocessing,
      'inception_v3': inception_preprocessing,
      'inception_v4': inception_preprocessing,
      'inception_resnet_v2': inception_preprocessing,
      'lenet': lenet_preprocessing,
      'mobilenet_v1': inception_preprocessing,
      'resnet_v1_50': vgg_preprocessing,
      'resnet_v1_101': vgg_preprocessing,
      'resnet_v1_152': vgg_preprocessing,
      'resnet_v1_200': vgg_preprocessing,
      'resnet_v2_50': vgg_preprocessing,
      'resnet_v2_101': vgg_preprocessing,
      'resnet_v2_152': vgg_preprocessing,
      'resnet_v2_200': vgg_preprocessing,
      'vgg': vgg_preprocessing,
      'vgg_a': vgg_preprocessing,
      'vgg_16': vgg_preprocessing,
      'vgg_19': vgg_preprocessing,
  }

Yes, and that mapping is incorrect as far as I can tell. The documentation states the following:

ResNet V2 models use Inception pre-processing and input image size of 299 (use --preprocessing_name inception --eval_image_size 299 when using eval_image_classifier.py).

And the results you get are completely wrong when using VGG preprocessing for ResNets.

@nathansilberman @sguada can you comment or redirect? Thanks.

You should use inception_preprocessing in this case, either just import it directly or just pass 'inception' to preprocessing_factory.
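Concretely, either of these works (a sketch; the image path is a placeholder and models/slim is assumed to be on the Python path):

    import tensorflow as tf

    image = tf.image.decode_jpeg(tf.read_file('/path/to/image.jpg'), channels=3)

    # Option 1: import the pre-processing module directly.
    from preprocessing import inception_preprocessing
    processed = inception_preprocessing.preprocess_image(
        image, 299, 299, is_training=False)

    # Option 2: ask the factory for 'inception' by name.
    from preprocessing import preprocessing_factory
    fn = preprocessing_factory.get_preprocessing('inception', is_training=False)
    processed = fn(image, 299, 299)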

Is it also the case for training? Should inception_preprocessing be used for training ResNet?

I got the same problem when using the resnet_v2_152 model. Using inception_preprocessing seems to be correct in this case.

I also ran into this pre-processing problem with ResNet. With the ResNet v1 model I use vgg_preprocessing but get only around 71.5% top-1 accuracy. With the ResNet v2 model, when I use inception_preprocessing and resize the image to (299, 299), the predictions fail completely. Can someone help me with this problem?

    import sys
    import time

    import numpy as np
    import tensorflow as tf
    from tensorflow.contrib import slim

    sys.path.append("/path/to/slim/models/research/slim/")

    from nets.resnet_v1 import *
    from nets.resnet_v2 import *
    from preprocessing import vgg_preprocessing
    from preprocessing import inception_preprocessing

    n_images = 50000
    batch_size = 100
    n_top = 1
    model = "resnet_v2_50"

    checkpoint_file = "/path/to/%s.ckpt" % model  # path to the released checkpoint
    images_path = "/path/to/ILSVRC2012/val/"      # ImageNet validation images
    # PIE_TRUTH: 1-D array of ground-truth class ids for the validation set,
    # loaded elsewhere.

    with tf.Graph().as_default():
        with slim.arg_scope(resnet_arg_scope()):
            input_string = tf.placeholder(tf.string)
            input_images = tf.read_file(input_string)
            input_images = tf.image.decode_jpeg(input_images, channels=3)
            input_images = tf.cast(input_images, tf.float32)

            if model == "resnet_v1_50":
                processed_images = vgg_preprocessing.preprocess_image(
                    input_images, 224, 224, is_training=False)
                processed_images = tf.expand_dims(processed_images, 0)
                logits, _ = resnet_v1_50(processed_images,
                                         num_classes=1000,
                                         is_training=False)
            elif model == "resnet_v2_50":
                processed_images = inception_preprocessing.preprocess_image(
                    input_images, 299, 299, is_training=False)
                processed_images = tf.expand_dims(processed_images, 0)
                logits, _ = resnet_v2_50(processed_images,
                                         num_classes=1001,
                                         is_training=False)

            probabilities = tf.nn.softmax(logits)

            init_fn = slim.assign_from_checkpoint_fn(
                checkpoint_file, slim.get_model_variables(model))

            config = tf.ConfigProto()
            config.gpu_options.allow_growth = True
            with tf.Session(config=config) as sess:
                init_fn(sess)

                success_search = 0
                start_time = time.time()
                for b in range(int(np.ceil(n_images / np.float(batch_size)))):
                    start = b * batch_size + 1
                    stop = np.minimum(n_images + 1, start + batch_size)
                    for i in range(start, stop, 1):
                        img_path = images_path + "ILSVRC2012_val_%08d.JPEG" % (i)
                        pred = sess.run(probabilities, feed_dict={input_string: img_path})
                        x_gen = sess.run(processed_images, feed_dict={input_string: img_path})
                        if i == start:
                            preds = pred
                        else:
                            preds = np.concatenate([preds, pred], axis=0)
                    # Sanity check: value range of the pre-processed input.
                    print(np.max(x_gen), np.min(x_gen))
                    labels = np.argsort(preds, axis=1)[:, ::-1]
                    labels_ntop = labels[:, :n_top]
                    for idx in range(start, stop, 1):
                        if PIE_TRUTH[idx - 1] in labels_ntop[idx - start, :]:
                            success_search += 1
                    print("Processed %d of %d images, %d correct (%.2f%%), elapsed %.1fs" %
                          (stop - 1, n_images, success_search,
                           100 * success_search / np.float(stop - 1),
                           time.time() - start_time))

@tomrunia @sguada Thank you for the info. Just to clarify: for ResNet V1 models, should the pre-processing also be Inception pre-processing (input image size 299)? The original ResNet paper used an input size of 224. Can you please confirm the right method for V1 ResNet models? Thanks!

Correct, inception preprocessing would be the better choice. Feel free to reopen if any issue comes up.

@tuananhbui89, your pre-processing looks wrong. Change
input_images = tf.cast(input_images, tf.float32)
to
input_images = tf.image.convert_image_dtype(input_images, tf.float32)
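The distinction matters because inception_preprocessing assumes a float input is already scaled to [0, 1]; tf.cast leaves the decoded values in [0, 255], so the final (x - 0.5) * 2.0 rescaling inside preprocess_image produces values far outside the expected [-1, 1] range. A sketch of the corrected input pipeline:

    import tensorflow as tf
    from preprocessing import inception_preprocessing

    input_string = tf.placeholder(tf.string)
    raw = tf.read_file(input_string)
    image = tf.image.decode_jpeg(raw, channels=3)            # uint8 in [0, 255]
    image = tf.image.convert_image_dtype(image, tf.float32)  # float32 in [0, 1]
    processed = inception_preprocessing.preprocess_image(
        image, 299, 299, is_training=False)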
