Models: Slim resnet_v1, is_training

Created on 10 Mar 2018 · 8 Comments · Source: tensorflow/models

I see this has been mentioned previously in several issues, but they get closed because of inactivity: https://github.com/tensorflow/models/issues/1288, https://github.com/tensorflow/models/pull/2138#issuecomment-323225293 and https://github.com/tensorflow/models/issues/391

from tensorflow.contrib.slim.python.slim.nets import resnet_v1

If I train a model with is_training=True and then evaluate with is_training=False, I get worse performance than if I evaluate with is_training=True. My notebook.

This goes against logic and the practices in this training-script and this eval-script

I have noticed the same thing for other models based on slim, such as this densenet-implementation: I get better validation results by running inference with is_training=True.

Could someone please suggest the proper practice: what should I pass to a pre-trained model like resnet or densenet during training, and what should I pass for inference?

For example, training and inference both with is_training=True, as in this slim-walkthrough? But then what happens if the network has dropout added?
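For concreteness, here is roughly the pattern I mean, using the models/research/slim nets package like the walkthrough does (a simplified sketch; the input shape and num_classes are just illustrative):

import tensorflow as tf
from tensorflow.contrib import slim
from nets import resnet_v1  # models/research/slim

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])

# Training graph: batch norm normalizes with the current batch statistics.
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
    train_logits, _ = resnet_v1.resnet_v1_50(inputs, num_classes=1000,
                                             is_training=True)

# Evaluation graph sharing the same variables: batch norm is supposed
# to use the moving averages accumulated during training.
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
    eval_logits, _ = resnet_v1.resnet_v1_50(inputs, num_classes=1000,
                                            is_training=False, reuse=True)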

For reference my setup:

OS:  linux
Python:  3.5.2 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Numpy:  1.14.1
Tensorflow:  1.4.0
GPU:  ['Tesla P100-PCIE-16GB', 'Tesla P100-PCIE-16GB']
CUDA Version 8.0.61
CuDNN Version  6.0.21

Most helpful comment

@ilkarman Hi, I have the same problem: when I fine-tune resnet_v1-101, I get better performance with is_training set to True at test time. This confused me a lot. I found some possible solutions, for example:
https://stackoverflow.com/questions/42770757/tensorflow-batch-norm-does-not-work-properly-when-testing-is-training-false
https://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
I wrote a simple model using slim with batch norm and trained it on the MNIST dataset, and there I got the expected results, with is_training True during training and False during testing. But when I fine-tune resnet-101, the is_training problem confuses me again. If you fix it, please tell us. Thanks a lot.

All 8 comments

How can you evaluate with is_training=True? The test set is used to evaluate the performance of the trained model and should absolutely be run with is_training=False. If you evaluate with is_training=True, batch norm uses the statistics of the test batches themselves, so the performance looks better.

@wenmin-wu Thanks for the reply! When I train my model with Chainer I get an AUC of 0.79; in TensorFlow I can replicate this AUC only by running inference with is_training=True. Otherwise I get something like 0.6.

Since I only feed in the x-data, the only things is_training=True can affect are the dropout and the batch norm. It seems batch norm is implemented like so (https://github.com/tensorflow/models/issues/391#issuecomment-247392028):

When is_training=True, batch_norm uses the statistics of the current batch; when it is False, it uses the moving averages of the statistics computed during training.
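In other words (a minimal sketch of my understanding, not taken from any of the linked scripts):

import tensorflow as tf
from tensorflow.contrib import slim

x = tf.placeholder(tf.float32, [None, 10])
net = slim.fully_connected(x, 16,
                           normalizer_fn=slim.batch_norm,
                           normalizer_params={'is_training': True})

# With is_training=True the layer normalizes with the batch statistics
# and only *queues* the moving-average updates in this collection; if
# the train op never runs them, the moving mean/variance keep their
# initial values (0 and 1), and is_training=False later normalizes
# against those stale statistics.
print(tf.get_collection(tf.GraphKeys.UPDATE_OPS))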

Also the official walkthrough does this in the last cell:

import numpy as np
import matplotlib.pyplot as plt  # needed for the plotting at the end of the cell
import tensorflow as tf
from datasets import flowers
from nets import inception

from tensorflow.contrib import slim

# Note: flowers_data_dir, train_dir and load_batch are defined in
# earlier cells of the walkthrough notebook.

image_size = inception.inception_v1.default_image_size
batch_size = 3

with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)

    dataset = flowers.get_split('train', flowers_data_dir)
    images, images_raw, labels = load_batch(dataset, height=image_size, width=image_size)

    # Create the model, use the default arg scope to configure the batch norm parameters.
    # Note that the walkthrough passes is_training=True even though this cell runs inference.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)

    probabilities = tf.nn.softmax(logits)

    checkpoint_path = tf.train.latest_checkpoint(train_dir)
    init_fn = slim.assign_from_checkpoint_fn(
      checkpoint_path,
      slim.get_variables_to_restore())

    with tf.Session() as sess:
        with slim.queues.QueueRunners(sess):
            sess.run(tf.local_variables_initializer())  # modern name for initialize_local_variables()
            init_fn(sess)
            np_probabilities, np_images_raw, np_labels = sess.run([probabilities, images_raw, labels])

            for i in range(batch_size): 
                image = np_images_raw[i, :, :, :]
                true_label = np_labels[i]
                predicted_label = np.argmax(np_probabilities[i, :])
                predicted_name = dataset.labels_to_names[predicted_label]
                true_name = dataset.labels_to_names[true_label]

                plt.figure()
                plt.imshow(image.astype(np.uint8))
                plt.title('Ground Truth: [%s], Prediction [%s]' % (true_name, predicted_name))
                plt.axis('off')
                plt.show()

@ilkarman Hi, I have the same problem: when I fine-tune resnet_v1-101, I get better performance with is_training set to True at test time. This confused me a lot. I found some possible solutions, for example:
https://stackoverflow.com/questions/42770757/tensorflow-batch-norm-does-not-work-properly-when-testing-is-training-false
https://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
I wrote a simple model using slim with batch norm and trained it on the MNIST dataset, and there I got the expected results, with is_training True during training and False during testing. But when I fine-tune resnet-101, the is_training problem confuses me again. If you fix it, please tell us. Thanks a lot.

I have solved this is_training problem; the methods mentioned in the two links work for me. You can try them.

@LDOUBLEV Thanks for the links! So essentially we can either use slim.learning.create_train_op or carry on using is_training=True for testing after fine-tuning?

I don't understand how this can be an issue that has persisted for years ...
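For anyone who hits this later: as far as I can tell from those Stack Overflow links, the fix is to make sure the batch-norm update ops actually run at every training step. A sketch with a toy model (the layer sizes and loss are placeholders, not from my notebook):

import tensorflow as tf
from tensorflow.contrib import slim

x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
net = slim.fully_connected(x, 16,
                           normalizer_fn=slim.batch_norm,
                           normalizer_params={'is_training': True})
pred = slim.fully_connected(net, 1, activation_fn=None)
loss = tf.losses.mean_squared_error(y, pred)
optimizer = tf.train.GradientDescentOptimizer(0.01)

# Option 1: add a manual control dependency so every training step
# also runs the moving-average update ops queued by slim.batch_norm.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = optimizer.minimize(loss)

# Option 2: slim.learning.create_train_op adds the UPDATE_OPS
# dependency for you.
train_op = slim.learning.create_train_op(loss, optimizer)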

The SLIM version of ResNet is the responsibility of the publishing researcher to maintain, and we encourage you to look at the Official ResNet in order to take advantage of the latest improvements. If you have further questions about proper protocol for running a research model, please consider asking a question on Stack Overflow, which is better suited for community support.

I have the same problem. Normally I use Caffe or PyTorch to train a resnet (e.g. resnet-50) to classify objects, and my validation accuracy is about 0.1 lower than my training accuracy. But when I use TensorFlow and slim to train a resnet, the gap is about 0.7 (0.9 for training, 0.2 for validation). The interesting thing is that when I validate on the training dataset itself, the validation accuracy is not 0.9 but 0.6 or lower (USING THE SAME DATASET). When I set is_training to True during validation, everything goes back to normal: the gap between training and validation accuracy is close and reasonable, and validating on the training dataset gives the same accuracy as training. There is another phenomenon: with Caffe or PyTorch I get a training accuracy of about 0.8, but with TensorFlow and slim it seems easy to get 0.99+ accuracy. Is this normal?

> @LDOUBLEV Thanks for the links! So essentially we can either use slim.learning.create_train_op or carry on using is_training=True for testing after fine-tuning?

We can't use is_training=True for testing. In my case, the batch size of the test data makes a huge difference: when a single sample is tested, the result is complete nonsense.
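That batch-size sensitivity is exactly what you would expect if the batch statistics are used at test time. A toy illustration in plain NumPy (hypothetical values, not slim's actual implementation): with a batch of one, the batch mean is the sample itself, so every activation is normalized to roughly zero.

import numpy as np

def batch_norm_train_mode(x, eps=1e-3):
    # Normalize with the statistics of the current batch, which is
    # what is_training=True does for a fully connected layer.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.random.randn(32, 4)
single = batch[:1]

print(batch_norm_train_mode(batch)[0])  # sensible normalized values
print(batch_norm_train_mode(single))    # all zeros: (x - x) / sqrt(0 + eps)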
