I am trying to train an inception_resnet_v2 model on another dataset. This is how I test the accuracy of the model:
with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
    logits, _ = inception_resnet_v2.inception_resnet_v2(
        images,
        num_classes=dataset.num_classes,
        is_training=True)
Setting is_training to True gives better results on the validation set, but from the code I should have set it to False.
@sguada I think this problem may be related to batch_norm behaving differently during training and testing; there should be a better way to handle this.
I find it strange that in _get_variables_to_train, when I print out all the variable names, the only batch_norm variables included are the beta offsets, e.g. 'InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/beta:0'.
It's really confusing when I try fine-tuning, can someone explain this?
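A minimal sketch (plain NumPy, not the actual slim code; names illustrative) of why only beta shows up in the trainable list. As far as I know, slim's batch_norm defaults to center=True, scale=False, so no gamma variable is created, and moving_mean / moving_variance are updated by direct assignment (an exponential moving average), never by gradients, so they are not trainable:

```python
import numpy as np

# The variables a batch_norm layer owns, and whether the optimizer touches them.
bn_variables = {
    "beta": {"value": np.zeros(3), "trainable": True},             # learned offset
    "moving_mean": {"value": np.zeros(3), "trainable": False},     # EMA statistic
    "moving_variance": {"value": np.ones(3), "trainable": False},  # EMA statistic
}

# Something like _get_variables_to_train only collects the trainable ones,
# which is why only the beta names appear in the printout.
variables_to_train = [name for name, v in bn_variables.items() if v["trainable"]]
```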
When is_training=True, batch_norm uses the statistics of the current batch; when it is False, it uses the moving averages of the statistics computed during training.
When fine-tuning, it is sometimes better not to update the moving averages of the statistics, since you are not going to train for long. So try training with is_training=False and testing with is_training=False as well.
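A toy numerical sketch (NumPy, not TensorFlow; names illustrative) of the two modes described above, showing why they can give very different accuracy when the moving averages do not match the data:

```python
import numpy as np

def batch_norm(x, moving_mean, moving_var, is_training, eps=1e-3):
    if is_training:
        mean, var = x.mean(axis=0), x.var(axis=0)  # statistics of this batch
    else:
        mean, var = moving_mean, moving_var        # accumulated statistics
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[10.0], [12.0]])                     # a batch far from the stats
moving_mean, moving_var = np.array([0.0]), np.array([1.0])

train_out = batch_norm(x, moving_mean, moving_var, is_training=True)
eval_out = batch_norm(x, moving_mean, moving_var, is_training=False)
# train_out is centred around 0 regardless of the moving averages,
# while eval_out keeps the large offset -- so stale or mismatched
# moving averages make the two modes diverge badly.
```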
@sguada, after I updated my TF, CUDA, and cuDNN, the problem went away, so a suggestion: maybe document the required TF, CUDA, and cuDNN versions in slim. TF is now being developed by many people and is a bit complicated to debug.
My TensorFlow version is commit ed87884e50e1a50f7dc7b36dc7a7ff225442bee0, with CUDA 8.0 and cuDNN 5.1.
@sguada, in the slim implementation you apply a moving average to all the model parameters (maybe that is only needed for the inception models?), but in another implementation of ResNet under tensorflow/models, they apply it only to the batch norm parameters.
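For clarity, a sketch (NumPy, names hypothetical) of the first variant: keeping an exponential moving average (Polyak averaging) over all model parameters, a separate mechanism from the batch-norm statistics above. At eval time the shadow (averaged) weights are swapped in for the raw ones:

```python
import numpy as np

def ema_update(shadow, params, decay=0.999):
    """Polyak averaging: shadow <- decay * shadow + (1 - decay) * params."""
    return {k: decay * shadow[k] + (1 - decay) * params[k] for k in params}

params = {"conv_w": np.array([1.0]), "bn_beta": np.array([0.5])}
shadow = {k: v.copy() for k, v in params.items()}

# A gradient step changes a weight...
params["conv_w"] = np.array([2.0])
shadow = ema_update(shadow, params)
# ...but the shadow copy moves only slightly toward the new value,
# smoothing out noise from individual training steps.
```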
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
I still have the same issue, 2 years later
I have the same issue, 3 years later...
@sguada I think this problem may be related to batch_norm behaving differently during training and testing; there should be a better way to handle this.
I printed out all the values, but they are the same for 'True' and 'False'.
I have the same issue, 3 years later. I also printed all the values, and they are all the same. However, the results with True and False are extremely different: when is_training=False, no matter what the input is, the results are all 0 or 1.