Does anyone remember how exactly we arrived at the channel means and stds we use for preprocessing?
```python
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```
I think the first mention of the preprocessing in this repo is in #39. In that issue @soumith points to https://github.com/pytorch/examples/tree/master/imagenet for reference. If you look at the history of main.py, the values were first introduced in commit pytorch/examples@27e2a46c1d1505324032b1d94fc6ce24d5b67e97. Unfortunately, it contains no explanation, hence my question.
Specifically, I'm seeking answers to the following questions:
- How were the final values obtained? Were they rounded, floored, or even ceiled?
- Which images were used? Only the training set of ImageNet, or additionally the images of the validation set?

I've tested some combinations and will post my results here.
| Configuration | mean | std |
| --- | --- | --- |
| train set only, no resizing / cropping | [0.4803, 0.4569, 0.4083] | [0.2806, 0.2736, 0.2877] |
| train set only, resize to 256 and center crop to 224 | [0.4845, 0.4541, 0.4025] | [0.2724, 0.2637, 0.2761] |
| train set only, center crop to 224 | [0.4701, 0.4340, 0.3832] | [0.2845, 0.2733, 0.2805] |
While the means match fairly well, the stds differ significantly.
You need to go deeper ;)
https://github.com/facebook/fb.resnet.torch/blob/master/datasets/imagenet.lua
```lua
-- Computed from random subset of ImageNet training images
local meanstd = {
   mean = { 0.485, 0.456, 0.406 },
   std = { 0.229, 0.224, 0.225 },
}
```
For my project I need to know the covariances between the channels. Since they are not part of the current implementation, my hope was that I could calculate them myself if I knew which images and which processing were used. Unfortunately,

> random subset

gives me little hope that I'll be able to do that. I suppose no one remembers how this random subset was selected?
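For reference, the channel means and the full channel covariance matrix only require accumulating first- and second-order statistics in a single pass. A minimal sketch (the dataset path and the resize/crop pipeline are assumptions, since the original preprocessing is exactly what's in question here):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumed preprocessing; the original pipeline is unknown.
dataset = datasets.ImageFolder(
    "/path/to/imagenet/train",  # hypothetical path
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),  # scales pixel values to [0, 1]
    ]),
)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

n = 0
channel_sum = torch.zeros(3, dtype=torch.float64)
outer_sum = torch.zeros(3, 3, dtype=torch.float64)

for images, _ in loader:
    # Flatten to (3, num_pixels) so every pixel is one observation.
    pixels = images.permute(1, 0, 2, 3).reshape(3, -1).double()
    n += pixels.shape[1]
    channel_sum += pixels.sum(dim=1)
    outer_sum += pixels @ pixels.T

mean = channel_sum / n
cov = outer_sum / n - torch.outer(mean, mean)  # biased estimator
std = cov.diagonal().sqrt()
```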
Should we investigate this further? I'm a little anxious that we simply use this normalization for all our models without being able to reproduce it.
@colesbury do you have more information to clarify the mean / std we use for ImageNet?
afaik we calculated the mean / std by running one pass over the training set of ImageNet
that being said, i see that std is not matching. possibly a bug of the past or some detail that we completely forgot about :-/
Can we put a batch normalization layer before the input so that the mean/std are computed automatically at training time?
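A rough sketch of that idea (not something torchvision does; `resnet50` is just an example backbone): a `BatchNorm2d` over the three input channels would track the normalization statistics during training instead of hard-coding them.

```python
import torch
from torch import nn
from torchvision import models

# Sketch of the suggestion: prepend BatchNorm2d(3) so the per-channel
# normalization is learned/tracked (running mean and variance) during
# training rather than fixed to hard-coded constants.
model = nn.Sequential(
    nn.BatchNorm2d(3),    # normalizes the 3 RGB input channels
    models.resnet50(),    # any downstream model works here
)

images = torch.rand(8, 3, 224, 224)  # unnormalized [0, 1] inputs
output = model(images)
```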
@apple2373 We are currently implementing the transforms for tensors in order to be able to use them within a model (see #1375). Whether we want to include them within the models is AFAIK still up for discussion (see #782).
@fmassa @soumith
Any update on this? Do we investigate further or keep it as is?
@pmeier I don't know if we will ever be able to get back those numbers, given that they seem to have been computed on a randomly-sampled part of the dataset.
If we really want to see if this has any impact, we would need to run multiple end-to-end trainings with the new mean/std and see if they bring any noticeable improvement.
I don't think we would get a significant improvement (or decline) in performance. I just think we shouldn't use numbers that are not reproducible. A change like this is of course a lot of work, BC-breaking, etc., but we don't know what the future brings. Maybe this becomes significant in the future, and then it's even harder to correct.
> I don't think we would get a significant improvement (or decline) in performance. I just think we shouldn't use numbers that are not reproducible.
I agree. But given the scale of how things would break with such a change, I think we should just live with it for now, and maybe document somewhere the findings you have shown here.
It's been almost four years, so I don't remember, but I probably just used the mean / std from the previous Lua ImageNet training script linked above. It uses the average standard deviation of an individual image's channels instead of an estimate of the standard deviation across the entire dataset.
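That difference is easy to see on toy data. A sketch of the two estimators (illustrative random tensors, not ImageNet):

```python
import torch

images = torch.rand(1000, 3, 224, 224)  # stand-in for an image subset

# (a) What the Lua script did: compute each image's per-channel std,
#     then average those stds over the dataset.
per_image_std = images.reshape(1000, 3, -1).std(dim=2).mean(dim=0)

# (b) What the table above measures: one std over all pixels of all
#     images pooled together.
pooled_std = images.permute(1, 0, 2, 3).reshape(3, -1).std(dim=1)

# (a) <= (b) in general: averaging within-image stds discards the
# variance of the per-image means (law of total variance), which is
# consistent with 0.229 < 0.2806 etc. in the measurements above.
print(per_image_std, pooled_std)
```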
I don't think we should change the mean/std, nor do I see any reproducibility issue. The scientific result here is the neural network, not the mean/std values, especially since the exact choice does not matter as long as they approximately whiten the input.
> A change like this is of course a lot of work, BC-breaking, etc., but we don't know what the future brings.
These numbers have become a standard for most neural networks created so far, so it's not just a lot of work: one would need to retrain hundreds of neural networks (approx. 2 GPU-weeks each for a model like ResNet-50) and create pull requests for all the pretrainedmodels / DPN / Wide ResNet / etc. repos all over GitHub, just to adjust the normalizing std by 0.05. What future could justify this?
Following the discussion we had here, I agree with @colesbury's and @nizhib's points above.
@pmeier would you like to send a PR adding a summary of the discussion we had here, including @colesbury's comment on how those numbers were obtained?
My schedule is full for the next few weeks, so this will take some time.
Maybe the reason the stds don't match is that the computation was originally run with unbiased=False?
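For context, torch.std's `unbiased` flag switches between the n-1 and n denominators:

```python
import torch

x = torch.rand(10)
print(x.std(unbiased=True))   # divides by n - 1 (the default)
print(x.std(unbiased=False))  # divides by n
```

Note, though, that over millions of pixels the two differ only by a factor of sqrt(n / (n - 1)), so this alone probably can't account for a gap of ~0.05.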
@Stannislav in #1965 I've managed to get pretty close to the original numbers.