Hi, I want to use the pretrained VGG for another task.
I saw an RGB-to-BGR transform in other people's VGG code, but I don't see this transform in the preprocessing in models/slim/preprocessing/vgg_preprocessing.py.
Should there be an rgb2bgr conversion in the preprocessing?
@ruotianluo We primarily use GitHub issues to track bugs and feature requests. This question is better suited for StackOverflow, which we also monitor. Please ask it there and tag it with the tensorflow tag. Thanks!
@tatatodd I feel like it could be a bug, but I'm not sure.
@tatatodd Could you tell me whether the VGG checkpoint you provide was converted from the original Caffe model? If it was converted, did you change the channel order in the first conv layer?
@ruotianluo I see.
@nathansilberman might be able to answer your question about whether the channels are rgb or bgr.
I am also interested in the answer.
I compared the fc7 and fc8 features of the same image from slim and caffe. The results are very different.
I am also interested in an answer.
@ruotianluo Maybe they are different because of dropout. Did you disable it?
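For reference, a minimal sketch of extracting fc7 with dropout disabled (TF 1.x; assumes models/research/slim is on the PYTHONPATH so nets/vgg.py is importable, and that the fc7 end point is named 'vgg_16/fc7'):

import tensorflow as tf
from nets import vgg  # assumes models/research/slim is importable
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])  # already-preprocessed images
with slim.arg_scope(vgg.vgg_arg_scope()):
    # is_training=False disables dropout, so the extracted features are deterministic
    logits, end_points = vgg.vgg_16(inputs, num_classes=1000, is_training=False)
fc7 = end_points['vgg_16/fc7']  # assumed end_point key; shape [N, 1, 1, 4096] for 224x224 inputs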
The missing explanation in the source code can certainly be considered a bug. There should be a specification of the input data.
So can anybody please clarify whether the input to vgg_16 or vgg_19 should be RGB or BGR? There is no documentation or specification on the slim models and pre-trained checkpoints ...
FYI: For my experiments,
inputs = tf.placeholder(tf.float32, (None, 224, 224, 3), name='inputs')  # RGB images in [0, 1]
# Scale back to [0, 255] and split into the three channels
r, g, b = tf.split(axis=3, num_or_size_splits=3, value=inputs * 255.0)
VGG_MEAN = [103.939, 116.779, 123.68]  # Caffe per-channel means in B, G, R order
# Reassemble in BGR order and subtract the per-channel Caffe means
bgr = tf.concat(values=[b - VGG_MEAN[0], g - VGG_MEAN[1], r - VGG_MEAN[2]], axis=3)
fc8, endpoints = vgg_16(bgr, is_training=False)
did work.
Hi, in my tests I use vgg16 and vgg19 from slim and I do the following:
# Read the image from file
image_string = tf.read_file(filename)
image_decoded = tf.image.decode_jpeg(image_string, channels=3)  # the decoded image is in RGB format
image = tf.cast(image_decoded, tf.float32)

# Isotropic rescaling so that the smallest side is 256
smallest_side = 256.0
height, width = tf.shape(image)[0], tf.shape(image)[1]
height = tf.to_float(height)
width = tf.to_float(width)
scale = tf.cond(tf.greater(height, width),
                lambda: smallest_side / width,
                lambda: smallest_side / height)
new_height = tf.to_int32(height * scale)
new_width = tf.to_int32(width * scale)
image = tf.image.resize_images(image, [new_height, new_width])

# Random 224x224 crop and per-channel mean subtraction
VGG_MEAN = [123.68, 116.78, 103.94]  # R-G-B means for ImageNet
image = tf.random_crop(image, [224, 224, 3])
means = tf.reshape(tf.constant(VGG_MEAN), [1, 1, 3])
image = image - means
The input images are in the range [0, 255], read in RGB format, with the ImageNet RGB means subtracted. My results are fine, so I think this preprocessing is correct.
If you want the whole code, take a look at this great gist: https://gist.github.com/omoindrot/dedc857cdc0e680dfb1be99762990c9c
Hope it helps!
@simo23, thanks for your example. I still have one question: why should we subtract the training mean rather than the testing mean? For example, if my input test image has a mean value much lower than 103.94, though still in the range [0, 255], should I still use the ImageNet training mean? I notice you rescale the image so the smallest side is 256 and then crop to 224. My input size is 128; can I pad it with zeros to enlarge it to 224? Or is rescaling it from 128 to 256 and then cropping to 224 better?
Hi, I'm sure there are people with a 1M times better answer than me, but I'll give it a try:
The standard VGG mean was computed over the whole ImageNet training set, and subtracting it zero-centers the input distribution. This is usually done because it has been shown to improve accuracy and help the learning process.
We do not subtract the testing mean because we should not assume we know it; you should rely only on the training data you have.
We do not subtract each image's own mean because the pretrained network expects inputs preprocessed with the official VGG mean; that is what produces the "standard" VGG features that are known to reach a certain accuracy. If you subtract each image's mean instead, the features extracted by the network will not be the "standard" ones, so your results will probably differ. As a practical example: given one totally red image and one totally blue image, subtracting each image's own per-channel mean makes them identical after preprocessing, and we do not want that. So we subtract not the single image's mean but a mean that comes from a distribution representing the whole dataset.
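A minimal sketch of that red/blue example (per-image, per-channel mean subtraction; numpy only, the array names are just illustrative):

import numpy as np

# A "totally red" and a "totally blue" image, values in [0, 255]
red = np.zeros((4, 4, 3), np.float32);  red[..., 0] = 255.0
blue = np.zeros((4, 4, 3), np.float32); blue[..., 2] = 255.0

# Subtract each image's own per-channel mean
red_centered = red - red.mean(axis=(0, 1), keepdims=True)
blue_centered = blue - blue.mean(axis=(0, 1), keepdims=True)

print(np.allclose(red_centered, blue_centered))  # True: the color information is gone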
For your other, more practical questions, I've not experimented with anything like that before, so I can only guess:
If your input size is 128, you can still feed the image to the network. You will obtain a 4x4 feature map after the last conv block (conv5 followed by pool5), and you can try to feed this into a fully connected layer and see what happens. If your model complains about the input size, you should modify the number of connections of the first fully connected layer. Alternatively, you can try skipping the last pooling layer to obtain an 8x8 feature map and feed that into the fully connected layers.
Zero-padding the image to 224 should not improve your performance, as the data is still 128x128.
You can try to resize the image to 256 by zooming it and then apply random cropping to get the 224x224 input. The problem is that you are modifying the image a lot, so it really depends on the application.
You have to try both :)
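A minimal sketch of the two options in TF 1.x (assuming image is a float32 RGB tensor of shape [128, 128, 3]; only standard tf.image ops are used):

# Option 1: zero-pad the 128x128 image to 224x224 (keeps the original resolution)
padded = tf.image.resize_image_with_crop_or_pad(image, 224, 224)

# Option 2: upscale to 256x256 (isotropic for a square input), then take a random 224x224 crop
resized = tf.image.resize_images(image, [256, 256])
cropped = tf.random_crop(resized, [224, 224, 3])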
Hope it helps!
This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!
It is RGB. Please read the comments in this file. https://github.com/tensorflow/models/blob/master/research/slim/datasets/build_imagenet_data.py
This should be included in the Slim documentation.
I think it's a bug in the README, because you can't use a pretrained model in Slim without this pretraining information.
Best answer: https://github.com/tensorflow/models/blob/master/research/slim/preprocessing/vgg_preprocessing.py
It only subtracts the per-channel RGB means; there is no RGB-to-BGR conversion.
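A minimal end-to-end sketch using that preprocessing (TF 1.x; assumes models/research/slim is on the PYTHONPATH so the preprocessing and nets packages are importable, and that you have downloaded a vgg_16 checkpoint; the image and checkpoint paths are hypothetical):

import tensorflow as tf
from preprocessing import vgg_preprocessing  # from models/research/slim
from nets import vgg
slim = tf.contrib.slim

# Decode an RGB image and apply the slim VGG eval preprocessing
# (aspect-preserving resize, central crop, RGB mean subtraction; no BGR swap).
image_string = tf.read_file('example.jpg')  # hypothetical path
image = tf.image.decode_jpeg(image_string, channels=3)
processed = vgg_preprocessing.preprocess_image(image, 224, 224, is_training=False)

with slim.arg_scope(vgg.vgg_arg_scope()):
    logits, end_points = vgg.vgg_16(tf.expand_dims(processed, 0),
                                    num_classes=1000, is_training=False)

saver = tf.train.Saver(slim.get_model_variables('vgg_16'))
with tf.Session() as sess:
    saver.restore(sess, '/path/to/vgg_16.ckpt')  # hypothetical checkpoint path
    print(sess.run(logits).shape)  # (1, 1000)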