Keras ImageDataGenerator: resize is done before preprocessing

Created on 11 Jan 2018 · 3 Comments · Source: keras-team/keras

During training I get an accuracy of 99%. When I test the model with my own code, the accuracy is 89%.
I traced this to a mismatch in the input: the data produced by ImageDataGenerator is slightly different from the data I produce during my own testing. The following script reproduces the issue; the two printed means are different:

import numpy as np
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
import os

def preprocess(img):
    # Assuming channels_last data: shape is (height, width, channels)
    height, width = img.shape[0], img.shape[1]
    img = image.array_to_img(img, scale=False)

    # Crop 48x48px
    desired_width, desired_height = 48, 48

    if width < desired_width:
        desired_width = width
    start_x = np.maximum(0, int((width-desired_width)/2))

    img = img.crop((start_x, np.maximum(0, height-desired_height), start_x+desired_width, height))
    img = img.resize((48, 48))

    img = image.img_to_array(img)
    return img / 255.

datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    preprocessing_function=preprocess)

generator = datagen.flow_from_directory(
    'numbers_train', 
    target_size=(48,48),
    batch_size=1024, # Only 405 images in directory, so batch always the same
    classes=['02'],
    shuffle=False,
    class_mode='sparse')

inputs, targets = next(generator)

folder = 'numbers_train/02'
files = os.listdir(folder)
files = list(map(lambda x: os.path.join(folder, x), files))

images = []
for f in files:
    img = image.load_img(f)
    #img = img.resize((48, 48))
    img = image.img_to_array(img)
    img = preprocess(img)

    images.append(img)
inputs2 = np.asarray(images)

print(np.mean(inputs))
print(np.mean(inputs2))
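
Two batches can have nearly equal means and still differ pixel by pixel, so an element-wise comparison can localize the mismatch better than the two prints above. The following is a minimal sketch, not part of the original script: `compare_batches` is a hypothetical helper, and the random arrays are stand-ins for `inputs` and `inputs2`. It assumes both batches hold the same files in the same order, which the script does not guarantee because `os.listdir` returns entries in arbitrary order.

```python
import numpy as np


def compare_batches(a, b):
    """Summarize how two image batches differ (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        return {"same_shape": False, "shape_a": a.shape, "shape_b": b.shape}
    diff = np.abs(a - b)
    # Reduce over every axis except the batch axis to get one flag per image.
    per_image = np.any(diff > 1e-6, axis=tuple(range(1, a.ndim)))
    return {
        "same_shape": True,
        "mean_a": float(a.mean()),
        "mean_b": float(b.mean()),
        "max_abs_diff": float(diff.max()),
        "images_differing": int(per_image.sum()),
    }


# Synthetic stand-ins for `inputs` and `inputs2`.
rng = np.random.default_rng(0)
batch = rng.random((4, 48, 48, 3))
shifted = batch.copy()
# Shift one image by a row: same pixel values (same mean), different positions.
shifted[2] = np.roll(shifted[2], 1, axis=0)

report = compare_batches(batch, shifted)
print(report)
```

With the real data, sorting `files` before the loop and then calling `compare_batches(inputs, inputs2)` would show how many images differ rather than only that the averages do.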

This seems to be caused by the img.resize call in preprocess. Commenting out that line in preprocess and uncommenting the img.resize in the final for loop gives exactly the same means. However, I want the cropping to be done first and then the resize. It seems that ImageDataGenerator currently resizes first and then applies the preprocessing function. I think it would be better to do this the other way around: first preprocess, then resize.
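
The effect of the ordering can be illustrated without Keras. The sketch below is an assumption-laden stand-in, not the library's code: `resize_nearest` is a hypothetical nearest-neighbour resize replacing PIL's `resize`, `crop_bottom_center` mirrors the crop in `preprocess` (horizontally centered, aligned to the bottom edge), and the 96x64 gradient image is synthetic. It compares "resize to 48x48, then preprocess" with "preprocess only".

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (hypothetical stand-in for PIL's resize)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


def crop_bottom_center(img, size=48):
    """Crop at most size x size pixels, centered in x and aligned to the bottom."""
    h, w = img.shape[:2]
    crop_w = min(size, w)
    left = max(0, (w - crop_w) // 2)
    top = max(0, h - size)
    return img[top:h, left:left + crop_w]


def preprocess(img):
    """Crop first, then resize to 48x48, as intended in the issue."""
    return resize_nearest(crop_bottom_center(img), 48, 48)


# Synthetic 96x64 (height x width) image whose pixel value encodes its position.
img = np.arange(96 * 64, dtype=np.float64).reshape(96, 64)

# Order described in the issue: resize to target_size, then preprocess.
resize_first = preprocess(resize_nearest(img, 48, 48))
# Desired order: preprocess (crop, then resize) on the original image.
crop_first = preprocess(img)

print(resize_first.shape, crop_first.shape)
print(resize_first.mean(), crop_first.mean())
print(np.allclose(resize_first, crop_first))
```

If the generator does resize first, as the reproduction suggests, the preprocessing function only ever receives 48x48 input, so the 48x48 crop inside it selects the whole image and has no effect. That would be consistent with the two means matching once the resize is moved ahead of `preprocess` in the manual loop.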

[x] Check that you are up-to-date with the master branch of Keras. You can update with:
pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.
[ ] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
[x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).