Keras: How to fit after flow_from_directory?

Created on 3 Jul 2017 · 10 Comments · Source: keras-team/keras

Regarding this code

```
from keras.preprocessing.image import ImageDataGenerator

# we create two instances with the same arguments
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90.,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(images, augment=True, seed=seed)
mask_datagen.fit(masks, augment=True, seed=seed)

image_generator = image_datagen.flow_from_directory(
    'data/images',
    class_mode=None,
    seed=seed)

mask_generator = mask_datagen.flow_from_directory(
    'data/masks',
    class_mode=None,
    seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50)
```

I tried to use the flow_from_directory method, but I get this warning:

    UserWarning: This ImageDataGenerator specifies featurewise_std_normalization, but it hasn'tbeen fit on any training data. Fit it first by calling .fit(numpy_data).

I also declared some data augmentation and preprocessing, but since I have no access to a NumPy array (I am using the flow_from_directory method), how can I also call the fit method, and on what?

Besides, your code conveniently uses the images variable, which is never declared! If the example is supposed to show how to read data from a directory, why does the line image_datagen.fit(images, augment=True, seed=seed) assume the data is already in memory?

Most helpful comment

In the flow_from_directory method, normalization is applied to each batch of inputs as it is loaded, and there is no NumPy array you can manipulate in that method. You will have to standardize each input x manually through the API provided.

You can inherit from the ImageDataGenerator class and override its standardize method to fit your data properly.
Here's a snippet that removes the warning:

class FixedImageDataGenerator(ImageDataGenerator):
    def standardize(self, x):
        if self.featurewise_center:
            x = ((x/255.) - 0.5) * 2.
        return x

Now instantiate this subclass instead:

image_datagen = FixedImageDataGenerator(**data_gen_args)
mask_datagen = FixedImageDataGenerator(**data_gen_args)

All 10 comments

I also don't understand the answer to this question. In this example, what are images and masks? And, more basically, why is it necessary to call:

image_datagen.fit(images, augment=True, seed=seed)
mask_datagen.fit(masks, augment=True, seed=seed)

Is this only because featurewise_std_normalization is set to True?

> I guess more basically, why is it necessary to call:
>
> image_datagen.fit(images, augment=True, seed=seed)
> mask_datagen.fit(masks, augment=True, seed=seed)
>
> Is this only because featurewise_std_normalization is set to True?

In this case yes, because fit computes statistics (the mean and standard deviation of the data) that are then applied in flow.
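
To make the role of fit concrete, here is a minimal NumPy sketch of the kind of statistics it computes and how they are later applied. It is an illustration under assumptions, not the Keras source: the helper names `featurewise_stats` and `apply_featurewise` are hypothetical, the data is assumed to be channels-last, and the small `eps` value is a placeholder.

```python
import numpy as np

def featurewise_stats(x):
    """Per-channel mean and std over a (samples, height, width, channels) array."""
    mean = x.mean(axis=(0, 1, 2))
    std = x.std(axis=(0, 1, 2))
    return mean, std

def apply_featurewise(x, mean, std, eps=1e-6):
    """Center and scale a batch with statistics computed beforehand."""
    return (x - mean) / (std + eps)

# Stand-in data; in the question this would be the in-memory `images` array.
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(8, 4, 4, 3)).astype("float32")

mean, std = featurewise_stats(images)
normalized = apply_featurewise(images, mean, std)
```

Because the statistics describe the whole dataset, they have to exist before any batch is produced, which is why the warning appears when flow_from_directory is used without a prior fit.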

Having the same issue.

I would like to use:

ImageDataGenerator(
featurewise_center=True,
featurewise_std_normalization=True,
zca_whitening=True)

What would be the proper way to implement this with flow_from_directory?

Thanks,
Mario

I created a function, get_img_fit_flow, that makes it easy to normalize based on sampled batches: https://github.com/smileservices/keras_utils/blob/master/utils.py
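
The general idea of fitting on sampled batches can be sketched as follows. This is not the code from the linked repository; `fit_on_sampled_batches` is a hypothetical helper that pulls a few batches from an iterator, stacks them, and passes the result to the generator's fit method. It assumes every batch has the same image shape.

```python
import numpy as np

def fit_on_sampled_batches(datagen, batch_iterator, n_batches=10):
    """Fit `datagen` on the first `n_batches` batches drawn from `batch_iterator`.

    `datagen` is anything with a fit(array) method, such as an ImageDataGenerator.
    Returns the number of images used for the estimate.
    """
    sample = np.concatenate(
        [next(batch_iterator) for _ in range(n_batches)], axis=0)
    datagen.fit(sample)
    return sample.shape[0]

# Hypothetical usage with Keras, using the directory from the question.
# The sampling iterator comes from a plain generator with no normalization:
#
#   plain_iter = ImageDataGenerator().flow_from_directory(
#       'data/images', class_mode=None)
#   fit_on_sampled_batches(image_datagen, plain_iter, n_batches=20)
```

The statistics are then only an estimate from the sampled images, so more batches give a closer approximation at the cost of more memory.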

Apologies if this is obvious to everyone else, but I don't see how this code:

class FixedImageDataGenerator(ImageDataGenerator):
    def standardize(self, x):
        if self.featurewise_center:
            x = ((x/255.) - 0.5) * 2.
        return x

mean centers the input images.

Nor do I see any mean-centering in the above get_img_fit_flow code.

It seems to me that mean-centering images using 'flow_from_directory' would be fairly common and yet I can't find any code in the wild that actually does it.

What am I missing?

You can first pre-compute the statistics (outside Keras). After this, use the computed statistics to pre-process the input (e.g. featurewise centering) by subclassing ImageDataGenerator as shown above. Of course, you'll need to modify the above code:

  • Change the 0.5 to the actual computed mean.

  • Change the formula above, since featurewise_center (in Keras as elsewhere) is normally implemented as x = x - mean.

An alternative that avoids inheriting from and overriding ImageDataGenerator is to implement a preprocessing_function and pass it as a parameter when instantiating ImageDataGenerator.
See the Keras docs (https://keras.io/preprocessing/image/) for details on the preprocessing_function parameter of ImageDataGenerator.
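
A minimal sketch of that approach, assuming channels-last images: the statistics are accumulated over batches so the whole dataset never has to sit in memory, and the result is wrapped in a function suitable for preprocessing_function. The helper names `channel_stats_from_batches` and `make_preprocessing_function` are hypothetical, and `eps` is a placeholder value.

```python
import numpy as np

def channel_stats_from_batches(batches):
    """Accumulate per-channel mean and std over an iterable of (n, h, w, c) batches."""
    count = 0
    total = None
    total_sq = None
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        pixels = batch.reshape(-1, batch.shape[-1])  # one row per pixel
        if total is None:
            total = np.zeros(pixels.shape[1])
            total_sq = np.zeros(pixels.shape[1])
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2, clipped at zero against rounding error.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std

def make_preprocessing_function(mean, std, eps=1e-6):
    """Build a function that centers and scales a single image."""
    def preprocess(x):
        return (x - mean) / (std + eps)
    return preprocess

# Hypothetical usage with Keras. A directory iterator loops forever, so the
# number of batches has to be bounded, e.g. with itertools.islice:
#
#   mean, std = channel_stats_from_batches(itertools.islice(plain_iter, 100))
#   datagen = ImageDataGenerator(
#       preprocessing_function=make_preprocessing_function(mean, std))
```

With this route, featurewise_center and featurewise_std_normalization would be left at their defaults, since the function already does the centering and scaling and no call to fit is needed.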

I have an ImageDataGenerator set up with featurewise_center=True, featurewise_std_normalization=True. I fit the generator to my training set (it learns some statistics) and train my model. All is well.

After training is done and Python is closed, how do I apply the same preprocessing to my test set using the generator? I want to carry over the statistics learned from fit so I can use them whenever I want to test or further train my model (when the original training set might not be available anymore).

Thanks in advance

Thanks for the idea @dlpbc!
To reuse the statistics learned from the training data, the standardize method of ImageDataGenerator can be used to normalize the test data.

Ref: Answered by Martin Thoma in Stack Overflow
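
For carrying the statistics across Python sessions, one possible sketch is to write the fitted generator's mean and std attributes to disk and assign them back to a fresh generator later. The helper names `save_stats` and `load_stats` and the file name are hypothetical, and the sketch assumes both featurewise options were enabled so that neither attribute is None.

```python
import numpy as np

def save_stats(datagen, path):
    """Store the fitted featurewise statistics of `datagen` in an .npz file."""
    np.savez(path, mean=datagen.mean, std=datagen.std)

def load_stats(datagen, path):
    """Restore previously saved statistics onto a new generator instance."""
    with np.load(path) as stats:
        datagen.mean = stats["mean"]
        datagen.std = stats["std"]

# Hypothetical usage with Keras:
#
#   save_stats(image_datagen, 'featurewise_stats.npz')      # after fit()
#   ...
#   test_datagen = ImageDataGenerator(featurewise_center=True,
#                                     featurewise_std_normalization=True)
#   load_stats(test_datagen, 'featurewise_stats.npz')       # in a new session
```

If zca_whitening is also used, the fitted whitening matrix would need to be saved and restored the same way.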

Is there a way to set the preprocessing_function after the generator flow_from_directory has been created?

Would simply setting gen.preprocessing_function = my_function be enough (assuming gen is my instance of the generator), or does Keras do something else in the background that we should replicate?
