@fchollet and other kerasors,
We all know that if we are dealing with large-scale data such as ImageNet, we can write a customized generator that produces batches of data (often as numpy arrays) from disk and then train the model with model.fit_generator(). But if we also want to use ImageDataGenerator for online data augmentation at the same time, what is the simplest way to implement this? Note that I would like to use its flow() method instead of flow_from_directory().
It's possible to do this as follows:
datagen = ImageDataGenerator(...)
train_generator = datagen.flow(X_train, Y_train, batch_size=128)
model.fit_generator(train_generator, samples_per_epoch=len(X_train), ...)
What I'd like to know is how to combine ImageDataGenerator with a custom thread-safe data generator and flow from that.
@tetmin You are assuming that X_train and Y_train can be loaded into memory at once. My concern is what happens if they cannot: then we have to write a customized generator that loads batches of training data from disk, and the question is how to combine that generator with online data augmentation.
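One way to tie the two together is to call flow() once per disk batch and pull a single augmented batch out of the resulting iterator. The following is only a minimal sketch, not code from this thread: load_batch_from_disk is a hypothetical helper that returns one (x_batch, y_batch) pair of numpy arrays, and the augmentation settings are placeholders.

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10, horizontal_flip=True)

def augmented_batch_generator(batch_size, num_batches):
    while True:
        for batch_num in range(num_batches):
            # hypothetical helper: returns (x_batch, y_batch) numpy arrays for this batch
            x_batch, y_batch = load_batch_from_disk(batch_num, batch_size)
            # run exactly one augmented pass over this in-memory batch
            x_aug, y_aug = next(datagen.flow(x_batch, y_batch,
                                             batch_size=x_batch.shape[0],
                                             shuffle=False))
            yield x_aug, y_aug

model.fit_generator() could then consume augmented_batch_generator(...) directly, with steps_per_epoch set to num_batches.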
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.
I too want to know. I have written my own batch generator that reads batches from disk (all images do not fit in memory), and I want to combine this with the ImageDataGenerator.
import cv2
import numpy as np
from keras import utils

def batch_generator(df, batch_size, path_tiles, num_classes):
    """This generator uses a pandas DataFrame to read images (df.tile_name) from disk."""
    N = df.shape[0]
    while True:
        for start in range(0, N, batch_size):
            x_batch = []
            y_batch = []
            end = min(start + batch_size, N)
            df_tmp = df[start:end]
            ids_batch = df_tmp.tile_name
            for id in ids_batch:
                img = cv2.imread(path_tiles + '/{}'.format(id))
                # take .values[0] since tile names can be duplicated
                labelname = df_tmp['y'][df_tmp.tile_name == id].values[0]
                labelname = np.asscalar(labelname)
                x_batch.append(img)
                y_batch.append(labelname)
            x_batch = np.array(x_batch, np.float32) / 255
            y_batch = utils.np_utils.to_categorical(y_batch, num_classes)
            yield (x_batch, y_batch)
model.fit_generator(generator=batch_generator(df_train,
                                              batch_size=batch_size,
                                              path_tiles=path_tiles,
                                              num_classes=num_classes),
                    steps_per_epoch=len(df_train) // batch_size,
                    epochs=epochs)
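If the goal is to add online augmentation to a generator like the one above, another option is to transform each image as it is read. This is again only a sketch with placeholder augmentation settings and a renamed generator; random_transform() is the per-sample method that ImageDataGenerator exposes for applying a random augmentation to a single image.

import cv2
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical

datagen = ImageDataGenerator(rotation_range=15, horizontal_flip=True)

def augmenting_batch_generator(df, batch_size, path_tiles, num_classes):
    N = df.shape[0]
    while True:
        for start in range(0, N, batch_size):
            end = min(start + batch_size, N)
            df_tmp = df[start:end]
            x_batch, y_batch = [], []
            for tile_name, label in zip(df_tmp.tile_name, df_tmp['y']):
                img = cv2.imread('{}/{}'.format(path_tiles, tile_name))
                img = img.astype(np.float32) / 255.0
                # apply a random augmentation to this single image
                img = datagen.random_transform(img)
                x_batch.append(img)
                y_batch.append(label)
            yield np.array(x_batch), to_categorical(y_batch, num_classes)

If the ImageDataGenerator were configured with featurewise normalization options, a datagen.fit() on a representative sample plus a per-image datagen.standardize() call would presumably also be needed.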
I just put this in the training loop...
x = load_img(data_path, batch_num*batchSize, (batch_num+1)*batchSize)
x = preprocess_img(x)
# take a single augmented batch from the flow() iterator
for x in img_datagen.flow(x, batch_size=batchSize, seed=seed):
    break
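The for/break pattern above just pulls one augmented batch out of the flow() iterator; assuming x is already a 4D numpy array at that point, the same thing can be written as a single call:

x = next(img_datagen.flow(x, batch_size=batchSize, seed=seed))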
Are there any suggestions for this issue? I am having the same problem.