Keras: How to use data (image) augmentation with fit_generator()?

Created on 23 Jul 2016  ·  6 Comments  ·  Source: keras-team/keras

@fchollet and other Keras users,
We all know that when dealing with large-scale data such as ImageNet, we can write a customized generator that produces batches (often as numpy arrays) from disk and then train the model with model.fit_generator(). But if we also want to use ImageDataGenerator for online data augmentation at the same time, what is the simplest way to implement this? Note that I would like to use its flow() method rather than flow_from_directory().

Label: stale

All 6 comments

It's possible to do this as follows:

datagen = ImageDataGenerator(...)
train_generator = datagen.flow(X_train, Y_train, batch_size=128)
# samples_per_epoch is the Keras 1 argument; in Keras 2 use
# steps_per_epoch=len(X_train) // 128 instead
model.fit_generator(train_generator, samples_per_epoch=len(X_train), ...)

What I'd like to know is how it's possible to combine ImageDataGenerator with a custom thread-safe data augmentation generator and flow from these?

@tetmin You are assuming that X_train and Y_train can be loaded into memory at once. My concern is the case where they cannot: then we have to write a customized generator that loads batches of training data from disk, and the question is how to combine that generator with online data augmentation.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

I too want to know. I have written my own batch generator that reads batches from disk (the images do not all fit in memory), and I want to combine this with the ImageDataGenerator.

import cv2
import numpy as np
from keras.utils import to_categorical

def batch_generator(df, batch_size, path_tiles, num_classes):
    """Generator that uses a pandas DataFrame to read images (df.tile_name) from disk."""
    N = df.shape[0]
    while True:
        for start in range(0, N, batch_size):
            x_batch = []
            y_batch = []
            end = min(start + batch_size, N)
            df_tmp = df[start:end]
            ids_batch = df_tmp.tile_name
            for id in ids_batch:
                img = cv2.imread('{}/{}'.format(path_tiles, id))
                # .values[0] since tile names can be duplicated
                labelname = df_tmp['y'][df_tmp.tile_name == id].values[0]
                labelname = np.asscalar(labelname)
                x_batch.append(img)
                y_batch.append(labelname)
            x_batch = np.array(x_batch, np.float32) / 255
            y_batch = to_categorical(y_batch, num_classes)
            yield (x_batch, y_batch)

model.fit_generator(generator=batch_generator(df_train,
                                              batch_size=batch_size,
                                              path_tiles=path_tiles,
                                              num_classes=num_classes),
                    steps_per_epoch=len(df_train) // batch_size,
                    epochs=epochs)

I just put this in the training loop...

# first load your batch from disk using custom functions
x = load_img(data_path, batch_num * batchSize, (batch_num + 1) * batchSize)
x = preprocess_img(x)

# then augment your batch
for x in img_datagen.flow(x, batch_size=batchSize, seed=seed):
    break

# this generates one batch of augmented data
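The for/break pattern above just pulls the first batch out of the infinite generator that flow() returns; calling next() on the generator does the same thing in one line. A small illustration with a plain Python generator standing in for `img_datagen.flow(...)` (the name `infinite_batches` is illustrative):

```python
def infinite_batches():
    """Stand-in for img_datagen.flow(...): yields batches forever."""
    batch_num = 0
    while True:
        yield batch_num
        batch_num += 1

gen = infinite_batches()
x = next(gen)  # one batch, equivalent to the for/break idiom
```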

Is there any suggestion for this issue? I have the same problem.

