@fchollet and other kerasors,
We all know that if we are dealing with large-scale data such as ImageNet, we can write a customized generator that produces batches of data (often as numpy arrays) from disk and then train the model with model.fit_generator(). But if we also want to use ImageDataGenerator for online data augmentation at the same time, what is the simplest way to implement this? Note that I would like to use its flow() method instead of flow_from_directory().
It's possible to do this as follows:
datagen = ImageDataGenerator(...)
train_generator = datagen.flow(X_train, Y_train, batch_size=128)
model.fit_generator(train_generator, samples_per_epoch=len(X_train), ...)
What I'd like to know is how to combine ImageDataGenerator with a custom thread-safe data generator and flow from that.
@tetmin You are assuming that X_train and Y_train can be loaded into memory at once. My concern is what happens if they cannot: then we have to write a customized generator that loads batches of training data from disk, and the question is how to combine that generator with online data augmentation.
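One way to tie the two together is to call flow() once per disk batch and pull a single augmented batch out of the resulting iterator. The following is only a minimal sketch, not code from this thread: load_batch_from_disk is a hypothetical helper that returns one (x_batch, y_batch) pair of numpy arrays, and the augmentation settings are placeholders.

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10, horizontal_flip=True)

def augmented_batch_generator(batch_size, num_batches):
    while True:
        for batch_num in range(num_batches):
            # hypothetical helper: returns (x_batch, y_batch) numpy arrays for this batch
            x_batch, y_batch = load_batch_from_disk(batch_num, batch_size)
            # run exactly one augmented pass over this in-memory batch
            x_aug, y_aug = next(datagen.flow(x_batch, y_batch,
                                             batch_size=x_batch.shape[0],
                                             shuffle=False))
            yield x_aug, y_aug

model.fit_generator() could then consume augmented_batch_generator(...) directly, with steps_per_epoch set to num_batches.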
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.
I too want to know. I have written my own batch generator that reads batches from disk (all images do not fit in memory), and I want to combine this with the ImageDataGenerator.
import cv2
import numpy as np
from keras import utils

def batch_generator(df, batch_size, path_tiles, num_classes):
    """This generator uses a pandas DataFrame to read images (df.tile_name) from disk."""
    N = df.shape[0]
    while True:
        for start in range(0, N, batch_size):
            x_batch = []
            y_batch = []
            end = min(start + batch_size, N)
            df_tmp = df[start:end]
            ids_batch = df_tmp.tile_name
            for id in ids_batch:
                img = cv2.imread(path_tiles + '/{}'.format(id))
                # take .values[0] since tile names can be duplicated
                labelname = df_tmp['y'][df_tmp.tile_name == id].values[0]
                labelname = np.asscalar(labelname)
                x_batch.append(img)
                y_batch.append(labelname)
            x_batch = np.array(x_batch, np.float32) / 255
            y_batch = utils.np_utils.to_categorical(y_batch, num_classes)
            yield (x_batch, y_batch)
model.fit_generator(generator=batch_generator(df_train,
                                              batch_size=batch_size,
                                              path_tiles=path_tiles,
                                              num_classes=num_classes),
                    steps_per_epoch=len(df_train) // batch_size,
                    epochs=epochs)
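If the goal is to add online augmentation to a generator like the one above, another option is to transform each image as it is read. This is again only a sketch with placeholder augmentation settings and a renamed generator; random_transform() is the per-sample method that ImageDataGenerator exposes for applying a random augmentation to a single image.

import cv2
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical

datagen = ImageDataGenerator(rotation_range=15, horizontal_flip=True)

def augmenting_batch_generator(df, batch_size, path_tiles, num_classes):
    N = df.shape[0]
    while True:
        for start in range(0, N, batch_size):
            end = min(start + batch_size, N)
            df_tmp = df[start:end]
            x_batch, y_batch = [], []
            for tile_name, label in zip(df_tmp.tile_name, df_tmp['y']):
                img = cv2.imread('{}/{}'.format(path_tiles, tile_name))
                img = img.astype(np.float32) / 255.0
                # apply a random augmentation to this single image
                img = datagen.random_transform(img)
                x_batch.append(img)
                y_batch.append(label)
            yield np.array(x_batch), to_categorical(y_batch, num_classes)

If the ImageDataGenerator were configured with featurewise normalization options, a datagen.fit() on a representative sample plus a per-image datagen.standardize() call would presumably also be needed.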
I just put this in the training loop...
x = load_img(data_path, batch_num*batchSize, (batch_num+1)*batchSize)
x = preprocess_img(x)
# take a single augmented batch from the flow() iterator
for x in img_datagen.flow(x, batch_size=batchSize, seed=seed):
    break
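The for/break pattern above just pulls one augmented batch out of the flow() iterator; assuming x is already a 4D numpy array at that point, the same thing can be written as a single call:

x = next(img_datagen.flow(x, batch_size=batchSize, seed=seed))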
Are there any suggestions for this issue? I am having the same problem.