Hi everyone. I am working on an image classification problem with a ResNet model and a regression output, in Keras on top of TensorFlow, and my problem is with fit_generator. I have tried everything, and finally I'm here.
My configuration is: Ubuntu 16 x64, Python 3.5.2, CUDA 7.5, cuDNN 5.1, TensorFlow 0.10, Keras 1.1.0. System memory 24 GB, GTX 970 with 4 GB.
When I run the generator, system memory runs out after 150-160 epochs. I tried a plain fit_generator call over all epochs in a single call:
g = util.createGeneratorEarly(X, Xf_train, Xp_train, Y, batch_size=cfg.batch_size)
model.fit_generator(g,
                    samples_per_epoch=cfg.spe,
                    nb_epoch=cfg.nb_epoch,
                    nb_val_samples=Xt.shape[0],
                    validation_data=([Xt, Xf_test, Xp_test], Yt),
                    callbacks=[TensorBoard(cfg.tmp_file + now), best_model,
                               best_model_ep, change_lr])  # best_model_ep
and I tried calling it in a loop, one epoch per call:
for e in range(cfg.nb_epoch):
    g = util.createGeneratorEarly(X, Xf_train, Xp_train, Y, batch_size=cfg.batch_size)
    model.fit_generator(g,
                        samples_per_epoch=cfg.spe,
                        nb_epoch=1,
                        nb_val_samples=Xt.shape[0],
                        validation_data=([Xt, Xf_test, Xp_test], Yt),
                        callbacks=[TensorBoard(cfg.tmp_file + now), best_model,
                                   best_model_ep, change_lr])  # best_model_ep
The result is the same. I tried both deleting the batches after the generator uses them and not deleting them. I tried one batch and many. I tried multithreading and multiprocessing with the pickle_safe flag. The result is always the same: a memory leak.
After each epoch the generator takes more and more memory. In my case, when I start, 14 GB are used; in the second epoch it is 14.2 GB, then 14.5 GB, and so on, until a memory error occurs.
Sometimes memory drops by about 1-2 GB, as if the system is cleaning up processes, but the overall trend is toward consuming all memory.
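A small callback like the one below can log the resident memory at the end of each epoch, to measure this growth precisely. This is only a minimal sketch: it assumes psutil is installed, and MemoryLogger is just my own helper name, not part of Keras.

    import psutil
    from keras.callbacks import Callback

    class MemoryLogger(Callback):
        """Print the resident memory of the training process after every epoch."""
        def on_epoch_end(self, epoch, logs={}):
            rss_gb = psutil.Process().memory_info().rss / 1024.0 ** 3
            print('epoch %d: %.2f GB resident' % (epoch, rss_gb))

    # add MemoryLogger() to the callbacks list passed to fit_generator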
And here is the code of the generator. It is simple: three inputs and one output array, where X is transformed by ImageDataGenerator:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def createGeneratorEarly(X, I, Person, Y, batch_size):
    while True:
        # shuffled indices
        idx = np.random.permutation(X.shape[0])
        # create image generator
        datagen = ImageDataGenerator(
            featurewise_center=False,             # set input mean to 0 over the dataset
            samplewise_center=False,              # set each sample mean to 0
            featurewise_std_normalization=False,  # divide inputs by std of the dataset
            samplewise_std_normalization=False,   # divide each input by its std
            zca_whitening=False,                  # apply ZCA whitening
            rotation_range=10,                    # randomly rotate images in the range (degrees, 0 to 180)
            width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
            horizontal_flip=False,                # randomly flip images horizontally
            vertical_flip=False)                  # randomly flip images vertically
        batches = datagen.flow(X[idx], Y[idx], batch_size=batch_size, shuffle=False)
        idx0 = 0
        for batch in batches:
            idx1 = idx0 + batch[0].shape[0]
            # yield [x0, x1, x2], y
            yield [batch[0], I[idx[idx0:idx1]], Person[idx[idx0:idx1]]], batch[1]
            idx0 = idx1
            if idx1 >= X.shape[0]:
                break
        del batches
        del datagen
        del idx
As I understand it, all the evil is in datagen.flow in my case, but I don't know how to defeat it. Please, can anyone help me?
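One variant that could narrow it down is building the ImageDataGenerator once, outside the while loop, instead of re-creating it on every pass. This is only an untested sketch under that assumption, with the augmentation arguments trimmed to the non-default ones I actually use; whether it removes the leak depends on where the memory is really retained.

    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator

    def createGeneratorEarly(X, I, Person, Y, batch_size):
        # build the augmenter once, not once per pass over the data
        datagen = ImageDataGenerator(rotation_range=10,
                                     width_shift_range=0.1,
                                     height_shift_range=0.1)
        while True:
            idx = np.random.permutation(X.shape[0])
            batches = datagen.flow(X[idx], Y[idx], batch_size=batch_size, shuffle=False)
            idx0 = 0
            for batch in batches:
                idx1 = idx0 + batch[0].shape[0]
                yield [batch[0], I[idx[idx0:idx1]], Person[idx[idx0:idx1]]], batch[1]
                idx0 = idx1
                if idx1 >= X.shape[0]:
                    break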
When I use fit_generator, memory grows until the process is killed.
Same here. I came back today after 3 days of training only to find 7 GB of swap memory allocated.