When training various models with the fit_generator function and ImageDataGenerator, the first epoch takes much longer than subsequent ones. What is the reason for this?
Epoch 1/50
1464/1464 [==============================] - 3943s - loss: 3.9262 - acc: 0.0528 - val_loss: 3.3747 - val_acc: 0.1382
Epoch 2/50
1464/1464 [==============================] - 239s - loss: 3.3120 - acc: 0.1481 - val_loss: 2.8206 - val_acc: 0.2538
Epoch 3/50
1464/1464 [==============================] - 240s - loss: 3.0171 - acc: 0.2057 - val_loss: 2.5984 - val_acc: 0.3021
Epoch 4/50
1464/1464 [==============================] - 240s - loss: 2.8254 - acc: 0.2476 - val_loss: 2.4469 - val_acc: 0.3375
What's the format of your data (i.e. what kind of files are you loading it from, how does your generator function prepare it, etc.)?
Thanks. The ImageDataGenerator is defined as
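# Import path as used in standalone Keras 2.x
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest')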
and it loads the 256x256 PNG images as
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(64, 64),
    batch_size=256,
    color_mode="grayscale",
    class_mode='categorical')
The model is fitted as
history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // 256,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // 256,
    workers=12)
Are the images stored in memory on the first pass?
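One way to check whether data loading (rather than training) explains the gap is to time the generator on its own, with no model involved. The sketch below is only an illustration: time_batches and fake_batches are hypothetical names, and fake_batches stands in for train_generator, which is assumed to be an ordinary Python iterator over batches.

```python
import time


def time_batches(batch_iter, n_batches):
    """Return the seconds spent drawing each of the first n_batches batches."""
    durations = []
    it = iter(batch_iter)
    for _ in range(n_batches):
        start = time.perf_counter()
        try:
            next(it)  # only the cost of producing one batch is measured
        except StopIteration:
            break  # the iterator ran out before n_batches were drawn
        durations.append(time.perf_counter() - start)
    return durations


def fake_batches():
    # Hypothetical stand-in for train_generator: yields dummy batches forever.
    while True:
        yield [0.0] * 256


timings = time_batches(fake_batches(), 5)
print(["%.6f" % t for t in timings])
```

If a first full pass over the real generator turns out much slower than a second pass, file reads and operating-system caching would be a plausible suspect; if both passes take about the same time, the difference probably lies elsewhere.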
I am also experiencing the same issue with fit_generator, but I have my own data generator.
For 4096 samples:
Epoch 1/300
128/128 [==============================] - 54s - loss: 16.8168 - acc: 2.4414e-04 - val_loss: 15.8635 - val_acc: 0.0078
Epoch 2/300
128/128 [==============================] - 5s - loss: 15.0828 - acc: 0.0210 - val_loss: 15.4364 - val_acc: 0.0098
Epoch 3/300
128/128 [==============================] - 3s - loss: 14.6166 - acc: 0.0295 - val_loss: 15.2209 - val_acc: 0.0117
Epoch 4/300
128/128 [==============================] - 3s - loss: 14.3613 - acc: 0.0403 - val_loss: 15.0684 - val_acc: 0.0117
Epoch 5/300
128/128 [==============================] - 3s - loss: 14.1698 - acc: 0.0417 - val_loss: 14.9562 - val_acc: 0.0117
Epoch 6/300
128/128 [==============================] - 3s - loss: 14.0273 - acc: 0.0442 - val_loss: 14.8677 - val_acc: 0.0137
the same issue
The first epoch takes the same time as the others, but the counter also
includes the time spent building the part of the computational graph that
deals with training (a few seconds). This used to be done during the
compile step, but it is now done lazily, on demand, to avoid unnecessary
work. The switch happened maybe 1.5 years ago.
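The effect described above can be illustrated without Keras. The sketch below is a toy, not Keras code: LazyTrainer, run_epoch and the 0.05 second build delay are all made up for the example. The "training function" is built the first time a batch is trained on, so the build cost is counted in the first epoch's time.

```python
import time


class LazyTrainer:
    """Toy model whose 'train function' is built on first use, not up front."""

    def __init__(self, build_seconds=0.05):
        self.build_seconds = build_seconds
        self._train_fn = None

    def _build_train_fn(self):
        time.sleep(self.build_seconds)  # pretend graph construction is slow
        return lambda batch: sum(batch)

    def train_on_batch(self, batch):
        if self._train_fn is None:
            # Lazy step: happens once, inside the first timed epoch.
            self._train_fn = self._build_train_fn()
        return self._train_fn(batch)


def run_epoch(trainer, batches):
    """Train on every batch and return the wall-clock seconds taken."""
    start = time.perf_counter()
    for batch in batches:
        trainer.train_on_batch(batch)
    return time.perf_counter() - start


trainer = LazyTrainer()
batches = [[1.0, 2.0, 3.0]] * 10
epoch_times = [run_epoch(trainer, batches) for _ in range(3)]
print(["%.4f" % t for t in epoch_times])
```

Only the first entry of epoch_times includes the build delay; the later epochs do the same per-batch work without it.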
On 16 October 2017 at 18:57, yyydido notifications@github.com wrote:
the same issue
Thanks for the clarification!