I want to fine-tune ResNet-50 on my dataset.
But I face the problem that when one epoch ends and the validation run starts, it becomes really slow; the validation time is even longer than the training time, and I'm not sure what is happening.
Here is part of my code:
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=20,                    # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                 # randomly flip images
    vertical_flip=False,
    zoom_range=0.1,
    channel_shift_range=0.,
    fill_mode='nearest',
    cval=0.,
)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    '/home/amanda/anaconda2/envs/tensorflow/lib/python2.7/site-packages/keras/datasets/nuclear/CRCHistoPhenotypes_2016_04_28/cropdetect/train',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    '/home/amanda/anaconda2/envs/tensorflow/lib/python2.7/site-packages/keras/datasets/nuclear/CRCHistoPhenotypes_2016_04_28/cropdetect/val',
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
model.fit_generator(train_generator,
                    # steps_per_epoch=X_train.shape[0] // batch_size,
                    samples_per_epoch=35946,
                    epochs=epochs,
                    validation_data=validation_generator,
                    verbose=1,
                    nb_val_samples=8986,
                    callbacks=[earlyStopping, saveBestModel, tensorboard])
I am having a similar problem where using flow
is considerably faster than using flow_from_directory
for both training and validation and I can't find a good reason to explain why. Would be grateful to get an insight from an expert in Keras :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I'm having the same issue. Using fit_generator, my validation step is significantly longer than my training step, even though it has fewer steps.
I'm facing the same problem too; the validation step is very, very slow.
I hope someone figures out how to handle this problem.
I believe I am having a related problem. When I run fit_generator in Spyder, the network goes through the training and finishes the first epoch fine (I'm only training one epoch so I can try to debug). But then the kernel dies when it tries to validate.
I'm facing the same issue. Running fit_generator, I found that the validation pass after each epoch is incredibly slow. I ran the same model to predict over the whole validation set directly, and it is a lot faster (like 20x faster). Any idea about what is happening would be nice @fchollet
I had the same issue. I fixed it by following the instructions in this issue:
https://github.com/fchollet/keras/issues/6406
You have to set the "steps_per_epoch" and "validation_steps" parameters correctly.
In the example from @SIAAAAAA, I think that uncommenting the line
steps_per_epoch=X_train.shape[0] // batch_size,
and setting validation_steps to X_val.shape[0] // batch_size
should be enough.
It considerably improved the training time for me.
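Concretely, the call would look something like this (a minimal sketch using the Keras 2 parameter names; X_train and X_val are the illustrative array names from the comments above, standing in for whatever holds your samples):

steps_per_epoch = X_train.shape[0] // batch_size      # batches per training epoch
validation_steps = X_val.shape[0] // batch_size       # batches drawn per validation pass
model.fit_generator(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps,
                    verbose=1,
                    callbacks=[earlyStopping, saveBestModel, tensorboard])

With both steps set, Keras stops after the given number of batches instead of draining the generator.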
I'm having the same problem, but using a custom generator. Has anyone solved this yet?
I also faced the same problem, but by doing the following, the validation time improved considerably.
# training
hi = model.fit_generator(
    train_generator,
    # samples_per_epoch=n_iteration,   # old Keras 1 name, replaced below
    steps_per_epoch=nb_samples // nb_batch,
    epochs=nb_epoch,
    validation_data=validation_generator,
    # nb_val_samples=nb_val_samples,   # old Keras 1 name, replaced below
    validation_steps=nb_val_samples // nb_batch,
    callbacks=callbacks,
    verbose=1
)
Having the same problem; validation is very slow. My generator is correct for Keras 2 and I am using a GPU. Anyone have a solution?
My model is fairly simple:
Layer (type)          Output Shape    Param #
flatten_1 (Flatten)   (None, 64)      0
dense_1 (Dense)       (None, 64)      4160
dense_2 (Dense)       (None, 48)      3120
dense_3 (Dense)       (None, 32)      1568
dense_4 (Dense)       (None, 24)      792
dense_5 (Dense)       (None, 12)      300
dense_6 (Dense)       (None, 1)       13
model.compile(optimizer="sgd", loss='mean_squared_error', metrics=['mae'])
mysteps = max(len(X_train) // batch_size, 1)
history = model.fit_generator(train_generator,
                              steps_per_epoch=mysteps,
                              epochs=30,
                              validation_data=valid_generator,
                              validation_steps=len(X_validation),
                              verbose=1)
This is very slow when I run it on an AWS Ubuntu machine using a GPU. If I run the same code on a Windows machine it is much faster. Very strange.
I met the same problem today. When len(valid_generator) == 500, it took me almost five minutes to evaluate. When I changed len(valid_generator) to 20, it took me less than 20 seconds. validation_steps and batch_size don't matter; it's len(valid_generator) that matters. Kind of weird, I think, because the validation time should be proportional to validation_steps and batch_size.
I found the same thing, the __len__ function on the generator needs to return a small number for the validation data generator (much smaller than for the training data generator) and then it becomes manageable. If both generators return the same length then validation is impossibly long - more than several hours in my case (I don't know how long because I never waited long enough to see!)
[Also, this seems to be only a problem if workers>0 in the fit_generator method. If I set workers=0 then validation completes fine in a short time]
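For reference, this is the shape of the custom generator being discussed (a minimal illustrative sketch; the class and variable names are made up). The value returned by __len__ is what Keras treats as one full pass over the data, so whatever it returns bounds the validation loop:

import numpy as np
from keras.utils import Sequence

class ArraySequence(Sequence):
    # Illustrative Sequence over in-memory arrays.
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Keras reads this to decide how many batches make one full pass,
        # so an oversized value here stretches validation accordingly.
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]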
I have the same problem: in every epoch, the validation phase is slower than the training phase.
The two phases take the same time on an 18-core Xeon, but validation takes 4 times the training time on the Intel Phi architecture (TensorFlow MKL binary).
I think/suspect that model evaluation for the validation phase does not take advantage of parallelization. This could be the core of the problem. Please check.
Regards
I'm still having this problem. It seems that the fit_generator method does not pay attention to the validation_steps parameter. I have set validation_steps to 15, but it is pulling len(data_generator) batches and ignoring the parameter value. As per the comment by @Neutrino3316 above, it is the len(data_generator) method that matters for keeping validation time down, and if this value is not extremely low, then validation takes forever.
Can we reopen this issue, as it is not fixed?
@keelinm How to reopen this issue?
@Neutrino3316 I am not sure how to reopen it. Maybe @fchollet or one of the team can advise on how to proceed...?
Does anyone know how to tell whether the validation process is still running or the program is just doing nothing (frozen)?
Please reopen this issue; I am still facing it on Keras 2.2.2.
I have the same problem now, and I think the cause is that ImageDataGenerator is too slow at loading data from disk. I tested loading 50 batches of 32 images each on my server, and it took nearly 50 seconds. Maybe you can run the same test on your server to confirm the root of the problem.
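A quick way to reproduce that measurement (a minimal sketch; the directory path and sizes are placeholders for your own setup):

import time
from keras.preprocessing.image import ImageDataGenerator

# Placeholder path and batch size; point these at your own data.
gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'path/to/val', target_size=(224, 224), batch_size=32)

start = time.time()
for _ in range(50):   # pull 50 batches straight from disk
    next(gen)
print('50 batches took %.1f s' % (time.time() - start))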
@justicevita Agreed: my laptop (SSD) with a damn 940 gets validation steps done about 10 times faster than my workstation armed with a 1080 Ti.
SIGN
In my case the overall data used in validation was smaller than or equal in size to the training data, but the validation time was far greater, so in my case the "validation slowness" seems unrelated to storage speed. I keep thinking that Keras (with the TensorFlow backend) does not take advantage of parallelization or any acceleration during validation. My two cents.
I'm seeing this issue on a relatively small dataset of images:
train size: 52102, validation size: 6512, test_size: 6512
Using a tf.data dataset with a Keras model (no Estimator), training in both eager and non-eager mode results in one epoch taking ~5 minutes, but validation taking close to 25 minutes on Google Colab.
I set up my dataset like so:
# Split the final dataset into train / validation / test splits for our model.
DATASET_SIZE = len(all_image_paths)
train_size = int(0.8 * DATASET_SIZE)
val_size = int(0.1 * DATASET_SIZE)
test_size = int(0.1 * DATASET_SIZE)
print("train size: " + str(train_size) + ", validation size: " + str(val_size) + ", test_size: " + str(test_size))
# Take/skip before repeating so the splits don't overlap; only the
# training split needs to repeat across epochs.
train_dataset = ds.take(train_size).repeat()
val_dataset = ds.skip(train_size).take(val_size)
test_dataset = ds.skip(train_size + val_size).take(test_size)
And train like so:
steps_per_epoch = int(math.floor(train_size / BATCH_SIZE))
val_steps_per_epoch = int(math.floor(val_size / BATCH_SIZE))
epochs = 5
history = model.fit(train_dataset,
                    epochs=epochs,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=val_dataset,
                    validation_steps=val_steps_per_epoch)
Curious if anyone has any pointers. Thank you in advance.
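One property of that pipeline worth flagging: ds.skip(train_size) still has to iterate past all 52102 skipped training examples every time the validation split is walked, which by itself can make validation far slower than training. A hedged mitigation, assuming the validation split fits in memory and that batching happens after this point in the pipeline, is to cache the split once it is carved out:

import tensorflow as tf

# Hypothetical mitigation: cache the small validation split so the
# skip/decode work only happens on the first pass over it.
val_dataset = (ds.skip(train_size)
                 .take(val_size)
                 .cache()   # materialized after the first pass
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.experimental.AUTOTUNE))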
I recently had the same issue, and it was because my validation_steps was 100 000. Decreasing it to an acceptable value (starting at 10, then 100, then 1000, ...) solved my problem.
Now my network's validation duration is acceptable.
Try to specify validation_steps correctly. If you don't, or set it randomly, you will face strange lag while an overwhelming amount of data is generated during validation, as in my snippet below:
concate_model.compile(loss='mean_squared_error',
                      metrics={'Steer': 'mse', 'Speed': 'mse'},
                      optimizer=Adam(learning_rate=args.learning_rate))
history = concate_model.fit_generator(
    batch_generator(args.data_dir, X_train_image, X_train_Sequence, Y_train_steer,
                    Y_train_speed, args.batch_size, True, args.samples_per_epoch),
    args.samples_per_epoch,
    args.nb_epoch,
    max_q_size=10,
    validation_data=batch_generator(args.data_dir, X_valid_image, X_valid_Sequence,
                                    Y_valid_steer, Y_valid_speed, args.batch_size,
                                    False, args.samples_per_epoch),
    callbacks=[checkpoint],
    verbose=1,
    # the mistake described above: multiplying instead of dividing makes
    # validation pull far too many batches
    validation_steps=args.samples_per_epoch * args.batch_size * args.test_size)