Hi. I have 10 different data sets, and I want to train 10 models, one on each data set.
The models have the same layers, and my code is as follows:
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

model_sequence = Sequential()
model_sequence.add(Masking(mask_value=2, input_shape=(input_length, input_dim)))
model_sequence.add(LSTM(150))
model_sequence.add(Dense(20))
My GPU is a GTX TITAN X. I wonder: can I train the 10 models at the same time on the GPU? If I can, how should I write the code?
Thank you very much for your assistance!
@fchollet @tboquet @EderSantana Can you help me?
Multi-GPU support is still in progress. If the models are different (no gradient sharing) and the data is different, what kind of parallelism are you expecting?
@farizrahman4u Thank you for your reply. Actually, I have only one GPU. What I want is to train 10 models simultaneously to reduce the total training time. I have tried running several code files together, but training is much slower than when I run each code file individually.
So I wonder: is there any method to reduce the total training time of the 10 models?
Here are some speed-ups for the GPU:
The Theano flag lib.cnmem sets the fraction of GPU memory a process allocates. Make sure the fractions allocated by your concurrent processes sum to no more than 1 (though I took the habit of keeping the total below 0.75).
Even then, each model will train more slowly than if it were run individually. You then have to see for how many models it is still worth running them at the same time (I would say 2 or 3), or decide to just run them sequentially.
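As a minimal sketch of setting that flag (assuming the Theano backend, where lib.cnmem must be set before theano is imported; the 0.3 fraction is an assumption sized so that three concurrent processes stay under 1):

import os
# must run before keras/theano are imported
os.environ['THEANO_FLAGS'] = 'device=gpu0,lib.cnmem=0.3'
import keras  # imports theano with the flags above

Each concurrent training script would cap its own memory fraction this way.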
In that case I think it's best to train them one after the other.
You might just want to try using the functional API to make 10 models and then compile them into one model:
in_1 = Input(shape=(...))
lstm_1 = LSTM(...)(in_1)
out_1 = Dense(...)(lstm_1)

in_2 = Input(shape=(...))
lstm_2 = LSTM(...)(in_2)
out_2 = Dense(...)(lstm_2)

model_1 = Model(input=in_1, output=out_1)
model_2 = Model(input=in_2, output=out_2)

model = Model(input=[in_1, in_2], output=[out_1, out_2])
model.compile(...)
model.fit(...)

model_1.predict(...)
model_2.predict(...)
I haven't tried it, but in principle it should work, although it might warn about disconnected inputs.
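A runnable sketch of this pattern, generalized to the 10 models from the original question (Keras 1 functional API; the layer sizes are copied from the question, while x_list, y_list, the optimizer, and the loss are assumptions):

from keras.layers import Input, Masking, LSTM, Dense
from keras.models import Model

inputs, outputs, submodels = [], [], []
for i in range(10):
    # input_length and input_dim as in the question above
    x = Input(shape=(input_length, input_dim))
    h = Masking(mask_value=2)(x)
    h = LSTM(150)(h)
    y = Dense(20)(h)
    inputs.append(x)
    outputs.append(y)
    # per-data-set model; shares weights with the combined model
    submodels.append(Model(input=x, output=y))

combined = Model(input=inputs, output=outputs)
combined.compile(optimizer='rmsprop', loss='mse')
# x_list and y_list are assumed to be lists of 10 arrays, one per data set
combined.fit(x_list, y_list, nb_epoch=10)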
@Yingyingzhang15
Man, I'm training the same kind of model right now. Mine has different numbers of parameters (input and output dimensions), but it is the same approach @lemuriandezapada described.
You need a separate Input and a separate output for each model. Finally, you merge them into lists (one list of inputs, one list of outputs) and pass those to fit or fit_generator (you need to write a custom generator). I used this:
Data format
import numpy as np

# one block of inputs/targets per sub-model
# (for the reshape to be valid, PixCount must equal Ysize * Xsize)
X = np.zeros(count * PixCount * RGBCount * num_out_parameters).reshape(
    num_out_parameters, count, RGBCount, Ysize, Xsize)
X = X.astype('float32')
Y = np.zeros(count * nb_classes * num_out_parameters).reshape(
    num_out_parameters, count, nb_classes)
where _num_out_parameters_ is the number of parallel sub-models (and hence of inputs and matching outputs)
Generator call
model.fit_generator(util.flow(X, Y, batch_size=cfg.batch_size),
                    samples_per_epoch=cfg.spe,
                    nb_epoch=cfg.nb_epoch,
                    validation_data=([X_test for i in range(cfg.num_out_parameters)],
                                     [Y_test[cfg.start_parameter + i] for i in range(cfg.num_out_parameters)]),
                    callbacks=[best_model, best_model_ep, TensorBoard(cfg.tmp_file + now)])
Flow function
from keras.preprocessing.image import ImageDataGenerator

def flow(X, Y, batch_size):
    # preprocessing and realtime data augmentation; build the generator
    # once rather than on every pass through the loop
    datagen = ImageDataGenerator(
        featurewise_center=False,             # set input mean to 0 over the dataset
        samplewise_center=False,              # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,   # divide each input by its std
        zca_whitening=False,                  # apply ZCA whitening
        rotation_range=5,                     # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.05,               # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.05,              # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,                 # randomly flip images horizontally
        vertical_flip=False)                  # do not flip images vertically
    while 1:
        Xn = []
        Yn = []
        for i in range(X.shape[0]):
            datagen.fit(X[i])  # a no-op here, since all featurewise options are off
            batches = datagen.flow(X[i], Y[i], batch_size=batch_size)
            for batch in batches:  # take one batch per sub-model
                Xn.append(batch[0])
                Yn.append(batch[1])
                break
        yield Xn, Yn
And finally, when you get multiple GPUs, you can easily parallelize this model like this:
import tensorflow as tf

InL, OutL = [], []
for i in range(cfg.num_out_parameters):
    device = '/gpu:%d' % i  # gpu:0, gpu:1, etc.
    with tf.device(device):
        InList, OutList = rn.residual_model(...)
        InL.append(InList)
        OutL.append(OutList)
# build the combined model on the CPU
with tf.device('/cpu:0'):
    model = Model(input=InL, output=OutL)
Maybe it isn't ideal code (I'm a noob at NNs), but it works!
@vrodionovpro Thank you so much for your explanation! :)
Only somewhat related to this topic (my apologies), but this stackoverflow question could use an answer: http://stackoverflow.com/questions/40850089/is-keras-thread-safe
@lemuriandezapada Creating a parallel model is a smart move, but it removes some randomness, which might matter when training an ensemble (which I guess is why people want this anyway). Training an ensemble can improve performance thanks to some heterogeneity among the models; if all (sub)models come up with exactly the same result, the ensemble is useless.
The batches fed to each network will be randomized, but identically for every model, so the only symmetry-breaking you have comes from the random weight initializations. This might have some nasty consequences for the distribution of the trained models. It might work 'better' if you also create n reshuffled data sets, even though there will still be some correlation between the batches of different models across epochs. Maybe one would need to create an infinite batch generator?
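A minimal sketch of such an infinite generator (assuming all data sets have the same length; the function and variable names are made up for illustration), giving each sub-model an independently reshuffled stream of batches:

import numpy as np

def independent_batches(X_list, Y_list, batch_size):
    # X_list/Y_list: one array per sub-model, all of equal length
    n_models = len(X_list)
    n_samples = len(X_list[0])
    while True:
        # a fresh, independent permutation per sub-model breaks the
        # batch-order symmetry between the ensemble members
        perms = [np.random.permutation(n_samples) for _ in range(n_models)]
        for start in range(0, n_samples, batch_size):
            idx = [p[start:start + batch_size] for p in perms]
            yield ([X_list[i][idx[i]] for i in range(n_models)],
                   [Y_list[i][idx[i]] for i in range(n_models)])

Something like this could be passed to fit_generator in place of the flow function above.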
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Has there been a solution? @vrodionovpro's response is good, but it is now dated and I'm not sure where the parallelism comes in. There are some loose solutions if you're okay with combining models into one, but I'd be looking for a multi-threading-esque way to train several models concurrently.
This link is also close, but it's more about maximizing one GPU than using several, if it is in fact possible to split one GPU into concurrent processes like that (I'm not sure).
(quoting @lemuriandezapada's earlier suggestion to use the functional API to make 10 models and compile them into one model)
This approach works. However
model_1 = Model(input=in_1, output=out_1)
model_2 = Model(input=in_2, output=out_2)
and
model_1.predict(...)
model_2.predict(...)
are not necessary. Simply use model.predict() or model.evaluate() instead. Keep in mind to pass the data for each model individually (one array per input, in the order of the model's inputs).
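For example, a small sketch of what that looks like (x_1, x_2, y_1, y_2 are hypothetical per-model arrays):

# predictions come back as one array per output, in input order
preds_1, preds_2 = model.predict([x_1, x_2])
# evaluation likewise takes one input array and one target array per sub-model
scores = model.evaluate([x_1, x_2], [y_1, y_2])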