I am mapping sequences of vectors to corresponding sequences of vectors. The input vector elements are in the range (-1, 1) and the output vector elements are in the range (0, 1). I've chosen sigmoid activations and binary cross-entropy because my outputs are interpretable as probabilities. The model (which I'm keeping as simple as I can until this seems to be working) is:
from keras.models import Sequential
from keras.layers.recurrent import LSTM

model = Sequential()
# 50-dim input vectors, two 512-unit LSTM layers, 20-dim output vectors
model.add(LSTM(input_dim=50, output_dim=512, activation='sigmoid', truncate_gradient=-1, return_sequences=True))
model.add(LSTM(input_dim=512, output_dim=512, activation='sigmoid', truncate_gradient=-1, return_sequences=True))
model.add(LSTM(input_dim=512, output_dim=20, activation='sigmoid', truncate_gradient=-1, return_sequences=True))
model.compile(loss='binary_crossentropy', optimizer='adam')
Note that I don't expect this model to predict anything meaningful from a limited set of 10,000 points. I just want to make sure I can fit those 10,000 points before trying anything more exotic.
I have Theano set up to run on the GPU, and that all works. To speed things up, increasing the batch size seems to be the preferred strategy. However, this sends the accuracy score after an epoch close to zero: with a batch size of 25, accuracy is around 0.7 after the first epoch, whereas with 100 it is around 0.004.
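For reference, the fit call looks roughly like this (a minimal sketch; X_train/y_train are placeholder names for my 10,000-point training arrays, and the epoch count is arbitrary):

# batch_size is the knob in question; accuracy is reported per epoch
model.fit(X_train, y_train, batch_size=100, nb_epoch=10, show_accuracy=True)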
Any suggestions?
Thanks!
The key parameter here is the size of your training set, which you are not providing.
Larger batch sizes will indeed speed things up, especially on GPU (they are required to fully exploit the GPU speedup; with small batch sizes most of the processing time is spent just moving batch data on and off the GPU).
Larger batch sizes also mean that you are doing fewer gradient updates per epoch, i.e. you will need to train for more epochs than with smaller batch sizes.
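As a rough back-of-the-envelope check (assuming the 10,000 points mentioned above are the full training set):

n_samples = 10000
for batch_size in (25, 100):
    # 25 -> 400 gradient updates per epoch, 100 -> 100 updates per epoch
    print(batch_size, n_samples // batch_size)

With a batch size of 100 you take only a quarter as many gradient steps per epoch, which is consistent with seeing much lower accuracy after the first epoch.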
It is completely expected to get different results (at the same number of epochs) with different batch sizes. The final accuracy after convergence is reached should not differ significantly.
The key parameter here is the size of your training set, which you are not providing.
Actually, I did provide it: "a limited set of 10,000 points." I thought that was quite clear.