Keras: timesteps in LSTM

Created on 27 Oct 2016 · 8 comments · Source: keras-team/keras

I am not very clear about the timesteps in the input of an RNN or LSTM. In my understanding, if we set timesteps to K, Keras will look back K steps (so the RNN or LSTM will be unrolled K times) and make the prediction. Is that correct?

Also, how does an LSTM in Keras unroll itself? Which parameter decides the length of truncated BPTT? Thanks.

All 8 comments

Kinda.

Unrolling an RNN is optional; it requires more memory but increases speed. To unroll, set unroll=True for a particular layer. If you don't unroll, a symbolic loop will be used instead (basically a for-loop), which learns just as well but might be slower. I would not bother with this too much (leave the default).
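For instance, a minimal sketch of that flag (layer size, input shape, and loss are arbitrary placeholders, not from this thread):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, features = 10, 8  # example dimensions only

model = Sequential()
# unroll=True expands the recurrence into `timesteps` copies of the cell
# (can be faster for short sequences, but uses more memory);
# the default, unroll=False, runs a symbolic loop instead.
model.add(LSTM(32, input_shape=(timesteps, features), unroll=True))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```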

Regarding the number of time steps, it determines how many steps back in time backprop uses when calculating gradients for the weight updates (e.g. the matrix size) during training. As for making predictions, I'd expect that if you saved weights trained with timesteps=1000 and reloaded them into the same model after setting timesteps=1, you'd get the same quality of predictions, because you're not changing the weights; you're just multiplying by the weights you already have.
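As a rough sketch of that point (shapes and layer sizes are made up for illustration): the LSTM's weight matrices depend only on the number of input features and units, not on the timestep dimension, so trained weights can be loaded into a model built with a different sequence length.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_model(timesteps, features=8, units=32):
    model = Sequential()
    model.add(LSTM(units, input_shape=(timesteps, features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model

long_model = build_model(timesteps=1000)   # imagine this one was trained
short_model = build_model(timesteps=1)     # same weights, shorter window
# Weight shapes are independent of timesteps, so they transfer directly:
short_model.set_weights(long_model.get_weights())
```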

The number of time steps affects learning a little. High timesteps (let's say over 100) typically means convergence is slower but possibly deeper, while low timesteps (let's say around 8-32) means convergence is faster but possibly plateaus (ping @fluency03). I'm basing this off an NLP experiment recently done at my university though, and I guess it would vary for different tasks. It seems to be a pretty uninteresting hyperparameter compared to the model architecture, though.

Thank you very much, @carlthome. So, if timesteps=k, the for-loop of the LSTM during learning will iterate k times, correct?

I am also a little bit confused about how to organize the data. Say I have some data in the form of:
time 0, 1, 2, 3, 4, ....
train t0, t1, t2, t3 .....
label l0, l1, l2, l3 .....

Now if I want to set the timesteps to 2, I need to reshape the data. Currently, I change my data to this form (basically duplicating each training sample):
time 1, 2, 3, 4, ....
train (t0, t1), (t1, t2), (t2, t3) .....
label l1, l2, l3, l4 .....
Is that the right way?

Or should I have the data in a form like this:
time 1, 3, 4, ....
train (t0, t1), (t2, t3)...
label l1, l3 .....
Which one is right? Sorry, I am quite new to deep learning...
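In code, the first layout would look something like this (a rough numpy sketch; the array names and timesteps=2 are just for illustration):

```python
import numpy as np

def make_windows(series, labels, timesteps=2):
    # X[i] = series[i : i + timesteps], y[i] = label aligned with the window's last step
    X = np.array([series[i:i + timesteps] for i in range(len(series) - timesteps + 1)])
    y = np.array(labels[timesteps - 1:])
    return X, y

# toy series standing in for t0..t4 and labels l0..l4
series = np.arange(5, dtype='float32').reshape(-1, 1)  # shape (5, 1): one feature per step
labels = np.arange(5, dtype='float32')

X, y = make_windows(series, labels, timesteps=2)
# X.shape == (4, 2, 1): windows (t0,t1), (t1,t2), (t2,t3), (t3,t4)
# y == [l1, l2, l3, l4], matching the first layout above
```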

Perhaps you'd be interested in watching this to clear up some of the confusion: https://www.youtube.com/watch?v=iX5V1WpxxkY

Thanks


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

I have the same confusion as @marapapman. I do not understand whether the first structure or the second structure mentioned is the correct one. Thank you!

@akhil2706 perhaps this would help clear up some of the confusion: https://www.reddit.com/r/MLQuestions/comments/76qzt5/connection_of_a_rnn_hidden_layer/dogeyld/

Thank You Carl
