Keras: reintroducing truncated gradient BPTT

Created on 25 Oct 2016 · 5 comments · Source: keras-team/keras

truncated_gradient was removed from Keras, though it is still supported in Theano. When you have very long sequences and want a long memory, truncating the gradient in BPTT can be essential for performance.

@fchollet has suggested in several places that this can be done with a stateful RNN. However, after working on it for the last week, I find this quite non-trivial: splitting the data into short sequences introduces many edge cases, and with sequences of varying length you have to implement the fit and evaluate loops yourself to control the state resets (see also https://github.com/fchollet/keras/issues/4185).
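To make the edge cases concrete, here is a minimal sketch (plain Python, not Keras API; the helper name `make_windows` is illustrative) of the kind of preprocessing the stateful-RNN workaround requires: splitting variable-length sequences into fixed-size windows, zero-padding the last window of each sequence, and tracking where the RNN state must be reset.

```python
def make_windows(sequences, window):
    """Split variable-length sequences into fixed-size windows for a
    stateful RNN. Yields (chunk, reset) pairs; reset is True when the
    chunk starts a new sequence, signalling that the RNN state should
    be cleared before processing it. The last chunk of each sequence
    is zero-padded to full length (one of the edge cases above)."""
    for seq in sequences:
        for start in range(0, len(seq), window):
            chunk = seq[start:start + window]
            chunk = chunk + [0] * (window - len(chunk))  # pad last chunk
            yield chunk, start == 0

windows = list(make_windows([[1, 2, 3, 4, 5], [6, 7]], window=3))
# Each element is (padded_chunk, reset_flag); the training loop would
# call reset_states() whenever reset_flag is True.
```

A custom training loop then has to iterate over these windows itself, which is exactly the hand-written fit/evaluate logic the built-in truncate_gradient option would avoid.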

stale

Most helpful comment

I am curious why truncated_gradient was removed.

All 5 comments

How do you propose getting it into Keras' code base? I'm very curious! Totally agree that stateful RNNs require a lot of fiddling.

I am curious why truncated_gradient was removed.

Please reopen. This is still an issue; I missed the stale tag.

Has any progress been made on doing truncated gradients in Keras? I do not see how to do it with stateful mode, and it seems there should be a simple option like Theano has (the truncate_gradient argument of the scan function). Would appreciate any help on this!
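For readers unfamiliar with what Theano's truncate_gradient does inside scan: it stops backpropagation after a fixed number of steps into the past. A toy scalar RNN (pure Python, not the Theano or Keras API) makes the effect explicit; here the hand-written backward pass simply exits after k steps.

```python
def truncated_grad(w, xs, k):
    """Gradient dL/dw for a toy linear RNN h_t = w*h_{t-1} + x_t
    (h_0 = 0) with loss L = h_T, backpropagating through at most the
    last k time steps. k = len(xs) recovers full BPTT."""
    # forward pass: record all hidden states
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    T = len(xs)
    # backward pass: stop after k steps, as truncate_gradient would
    grad = 0.0
    dh = 1.0  # dL/dh_t, starting at t = T
    for t in range(T, max(T - k, 0), -1):
        grad += dh * hs[t - 1]  # dL/dw contribution at step t
        dh *= w                 # propagate dL/dh one step back
    return grad

full = truncated_grad(0.5, [1.0, 1.0, 1.0], k=3)   # full BPTT
short = truncated_grad(0.5, [1.0, 1.0, 1.0], k=1)  # truncated to 1 step
```

The truncated gradient is biased but far cheaper in memory and time on long sequences, which is why the option matters when sequences run to thousands of steps.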

Looking forward to a solution too.

