Keras: reintroducing truncated gradient BPTT

Created on 25 Oct 2016 · 5 comments · Source: keras-team/keras

truncated_gradient was removed from Keras, though it is still supported in Theano. When you have very long sequences and want a long memory, truncating the gradient in BPTT can be essential for performance.

@fchollet has suggested in several places that this can be done with a stateful RNN. However, after working on it for the last week, I find this quite non-trivial: splitting the data into short sequences introduces many edge cases, and with sequences of varying length you have to implement the fit and evaluate loops yourself to control the state resets (see also https://github.com/fchollet/keras/issues/4185).
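To make the edge cases concrete, here is a minimal sketch (plain Python, not Keras API; the helper name `make_windows` is illustrative) of the kind of preprocessing the stateful-RNN workaround requires: splitting variable-length sequences into fixed-size windows, zero-padding the last window of each sequence, and tracking where the RNN state must be reset.

```python
def make_windows(sequences, window):
    """Split variable-length sequences into fixed-size windows for a
    stateful RNN. Yields (chunk, reset) pairs; reset is True when the
    chunk starts a new sequence, signalling that the RNN state should
    be cleared before processing it. The last chunk of each sequence
    is zero-padded to full length (one of the edge cases above)."""
    for seq in sequences:
        for start in range(0, len(seq), window):
            chunk = seq[start:start + window]
            chunk = chunk + [0] * (window - len(chunk))  # pad last chunk
            yield chunk, start == 0

windows = list(make_windows([[1, 2, 3, 4, 5], [6, 7]], window=3))
# Each element is (padded_chunk, reset_flag); the training loop would
# call reset_states() whenever reset_flag is True.
```

A custom training loop then has to iterate over these windows itself, which is exactly the hand-written fit/evaluate logic the built-in truncate_gradient option would avoid.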

stale

Most helpful comment

I am curious why truncated_gradient was removed.

All 5 comments

How do you propose getting it into Keras' code base? I'm very curious! Totally agree that stateful RNNs require a lot of fiddling.

I am curious why truncated_gradient was removed.

Please reopen. This is still an issue; I missed the stale tag.

Has any progress been made on doing truncated gradients in Keras? I do not see how to do it with stateful mode, and it seems there should be a simple option like Theano has (the truncate_gradient argument of the scan function). Would appreciate any help on this!
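For readers unfamiliar with what Theano's truncate_gradient does inside scan: it stops backpropagation after a fixed number of steps into the past. A toy scalar RNN (pure Python, not the Theano or Keras API) makes the effect explicit; here the hand-written backward pass simply exits after k steps.

```python
def truncated_grad(w, xs, k):
    """Gradient dL/dw for a toy linear RNN h_t = w*h_{t-1} + x_t
    (h_0 = 0) with loss L = h_T, backpropagating through at most the
    last k time steps. k = len(xs) recovers full BPTT."""
    # forward pass: record all hidden states
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    T = len(xs)
    # backward pass: stop after k steps, as truncate_gradient would
    grad = 0.0
    dh = 1.0  # dL/dh_t, starting at t = T
    for t in range(T, max(T - k, 0), -1):
        grad += dh * hs[t - 1]  # dL/dw contribution at step t
        dh *= w                 # propagate dL/dh one step back
    return grad

full = truncated_grad(0.5, [1.0, 1.0, 1.0], k=3)   # full BPTT
short = truncated_grad(0.5, [1.0, 1.0, 1.0], k=1)  # truncated to 1 step
```

The truncated gradient is biased but far cheaper in memory and time on long sequences, which is why the option matters when sequences run to thousands of steps.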

Looking forward to a solution too.

