Keras: LSTM dropout_W vs Dropout

Created on 8 Feb 2017 · 5 comments · Source: keras-team/keras

Is there any difference between using a Dropout layer before an LSTM and using the argument dropout_W of that LSTM layer?

Like this:

```python
model.add(Dropout(0.5))
model.add(LSTM(100))
```

versus:

```python
model.add(LSTM(100, dropout_W=0.5))
```

Also, in the Zaremba et al. paper on RNN regularization, they say dropout should never be applied to recurrent connections (dropout_U = 0). Why do I see many examples with dropout_U > 0? Am I missing something?

Thank you.

stale

Most helpful comment

To anyone coming to this issue thread more recently:

dropout_U has now been renamed to recurrent_dropout, as seen in legacy_recurrent_support.

All 5 comments

Hello, I don't know for sure, but looking at the source, they reference this paper as the justification for the dropout: "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks".

I guess there are some advanced usages, like the "consume_less" mode, which might explain why the dropout is included inside the LSTM rather than before or after it (e.g. grouping the W and U matrices into one for performance, which then requires grouping dropout_W and dropout_U as well).

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

In the paper cited by @unrealwill, the authors provide a Bayesian viewpoint on dropout in RNNs (as compared to the ensemble-method perspective). Following their conclusions, they propose a theoretically motivated dropout method for RNNs and their variants.

Unlike previous techniques, which use a different dropout mask at each time step and apply no dropout to the recurrent connections (as in the Zaremba paper), their method drops the same network units at every time step.
The authors' experimental results suggest that this dropout method is superior to the prior approaches.

This makes dropout_U a valid choice.
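
To make the distinction concrete, here is a toy NumPy sketch (not the actual Keras internals) of the two masking strategies the paper contrasts; the shapes and dropout rate are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, units, rate = 5, 4, 0.5
h = rng.normal(size=(timesteps, units))  # stand-in for recurrent activations

# "Naive" dropout: a fresh mask is sampled at every time step.
naive_mask = rng.random((timesteps, units)) > rate
naive = h * naive_mask / (1.0 - rate)

# Variational dropout (Gal & Ghahramani): one mask, reused for the whole sequence.
var_mask = rng.random((1, units)) > rate
variational = h * var_mask / (1.0 - rate)

print(naive_mask.astype(int))  # different units dropped at each step
print(var_mask.astype(int))    # the same units dropped at every step
```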

To anyone coming to this issue thread more recently:

dropout_U has now been renamed to recurrent_dropout, as seen in legacy_recurrent_support.
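
For reference, a minimal sketch of what that looks like with the current arguments, assuming a simple Keras 2 Sequential model (layer sizes and input shape are illustrative only):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# `dropout` replaces the old dropout_W (input connections);
# `recurrent_dropout` replaces the old dropout_U (recurrent connections).
model.add(LSTM(100, dropout=0.5, recurrent_dropout=0.2, input_shape=(50, 32)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
```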

@davidlenz have there been significantly more papers supporting better models when dropout is applied within the recurrent connections, rather than before or after the LSTM?
