PyTorch: LSTM forget gate bias initialization

Created on 15 Feb 2017 · 3 comments · Source: pytorch/pytorch

Some papers suggest setting the forget gate bias of LSTMs to a specific value, for example:
http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf

Is it possible to do this with the current implementation of LSTM/LSTMCell?

All 3 comments

Yes, the ordering of weights and biases is the same for all implementations: ingate, forgetgate, cellgate, outgate. You need to initialize the slice between 1/4 and 1/2 of the bias vector (the second quarter, i.e. the forget-gate entries) to the desired value.
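
For example, a minimal sketch of what this could look like with nn.LSTM (the input_size, hidden_size, and target value of 1.0 here are assumptions for illustration, not from the thread):

```python
import torch
import torch.nn as nn

# Sizes chosen arbitrarily for illustration.
hidden_size = 128
lstm = nn.LSTM(input_size=64, hidden_size=hidden_size)

# Each bias vector has shape (4 * hidden_size,), laid out as
# [ingate | forgetgate | cellgate | outgate], so the forget-gate
# slice spans indices hidden_size to 2 * hidden_size.
for name, param in lstm.named_parameters():
    if "bias" in name:
        with torch.no_grad():
            param[hidden_size:2 * hidden_size].fill_(1.0)
```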

What is the difference between "bias_ih" and "bias_hh" in the LSTM and GRU cells? Should the 1/4-to-1/2 slice of both be initialized to ones?

One of them is added to the linear transform of the input, the other to the linear transform of the hidden state. It's redundant: there could be a single bias and the model would be equivalent. However, that's what cuDNN does, so we preferred to keep it that way for consistency.
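
In practice this means the two biases are summed inside the cell, so the effective forget-gate bias is the sum of the corresponding slices of bias_ih and bias_hh. A minimal sketch with nn.LSTMCell (sizes are arbitrary assumptions), splitting a target forget bias of 1.0 across the two vectors:

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=64, hidden_size=128)  # sizes are arbitrary
h = cell.hidden_size

# bias_ih and bias_hh are added together inside the cell, so the
# effective forget-gate bias is bias_ih[h:2h] + bias_hh[h:2h].
# Setting each slice to 0.5 yields a total forget bias of 1.0.
with torch.no_grad():
    cell.bias_ih[h:2 * h].fill_(0.5)
    cell.bias_hh[h:2 * h].fill_(0.5)
```

Setting the full value in one vector and zeroing the slice in the other would be equivalent; only the sum matters.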
