Some papers suggest setting the forget gate bias of LSTMs to a specific value. For example:
http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf
Is it possible to do this with the current implementation of LSTM/LSTMCell?
Yes, the ordering of weights and biases is the same for all implementations: ingate, forgetgate, cellgate, outgate. So you need to initialize the slice of the bias vector from 1/4 to 1/2 of its length (the forget-gate portion) to the desired value.
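For instance, here is a minimal sketch (assuming PyTorch's `nn.LSTM` and hypothetical sizes) that sets the forget-gate slice of every bias vector to 1, a value suggested in the paper linked above:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20  # hypothetical sizes for illustration
forget_bias = 1.0                 # desired forget-gate bias value

lstm = nn.LSTM(input_size, hidden_size, num_layers=1)

# Each bias vector has length 4 * hidden_size, laid out as
# [ingate | forgetgate | cellgate | outgate], so the forget-gate
# slice is the second quarter: [n // 4 : n // 2].
with torch.no_grad():
    for name, param in lstm.named_parameters():
        if "bias" in name:
            n = param.size(0)
            param[n // 4 : n // 2].fill_(forget_bias)
```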
What is the difference between "bias_ih" and "bias_hh" in the LSTM and GRU cells? Should the 1/4-to-1/2 slice of both be initialized to ones?
One of them is added to the linear transform of the input, the other to the linear transform of the hidden state. It's redundant: a single bias would give an equivalent model. However, that's what cuDNN does, so we preferred to keep it like that for consistency.
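Because the two biases are simply summed inside the cell, the effective forget-gate bias is the sum of the two forget-gate slices. One option, sketched below (again assuming `nn.LSTMCell` with hypothetical sizes), is to split the desired value evenly across the two vectors; putting all of it in one and zeroing the other would be equivalent:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20  # hypothetical sizes for illustration
forget_bias = 1.0                 # desired effective forget-gate bias

cell = nn.LSTMCell(input_size, hidden_size)

# bias_ih and bias_hh are added together, so giving each half of the
# value makes their forget-gate slices sum to forget_bias.
with torch.no_grad():
    for bias in (cell.bias_ih, cell.bias_hh):
        n = bias.size(0)
        bias[n // 4 : n // 2].fill_(forget_bias / 2)
```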