Some papers suggest setting the forget gate bias of LSTMs to a specific value. For example:
http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf
Is it possible to do this with the current implementation of LSTM/LSTMCell?
Yes, the ordering of weights and biases is the same for all implementations: ingate, forgetgate, cellgate, outgate. So you need to initialize the slice of the bias vector from 1/4 to 1/2 of its length (the forget-gate portion) to the desired value.
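For instance, here is a minimal sketch (assuming PyTorch's `nn.LSTM` and hypothetical sizes) that sets the forget-gate slice of every bias vector to 1, a value suggested in the paper linked above:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20  # hypothetical sizes for illustration
forget_bias = 1.0                 # desired forget-gate bias value

lstm = nn.LSTM(input_size, hidden_size, num_layers=1)

# Each bias vector has length 4 * hidden_size, laid out as
# [ingate | forgetgate | cellgate | outgate], so the forget-gate
# slice is the second quarter: [n // 4 : n // 2].
with torch.no_grad():
    for name, param in lstm.named_parameters():
        if "bias" in name:
            n = param.size(0)
            param[n // 4 : n // 2].fill_(forget_bias)
```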
What is the difference between "bias_ih" and "bias_hh" in the LSTM and GRU cells? Should the 1/4-to-1/2 slice of both be initialized to ones?
One of them is added to the linear transform of the input, the other to the linear transform of the hidden state. It's redundant: a single bias would give an equivalent model. However, that's what cuDNN does, so we preferred to keep it like that for consistency.
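Because the two biases are simply summed inside the cell, the effective forget-gate bias is the sum of the two forget-gate slices. One option, sketched below (again assuming `nn.LSTMCell` with hypothetical sizes), is to split the desired value evenly across the two vectors; putting all of it in one and zeroing the other would be equivalent:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20  # hypothetical sizes for illustration
forget_bias = 1.0                 # desired effective forget-gate bias

cell = nn.LSTMCell(input_size, hidden_size)

# bias_ih and bias_hh are added together, so giving each half of the
# value makes their forget-gate slices sum to forget_bias.
with torch.no_grad():
    for bias in (cell.bias_ih, cell.bias_hh):
        n = bias.size(0)
        bias[n // 4 : n // 2].fill_(forget_bias / 2)
```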