I'm trying to create a custom policy network, but I can't find the default architecture for an MlpLnLstmPolicy to benchmark against. The only thing I see is that MlpLnLstm has a shared LSTM network of default size 256, but I don't know the sizes and number of layers within the Value and Policy networks.
It would also be good to know if there are any activations or dropouts between layers (if applicable).
Thank you
Indeed, the default architecture is not immediately apparent, but you can find the essentials in LstmPolicy's init:
Two layers of 64 units with tanh activations, followed by an LSTM layer of 256 units. The LSTM output is then split into the value and policy functions. If you use the CNN version, it uses the network from the Nature DQN paper (code here) with ReLU activations.
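For benchmarking a custom network against this default, it can help to see the layout written out with rough weight counts. The snippet below is only a sketch in plain Python, not code from the library. It assumes a flat observation vector and a discrete action space, and it assumes each head is a single linear layer on top of the LSTM output. The helper names (dense_params, lstm_params, default_lstm_policy_layout) and the example sizes (4 observations, 2 actions) are made up for illustration, and the extra layer-normalization parameters of the "Ln" variant are left out.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def lstm_params(n_in, n_hidden):
    """Weights plus biases of a standard LSTM cell (four gates).

    Layer-norm gains and biases are deliberately not counted here.
    """
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)


def default_lstm_policy_layout(obs_dim, n_actions, n_lstm=256):
    """List (layer name, parameter count) pairs for the layout described above.

    Assumption: the policy and value heads are single linear layers.
    """
    return [
        ("fc1, 64 units, tanh", dense_params(obs_dim, 64)),
        ("fc2, 64 units, tanh", dense_params(64, 64)),
        ("shared LSTM, %d units" % n_lstm, lstm_params(64, n_lstm)),
        ("policy head (linear)", dense_params(n_lstm, n_actions)),
        ("value head (linear)", dense_params(n_lstm, 1)),
    ]


# Hypothetical example sizes: 4-dimensional observation, 2 discrete actions.
layers = default_lstm_policy_layout(obs_dim=4, n_actions=2)
for name, count in layers:
    print("%-28s %8d" % (name, count))
print("%-28s %8d" % ("total", sum(count for _, count in layers)))
```

Almost all of the weights sit in the shared LSTM, so changing n_lstm affects the model size far more than changing the two 64-unit layers.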
Which parameter determines the number of recurrent steps?
@Hoiy See #759 (answer: n_steps)
I will close this issue as the original question was answered.