What exactly does the neural net look like when using PPO? The documentation is extremely vague and I have no idea what it's actually doing.
1) If I use LSTM, where exactly is the layer located? Right after the input, directly before the output, or somewhere in between? The documentation just says 'when activated, uses a recurrent network'. That could mean many different things. Why is there no information about how the actual network looks when using LSTM?
2) If I use visual observations, what does the network look like if I choose 2 hidden layers? Input -> Convolutional Layer -> Fully-connected layers -> Output? Can I add additional convolutional layers, or additional fully-connected ones?
3) Is there an easy way to customize any of this? I feel like there's a lot of missing information here. I've learned to construct my networks layer-by-layer, and sometimes I just want single LSTM cells instead of entire layers. Or Max-Pooling or something. I'm trying to dive into the python side of things, but I have no clue where to start.
Hi @DVonk, you're right, we currently don't have much documentation on how the model looks when you build it - perhaps we'll add this in the future.
Generally, the models are laid out like this: Input -> Feature Extractor -> Hidden Layers -> Policy/Value Heads. The Feature Extractor is either an FC layer or a CNN, depending on whether you're using visual observations. The Hidden Layers are the configurable part. The Policy/Value Heads output the policy and the value estimate, respectively, and are each a single FC layer on top of the preceding output. An LSTM would be inserted between the Hidden Layers and the Policy/Value Heads.
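To make the layout above concrete, here is a minimal NumPy sketch of the vector-observation path for discrete control. This is illustrative only, not the actual ML-Agents code: the layer sizes, the tanh activation, and the variable names are all my own assumptions, and the real implementation lives in TensorFlow in models.py.

```python
import numpy as np

def dense(x, w, b):
    # One fully-connected layer; tanh chosen here for simplicity,
    # the real activation function may differ.
    return np.tanh(x @ w + b)

rng = np.random.default_rng(0)
obs_size, hidden_units, num_layers, act_size = 8, 32, 2, 4  # hypothetical config

x = rng.standard_normal((1, obs_size))  # one vector observation

# Feature Extractor: a single FC layer for vector observations
# (a CNN would sit here instead for visual observations)
w = rng.standard_normal((obs_size, hidden_units))
h = dense(x, w, np.zeros(hidden_units))

# Hidden Layers: the part controlled by the num_layers / hidden_units settings
for _ in range(num_layers):
    w = rng.standard_normal((hidden_units, hidden_units))
    h = dense(h, w, np.zeros(hidden_units))

# (If LSTM is enabled, the recurrent layer would process h right here,
#  between the hidden layers and the heads.)

# Policy/Value Heads: one FC layer each, branching from the shared trunk
w_pi = rng.standard_normal((hidden_units, act_size))
policy_logits = h @ w_pi          # shape (1, act_size)
w_v = rng.standard_normal((hidden_units, 1))
value = h @ w_v                   # shape (1, 1)
```

The key structural point is that the policy and value heads branch from the same shared trunk, so everything before the heads is learned jointly.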
You can check out the models.py file to see where to modify the network structure. The code was designed to be more feature-rich than modular, but you should be able to see where the TF network is created for discrete control (dc), continuous control (cc), vector observations, and visual observations.
@DVonk - I've added an item to our backlog to improve the documentation about the network structure.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.