Stable-baselines: [Question] 'layers' in 'policy_kwargs' does not functional for cnn features in general, but it works in some implementation like DDPG, SAC

Created on 28 Oct 2019  路  5Comments  路  Source: hill-a/stable-baselines

As you see in the code of DQN policy below
https://github.com/hill-a/stable-baselines/blob/22c0753a2b792d08382f3a8fc885690e381bb471/stable_baselines/deepq/policies.py#L103-L117
layers after extracted_features does not appended for CNN features.
I can understand this as forcing to create full network in the custom cnn_extractor.
But some other policy implementation like DDPG, SAC, you are appending layers in both CNN and MLP cases. You can see this of SAC policy in the code below .
https://github.com/hill-a/stable-baselines/blob/22c0753a2b792d08382f3a8fc885690e381bb471/stable_baselines/sac/policies.py#L205-L216
Is this inconsistency of using 'layers' argument is intended?
Should we be careful about this argument for case-by-case?
Please shed some light on me.
Thanks.

documentation question

All 5 comments

You are right: The SAC and DDPG codes run inputs through cnn_extractor and then through some additional layers, while the shared policy code here does not use additional layers after cnn_extractor. Indeed this should be either very clearly documented or preferably fixed so that behavior is same all around. A PR would be very welcome :)

@araffin Was there any special reason for doing it this for DDPG/SAC, in case there is a hidden "gotcha" somewhere?

Was there any special reason for doing it this for DDPG/SAC

I don't think so. I created SAC/TD3 policy mimicking DDPG behavior, but never really used the code as normally you don't use image with DDPG/SAC/TD3. So yes, I think we should document or fix that anyway.

while the shared policy code here does not use additional layers

For the flexible mlp, it is already documented that is works only for vector observations (and not images).

Thanks for the quick answers!
Actually, I couldn't find the comment 'net_arch works only for vector observations' in the doc, but the doc leaves a clue of the association with mlp extraction like 'net_arch: blah blah... (see
mlp_extractor documentation for details)
'. But that's ok to me now.
Can I close this issue?

You can leave it open for now until the documentation is fixed. If you feel like, you can suggest a PR to fix the documentation to address this issue :)

@araffin @Miffyli
Recently, I am training some DQN with stable baselines, and I figured out that this problem is not as simple as just fixing document. I think that it would be better to change the code to allow appending flexible mlp afterward the output of cnn extractor.
This is because of the consistency with not only the implementation of other algorithms but also the advantage-stream and value-stream of (default enabled) DQN dueling function.
With current implementation, we are injecting [64, 64] (by default) layers to only value-stream V(s) of dueling network. But it seems that more complex function like A(s, a) would require the additional network also. By enabling layers after cnn extractor, we can have options from which we are allowed to inject some shared layers to cnn extractor and seperated layers to 'layers' parameters in terms of dueling network.
I'll create PR for this.

Was this page helpful?
0 / 5 - 0 ratings