Stable-baselines: [question] Why are RL CNNs so shallow?

Created on 11 Jun 2019 · 2 Comments · Source: hill-a/stable-baselines

It seems that the CNNs used in RL are much shallower than the ones used on ImageNet. Am I right about this? And if so, why would that be the case?

AlanKuurstra

👍2

Most helpful comment

Hello,

> [RL CNNs] are much more shallow than the ones used on imagenet? Am I right about this? And why would that be the case?

That's a good question, and you are right in most cases.
I think the simple answer is that shallow networks are already complex enough to solve the tasks.

To my knowledge, the most complex (and successful) CNN policy architecture is the one from IMPALA, which uses some residual connections.
The way RL works also makes it tricky to use batch norm, which is usually what allows deeper nets to be trained.
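
To put a number on "shallow", here is a rough sketch of the shape and parameter arithmetic for a three-layer CNN in the style of the Nature DQN network. The layer sizes, the 84x84 input with 4 stacked frames, and the helper names `conv_out` and `describe` are assumptions made for illustration; they are not taken from this thread.

```python
# Shape and parameter arithmetic for a three-layer "Nature DQN"-style CNN.
# Assumptions: 84x84 input, 4 stacked frames, unpadded ("valid") convolutions.

def conv_out(size, kernel, stride):
    """Spatial output size of an unpadded convolution."""
    return (size - kernel) // stride + 1

# (out_channels, kernel, stride) for each conv layer (assumed layout)
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
FC_UNITS = 512  # size of the single fully connected layer after the convs

def describe(size=84, channels=4):
    total_params = 0
    for out_ch, k, s in CONV_LAYERS:
        # kernel weights plus one bias per output channel
        total_params += k * k * channels * out_ch + out_ch
        size, channels = conv_out(size, k, s), out_ch
        print(f"conv {k}x{k}/{s}: {size}x{size}x{channels}")
    flat = size * size * channels
    total_params += flat * FC_UNITS + FC_UNITS
    print(f"flatten: {flat}, fc: {FC_UNITS}, parameters: {total_params}")
    return flat, total_params

flat_size, n_params = describe()
```

Under these assumptions the whole feature extractor is three conv layers and one dense layer, and most of the parameters sit in the dense layer rather than in the convolutions.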

Then, a lot of RL problems do not use images as input (e.g. MuJoCo/PyBullet envs, where the input is the joint angles); in that case there is no need for a more complex architecture.
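
For a sense of scale in the non-image case, the sketch below counts the parameters of a small fully connected network on a vector observation. The two hidden layers of 64 units and the observation size of 17 are illustrative assumptions, and `mlp_params` is a hypothetical helper, not a library function.

```python
# Parameter count for a small MLP on a low-dimensional observation.
# Assumptions: two hidden layers of 64 units, observation vector of size 17.

def mlp_params(input_dim, hidden=(64, 64)):
    """Number of weights and biases in the hidden layers of an MLP."""
    total, prev = 0, input_dim
    for units in hidden:
        total += prev * units + units  # weight matrix plus bias vector
        prev = units
    return total

OBS_DIM = 17  # hypothetical joint-angle/velocity vector size
mlp_n_params = mlp_params(OBS_DIM)
print(mlp_n_params)
```

With an input this small there are no spatial features to extract, so a couple of dense layers with a few thousand parameters is a reasonable starting point.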

Finally, you can always try a deeper net, but in my experience this does not often result in better performance.

araffin on 11 Jun 2019

👍3

All 2 comments

Hello,

As I see it:
In image recognition, the algorithm needs to predict the image label. This is done by projecting the image into some latent space where the classes are separable. In RL, the image just represents the state, which is why only a few features of the picture are needed. Your confusion comes from assuming that RL makes choices the way humans do, which is not the case (see this video).