Stable-baselines: [question] Changing the number of envs for LSTM policies

Created on 28 Feb 2020 · 3 Comments · Source: hill-a/stable-baselines

BaseRLModel implies it's not currently possible to change the number of environments for LSTM-based policies.

"Error: the environment passed must have the same number of environments as the model was trained on. This is due to the Lstm policy not being capable of changing the number of environments."

I'm really only interested in the test case - once the model has been trained using X workers, I'd like to test using a single worker.

Why is it not possible to change the number of environments? From what I can tell, rollouts are passed sequentially from each worker to the policy for updates. From my vague understanding, the policy networks shouldn't be able to tell how many workers there are.

Anyways, I have a few questions:

1. Are there any workarounds that would allow me to test on a single environment, using weights generated from multiple environments?
2. If not, are you aware of _why_ there is a restriction on the env size? Is this something I can easily fix, or is there some deeper underlying issue?

duplicate question

smorad

Most helpful comment

Hello,

> Are there any workarounds that would allow me to test on a single environment, using weights generated from multiple environments?

As predicted by @Miffyli, the answer is here ;):
https://github.com/hill-a/stable-baselines/issues/166#issuecomment-502350843

> If not, are you aware of why there is a restriction on the env size?

I couldn't find the issue where that was discussed... but this is a deeper underlying issue due to the static graph. Also, as mentioned in numerous issues (a quick search in the GitHub issues will give you some examples), the LSTM code is one of the most complex parts of stable-baselines and we did not really have the time to dive into it.

araffin on 28 Feb 2020

All 3 comments

@araffin is probably going to link a bunch of related issues on this (I could not find them, but there are some), but I will quickly give a solution to 1): pad your observations with zeros until the batch size matches the number of envs the model was trained on. This padding won't change the resulting actions for that one environment, as the batch elements are handled independently. Follow this example on how to handle hidden states.
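
For illustration, the zero-padding workaround could look roughly like the sketch below. It is not taken from the linked comment: `pad_observation` and `predict_single_env` are hypothetical helper names, and the only library call assumed is `model.predict(obs, state=..., mask=..., deterministic=...)` returning `(actions, state)`, as used for recurrent policies in stable-baselines. The real environment occupies row 0 of the batch and the remaining rows are zeros whose outputs are discarded.

```python
import numpy as np


def pad_observation(obs, n_train_envs):
    """Place a single-env observation in row 0 of a zero-filled batch."""
    obs = np.asarray(obs)
    padded = np.zeros((n_train_envs,) + obs.shape, dtype=obs.dtype)
    padded[0] = obs
    return padded


def predict_single_env(model, obs, state, done, n_train_envs, deterministic=True):
    """Query a recurrent model trained on n_train_envs using one real env.

    Hypothetical helper: returns the action for the real environment
    (row 0) and the full recurrent state, which must be passed back in
    on the next call.
    """
    padded_obs = pad_observation(obs, n_train_envs)
    # mask[i] = True asks the policy to reset the LSTM state of row i.
    # Padding rows are never reset; their outputs are simply ignored.
    mask = np.zeros(n_train_envs, dtype=bool)
    mask[0] = done
    actions, state = model.predict(
        padded_obs, state=state, mask=mask, deterministic=deterministic
    )
    return actions[0], state
```

A test loop would start with `state = None` and `done = False`, call `predict_single_env` once per step with the unbatched observation, and feed the returned `state` back in on the next step so the hidden state of row 0 carries across the episode.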