The following assertion message in BaseRLModel implies that it's currently not possible to change the number of environments for LSTM-based policies:
"Error: the environment passed must have the same number of environments as the model was trained on." \
"This is due to the Lstm policy not being capable of changing the number of environments."
I'm really only interested in the test case - once the model has been trained using X workers, I'd like to test using a single worker.
Why is it not possible to change the number of environments? From what I can tell, rollouts are passed sequentially from each worker to the policy for updates. From my vague understanding, the policy networks shouldn't be able to tell how many workers there are.
Anyways, I have a few questions:
1) Are there any workarounds that would allow me to test on a single environment, using weights generated from multiple environments?
2) If not, are you aware of why there is a restriction on the env size?
@araffin is probably going to link a bunch of related issues on this (I could not find them, but there are some), but I will quickly give a solution to 1): pad your observations with zeros until they match the number of envs. This padding won't change the resulting actions for that one environment, as the batch elements are handled independently. Follow this example on how to handle hidden states.
Hello,
As predicted by @Miffyli, the answer is here ;) :
https://github.com/hill-a/stable-baselines/issues/166#issuecomment-502350843
I couldn't find the issue where that was discussed... but this is a deeper underlying issue due to the static graph. Also, as mentioned in numerous issues (a quick search in the GitHub issues will give you some examples), the LSTM code is one of the most complex parts of stable-baselines and we did not really have the time to dive into it.
Perfect, that example in #166 looks to be exactly what I need. Thanks!
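For anyone landing here later, the zero-padding workaround described in the replies can be sketched roughly as below. This is only a sketch under assumptions, and the linked comment in #166 remains the reference example. `pad_observation` and `run_single_env_episode` are hypothetical helper names, not part of stable-baselines. The sketch assumes the model exposes `predict(obs, state=..., mask=...)` returning `(actions, state)`, and that the test env is a single, non-vectorized env following the classic Gym API (`reset()` returns an observation, `step()` returns a 4-tuple).

```python
import numpy as np


def pad_observation(obs, n_training_envs):
    """Put a single observation in row 0 of a zero-filled batch."""
    obs = np.asarray(obs)
    padded = np.zeros((n_training_envs,) + obs.shape, dtype=obs.dtype)
    padded[0] = obs
    return padded


def run_single_env_episode(model, env, n_training_envs, max_steps=1000):
    """Run one episode on a single env with a model trained on several envs.

    Assumption: model.predict(obs, state=..., mask=...) returns
    (actions, state), with one action and one hidden state per batch row.
    """
    obs = env.reset()
    state = None  # None lets the model start from its initial hidden state
    # mask[i] = True would ask the policy to reset the hidden state of row i;
    # we never reset mid-episode, so every entry stays False.
    mask = np.zeros(n_training_envs, dtype=bool)
    total_reward = 0.0
    for _ in range(max_steps):
        actions, state = model.predict(
            pad_observation(obs, n_training_envs), state=state, mask=mask
        )
        # Only row 0 corresponds to the real environment; the other rows
        # are the zero padding and their actions are discarded.
        obs, reward, done, _info = env.step(actions[0])
        total_reward += reward
        if done:
            break
    return total_reward
```

With a model trained on 8 environments, this would be called as `run_single_env_episode(model, env, n_training_envs=8)`. The returned hidden state is passed back into `predict` on every step, so the recurrent state for row 0 carries across the episode while the padded rows are simply ignored.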