Stable-baselines: [Question] About Vectorized Environments

Created on 1 Aug 2019 · 4 Comments · Source: hill-a/stable-baselines

When using vectorized environments, does the agent try to optimize all n environments in relation to one another? (i.e. the agent gets a positive reward for optimizing env 2 only when action 2 does not adversely affect the optimization of env 1)

With my (custom) environment I want to optimize a policy for a set of actions (potential actions in different areas of my env) that depend on one another. If feature 1 is modified by action 1, that affects the reward calculation for feature 2 in env 2; in other words, the changes made by the "other" environments should be taken into account by all envs.

Does this implementation of VecEnvs do this, or is the policy optimized for each environment independently of what goes on in the rest?

question

cli0

All 4 comments

I did not quite catch what you meant, but only a single policy is optimized w.r.t. all environments in the VecEnv. Traditionally you have the same environment running in parallel in many threads/processes, with all environments running independently of each other. This often speeds up sample gathering and stabilizes training through better coverage of states for each update.

Miffyli on 2 Aug 2019
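
To illustrate the point above, here is a minimal toy sketch of what "independent environments stepped together" means. It is not the stable-baselines implementation: `CounterEnv` and `ToyVecEnv` are hypothetical names made up for this example, and the auto-reset on `done` is only assumed to mirror how vectorized wrappers typically behave. Each copy receives its own action and returns its own reward, and no copy sees what happened in the others.

```python
class CounterEnv:
    """Hypothetical toy env: the state is a counter, the reward equals the action."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        reward = float(action)
        done = self.state >= 3
        return self.state, reward, done, {}


class ToyVecEnv:
    """Illustrative wrapper that steps several independent env copies in lockstep."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = []
        for env, action in zip(self.envs, actions):
            # Each env only ever sees its own action; nothing is shared.
            obs, reward, done, info = env.step(action)
            if done:
                obs = env.reset()  # assumed auto-reset behaviour
            results.append((obs, reward, done, info))
        obs, rewards, dones, infos = zip(*results)
        return list(obs), list(rewards), list(dones), list(infos)


vec_env = ToyVecEnv([CounterEnv for _ in range(3)])
first_obs = vec_env.reset()
obs, rewards, dones, infos = vec_env.step([1, 0, 2])
```

The single policy is then updated from the batch of transitions collected across all copies. If the copies need to influence each other's rewards, that coupling would have to live inside one environment rather than across the copies.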

Ah, ok I get it now. Thank you.

cli0 on 13 Aug 2019

So if I trained my policy using SubprocVecEnv, can I then run (evaluate) it using only a single environment? Or does my agent now expect observations in the shape of an N-vector (where N is the number of environments I trained on)?

E.g. would this work:

```
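import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, SubprocVecEnv

# Train on 6 parallel copies of the environment (env_name is defined elsewhere)
env_vec = SubprocVecEnv([lambda: gym.make(env_name) for _ in range(6)])
model = PPO2(MlpPolicy, env_vec)
model.learn(total_timesteps=10000)  # placeholder number of timesteps (assumption)

# Make single env
# env_single = DummyVecEnv([lambda: gym.make(env_name)])  # vectorized alternative
env_single = gym.make(env_name)
obs = env_single.reset()
model.predict(obs)  # <<< Is this ok???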