When using vectorized environments, does the agent try to optimize all n environments in relation to one another? (I.e., does the agent get a positive reward for optimizing env 2 only when action 2 does not adversely affect the optimization of env 1?)
My (custom) environment needs a policy optimized over a set of actions (potential actions in different areas of my env) that depend on one another. If action 1 modifies feature 1, that affects the reward calculation for feature 2 in env 2; the changes made by "other" environments, so to speak, should be taken into account by all envs.
Does this implementation of VecEnvs do this, or is the policy optimized for each environment independently of what goes on in the rest?
I did not quite catch what you meant, but only a single policy is optimized w.r.t. all environments in the VecEnv. Traditionally you have the same environment running in parallel in many threads/processes, with all environments running independently of each other. This often speeds up sample gathering and stabilizes training through better coverage of states in each update.
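To make the "independent environments, one shared policy" point concrete, here is a toy sketch in plain Python. It is not the Stable Baselines implementation; `ToyEnv`, `ToyVecEnv`, `shared_policy` and `collect` are hypothetical names made up for illustration. Each environment copy keeps its own state, so an action taken in one copy cannot change the reward in another; only the collected samples are pooled.

```python
import random


class ToyEnv:
    """Hypothetical environment: the observation is an integer in 0..9 and
    the reward is 1.0 when the action equals the observation's parity."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # private RNG: no state shared between copies
        self.state = 0

    def reset(self):
        self.state = self.rng.randint(0, 9)
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state % 2 else 0.0
        self.state = self.rng.randint(0, 9)
        return self.state, reward


class ToyVecEnv:
    """Steps several independent ToyEnv copies and batches their outputs."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        return [r[0] for r in results], [r[1] for r in results]


def shared_policy(obs):
    """The single policy, applied to each environment's observation separately."""
    return obs % 2


def collect(n_envs, n_steps):
    """Gather (obs, action, reward) samples from all copies into one list."""
    vec_env = ToyVecEnv([ToyEnv(seed=i) for i in range(n_envs)])
    obs_batch = vec_env.reset()
    transitions = []
    for _ in range(n_steps):
        actions = [shared_policy(o) for o in obs_batch]
        next_obs, rewards = vec_env.step(actions)
        transitions.extend(zip(obs_batch, actions, rewards))
        obs_batch = next_obs
    return transitions
```

In this sketch, the samples produced by copy 0 are identical whether it runs alone or next to other copies, which is the sense in which the environments are independent. If your actions need to influence each other's rewards, that coupling would have to live inside a single environment.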
Ah, ok I get it now. Thank you.
So if I trained my policy using SubprocVecEnv - can I then run it (evaluate) using only a single environment? Or does my agent now expect observations to be in the shape of a N-vector (where N is the number of environments I trained on)?
E.g. would this work:
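```
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

# Train on 6 parallel copies of the environment
env_vec = SubprocVecEnv([lambda: gym.make(env_name) for _ in range(6)])
model = PPO2(MlpPolicy, env_vec)
model.learn(total_timesteps=10000)  # step budget is an arbitrary placeholder
# Evaluate on a single, unwrapped environment
# (alternative: env_single = DummyVecEnv([lambda: gym.make(env_name)]))
env_single = gym.make(env_name)
obs = env_single.reset()
model.predict(obs)  ### <<< Is this ok???
```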
Already answered here: https://github.com/hill-a/stable-baselines/issues/166
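For a feed-forward policy such as `MlpPolicy`, each observation is mapped to an action on its own, so the number of environments acts as a batch dimension, not as part of the observation. The following toy sketch illustrates that idea only; it is not Stable Baselines code, and `linear_policy` and `predict` are hypothetical stand-ins. Recurrent policies keep per-environment state and may behave differently, so check the linked issue for those.

```python
def linear_policy(obs_batch, weights):
    """Toy feed-forward policy: action 1 if the weighted sum is positive, else 0.
    Every observation in the batch is handled independently."""
    return [
        1 if sum(w * x for w, x in zip(weights, obs)) > 0 else 0
        for obs in obs_batch
    ]


def predict(obs, weights):
    """Accept either one observation (a flat list of floats) or a batch
    (a list of such lists) and return an action or a list of actions."""
    single = not isinstance(obs[0], (list, tuple))
    batch = [obs] if single else obs  # add a batch dimension of size 1
    actions = linear_policy(batch, weights)
    return actions[0] if single else actions
```

With this shape handling, the same weights serve a batch of six observations during data collection and a single observation at evaluation time.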