stable-baselines 🚀 - Converting OpenAI vectorized env to stable baselines vectorized env?

Hello,
I think you should use the gym.Env (created with gym.make, cf. README) instead of the ProcgenEnv.
And why do you want to convert a VecNormalize to a DummyVecEnv? (VecNormalize is already a VecEnv...)

araffin on 28 Jan 2020

I am trying to convert it because when I try to train a PPO agent directly on the venv I get the following error:
```
Wrapping the env in a DummyVecEnv.

ValueError Traceback (most recent call last)
in
5
6 model = PPO2(MlpPolicy, venv, verbose=1)
----> 7 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
317 self._setup_learn()
318
--> 319 runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
320 self.episode_reward = np.zeros((self.n_envs,))
321

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
447 :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
448 """
--> 449 super().__init__(env=env, model=model, n_steps=n_steps)
450 self.lam = lam
451 self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
17 self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
18 self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19 self.obs[:] = env.reset()
20 self.n_steps = n_steps
21 self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
51 for env_idx in range(self.num_envs):
52 obs = self.envs[env_idx].reset()
---> 53 self._save_obs(env_idx, obs)
54 return self._obs_from_buf()
55

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
70 for key in self.keys:
71 if key is None:
---> 72 self.buf_obs[key][env_idx] = obs
73 else:
74 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
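One plausible reading of this traceback, offered as an assumption: stable-baselines printed "Wrapping the env in a DummyVecEnv.", meaning it did not recognize the OpenAI VecEnv as one of its own and treated the whole 200-env batch as a single env. DummyVecEnv then allocates room for one 64x64x3 frame per (single) env, while `reset()` hands it 200 frames at once. The snippet reproduces only the NumPy side of that failure; `save_obs` is a hypothetical, simplified stand-in for `DummyVecEnv._save_obs`, not its real code.

```python
import numpy as np

NUM_PROCGEN_ENVS = 200      # num_envs passed to ProcgenEnv in this thread
OBS_SHAPE = (64, 64, 3)     # one RGB Procgen frame

# DummyVecEnv sees ONE env, so its buffer holds one frame per env: (1, 64, 64, 3).
buf_obs = np.zeros((1,) + OBS_SHAPE, dtype=np.uint8)


def save_obs(env_idx, obs):
    """Hypothetical, simplified stand-in for DummyVecEnv._save_obs."""
    buf_obs[env_idx] = obs


# The wrapped "env" is really vectorized, so reset() returns a whole batch.
batched_obs = np.zeros((NUM_PROCGEN_ENVS,) + OBS_SHAPE, dtype=np.uint8)

error_message = None
try:
    save_obs(0, batched_obs)  # (200, 64, 64, 3) into a (64, 64, 3) slot
except ValueError as err:
    error_message = str(err)

print(error_message)

# A single frame fits, which is what a genuine single (non-vectorized) env returns.
save_obs(0, batched_obs[0])
```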

GNiendorf on 28 Jan 2020

Two things:

- The environment you are trying to learn uses images, so you need CnnPolicy, not MlpPolicy.
- You (probably) do not have to wrap your environment in new VecEnvs: as araffin mentioned, the ProcGen environment is already vectorized, so you can pass it to the model as it is.

There is a chance that the VecEnv coming out of ProcGen does not work in stable-baselines as it is, which should probably be fixed as a bug (it being such a nice environment).

Miffyli on 28 Jan 2020

I get the same error with CnnPolicy:

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```

Also, calling gym.make on a procgen environment actually calls ProcgenEnv under the hood.

GNiendorf on 28 Jan 2020

Here I forced an error with gym.make by giving it a bad keyword and it shows that it calls ProcgenEnv at the end:
```
~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(id, **kwargs)
154
155 def make(id, **kwargs):
--> 156 return registry.make(id, **kwargs)
157
158 def spec(id):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, path, **kwargs)
99 logger.info('Making new env: %s', path)
100 spec = self.spec(path)
--> 101 env = spec.make(**kwargs)
102 # We used to have people override _reset/_step rather than
103 # reset/step. Set _gym_disable_underscore_compat = True on

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, **kwargs)
71 else:
72 cls = load(self.entry_point)
---> 73 env = cls(**_kwargs)
74
75 # Make the enviroment aware of which spec it came from.

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/gym_registration.py in make_env(**kwargs)
16
17 def make_env(**kwargs):
---> 18 venv = ProcgenEnv(num_envs=1, num_threads=0, **kwargs)
19 env = Scalarize(venv)
20 env = RemoveDictObs(env, key="rgb")

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/env.py in __init__(self, num_envs, env_name, center_agent, options, use_generated_assets, paint_vel_info, distribution_mode, **kwargs)
182 }
183 )
--> 184 super().__init__(num_envs, env_name, options, **kwargs)
```
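The last frames of this trace show what gym.make does here: it builds a `ProcgenEnv` with `num_envs=1`, wraps it in `Scalarize`, then in `RemoveDictObs(env, key="rgb")`. Judging from the names (an assumption, not a reading of procgen's source), `Scalarize` drops the batch dimension of a one-env batch and `RemoveDictObs` replaces the observation dict with its `"rgb"` entry. The dependency-free toy below models that pipeline; `FakeProcgenVecEnv`, and these simplified `Scalarize`/`RemoveDictObs` classes, are hypothetical illustrations rather than procgen's implementations.

```python
class FakeProcgenVecEnv:
    """Hypothetical stand-in for ProcgenEnv: a batch of envs with dict observations."""

    def __init__(self, num_envs):
        self.num_envs = num_envs

    def reset(self):
        # One observation per env, batched under the "rgb" key.
        return {"rgb": [f"frame_{i}" for i in range(self.num_envs)]}


class Scalarize:
    """Exposes a num_envs=1 vectorized env as a single env (idea only)."""

    def __init__(self, venv):
        assert venv.num_envs == 1, "only a batch of one can be scalarized"
        self.venv = venv

    def reset(self):
        batch = self.venv.reset()
        # Drop the batch dimension: take element 0 of every dict entry.
        return {key: values[0] for key, values in batch.items()}


class RemoveDictObs:
    """Replaces the observation dict with one of its entries (idea only)."""

    def __init__(self, env, key):
        self.env = env
        self.key = key

    def reset(self):
        return self.env.reset()[self.key]


# Roughly what gym.make("procgen:...") builds, per the trace above.
env = RemoveDictObs(Scalarize(FakeProcgenVecEnv(num_envs=1)), key="rgb")
obs = env.reset()
print(obs)
```

This is why the gym.make route always yields exactly one environment.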

GNiendorf on 28 Jan 2020

PS: as mentioned in the issue template, please use the markdown code blocks
for both code and stack traces.

EDIT: did you check the env with the env_checker? (also mentioned in the issue template)

araffin on 28 Jan 2020

And to clarify @araffin's comment: did you remove the extra VecEnvs? What happens if you just do:

```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
model = PPO2(CnnPolicy, venv, verbose=1)
```

Miffyli on 28 Jan 2020

In both cases (removing the extra VecEnvs and keeping them), I get the following error when using env_checker:

```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-38-3559b4aab3c7> in <module>
----> 1 check_env(venv)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/env_checker.py in check_env(env, warn, skip_render_check)
    179         True by default (useful for the CI)
    180     """
--> 181     assert isinstance(env, gym.Env), ("You environment must inherit from gym.Env class "
    182                                       " cf https://github.com/openai/gym/blob/master/gym/core.py")
    183 

AssertionError: You environment must inherit from gym.Env class  cf https://github.com/openai/gym/blob/master/gym/core.py
```
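The "Wrapping the env in a DummyVecEnv." message and this `check_env` assertion seem to share one cause, offered here as an assumption: the procgen/OpenAI baselines VecEnv classes are unrelated to both stable-baselines' `VecEnv` and `gym.Env`, and both checks rely on `isinstance`, so a structurally similar object still fails them. Note that `check_env` targets single `gym.Env` instances, not vectorized envs. The toy classes and the `prepare_env` function below are hypothetical and only model this class-hierarchy mismatch.

```python
class GymEnv:
    """Stand-in for gym.Env (a single environment)."""


class SBVecEnv:
    """Stand-in for stable_baselines.common.vec_env.VecEnv."""


class BaselinesVecEnv:
    """Stand-in for the OpenAI baselines / procgen VecEnv: same idea, unrelated class."""

    num_envs = 200


def prepare_env(env):
    """Mimics an isinstance-based dispatch like the one suspected above (simplified)."""
    if isinstance(env, SBVecEnv):
        return "used as a VecEnv"
    # Anything else is assumed to be ONE env and gets wrapped.
    return "Wrapping the env in a DummyVecEnv."


def check_env(env):
    """Mimics only the first assertion of stable-baselines' check_env."""
    assert isinstance(env, GymEnv), "environment must inherit from gym.Env"
    return True


venv = BaselinesVecEnv()
print(prepare_env(venv))         # the whole 200-env batch is treated as one env
print(isinstance(venv, GymEnv))  # False, so check_env(venv) would fail its assertion
```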

GNiendorf on 28 Jan 2020

Here, if I remove the extra calls, I get the same error:

```

Wrapping the env in a DummyVecEnv.

ValueError Traceback (most recent call last)
in
7 venv = VecExtractDictObs(venv, "rgb")
8 model = PPO2(CnnPolicy, venv, verbose=1)
----> 9 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
317 self._setup_learn()
318
--> 319 runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
320 self.episode_reward = np.zeros((self.n_envs,))
321

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
447 :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
448 """
--> 449 super().__init__(env=env, model=model, n_steps=n_steps)
450 self.lam = lam
451 self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
17 self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
18 self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19 self.obs[:] = env.reset()
20 self.n_steps = n_steps
21 self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
51 for env_idx in range(self.num_envs):
52 obs = self.envs[env_idx].reset()
---> 53 self._save_obs(env_idx, obs)
54 return self._obs_from_buf()
55

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
70 for key in self.keys:
71 if key is None:
---> 72 self.buf_obs[key][env_idx] = obs
73 else:
74 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```

GNiendorf on 28 Jan 2020

If I remove that VecExtractDictObs call, though, I get this error:

```

Wrapping the env in a DummyVecEnv.

NotImplementedError Traceback (most recent call last)
in
5
6 venv = ProcgenEnv(num_envs=200, env_name="coinrun")
----> 7 model = PPO2(CnnPolicy, venv, verbose=1)
8 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, policy, env, gamma, n_steps, ent_coef, learning_rate, vf_coef, max_grad_norm, lam, nminibatches, noptepochs, cliprange, cliprange_vf, verbose, tensorboard_log, _init_setup_model, policy_kwargs, full_tensorboard_log, seed, n_cpu_tf_sess)
102
103 if _init_setup_model:
--> 104 self.setup_model()
105
106 def _get_pretrain_placeholders(self):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in setup_model(self)
132
133 act_model = self.policy(self.sess, self.observation_space, self.action_space, self.n_envs, 1,
--> 134 n_batch_step, reuse=False, **self.policy_kwargs)
135 with tf.variable_scope("train_model", reuse=True,
136 custom_getter=tf_util.outer_scope_getter("train_model")):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, **_kwargs)
599 def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **_kwargs):
600 super(CnnPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
--> 601 feature_extraction="cnn", **_kwargs)
602
603

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, layers, net_arch, act_fun, cnn_extractor, feature_extraction, **kwargs)
538 act_fun=tf.tanh, cnn_extractor=nature_cnn, feature_extraction="cnn", **kwargs):
539 super(FeedForwardPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 540 scale=(feature_extraction == "cnn"))
541
542 self._kwargs_check(feature_extraction, kwargs)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale)
219 def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, scale=False):
220 super(ActorCriticPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 221 scale=scale)
222 self._pdtype = make_proba_dist_type(ac_space)
223 self._policy = None

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale, obs_phs, add_action_ph)
115 with tf.variable_scope("input", reuse=False):
116 if obs_phs is None:
--> 117 self._obs_ph, self._processed_obs = observation_input(ob_space, n_batch, scale=scale)
118 else:
119 self._obs_ph, self._processed_obs = obs_phs

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/input.py in observation_input(ob_space, batch_size, name, scale)
49 else:
50 raise NotImplementedError("Error: the model does not support input space of type {}".format(
---> 51 type(ob_space).__name__))

NotImplementedError: Error: the model does not support input space of type Dict
```

GNiendorf on 28 Jan 2020

If I just use gym.make as suggested, it works. The problem is that I can't specify the number of environments I want to train on (say 200, as before), since the gym registration code calls ProcgenEnv with num_envs=1.

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy
env = gym.make("procgen:procgen-maze-v0", distribution_mode='easy')
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```
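Since gym.make always yields a single env, one way to train on many copies (a sketch under assumptions, not something tested in this thread) is to build N single envs and vectorize them with stable-baselines' own classes, e.g. `DummyVecEnv([lambda: gym.make("procgen:procgen-maze-v0", distribution_mode="easy") for _ in range(n)])`, or `SubprocVecEnv` for one process per env. `DummyVecEnv` steps its envs sequentially in one process, so 200 of them would be slow. The dependency-free toy below only illustrates what such a wrapper does: call each factory once, then return one observation per env so the batch dimension equals `num_envs`. `ToySingleEnv`, `ToyVecEnv` and `make_single_env` are hypothetical names.

```python
class ToySingleEnv:
    """Hypothetical single env, standing in for gym.make("procgen:...")."""

    def __init__(self, seed):
        self.seed = seed

    def reset(self):
        return f"frame_from_env_{self.seed}"


class ToyVecEnv:
    """Minimal DummyVecEnv-like wrapper: N single envs, batched observations."""

    def __init__(self, env_fns):
        # Each entry is a zero-argument callable that builds ONE env.
        self.envs = [fn() for fn in env_fns]
        self.num_envs = len(self.envs)

    def reset(self):
        # One observation per env, in env order.
        return [env.reset() for env in self.envs]


def make_single_env(seed):
    """Returns a factory; passing seed as an argument avoids late-binding lambda bugs."""
    return lambda: ToySingleEnv(seed)


venv = ToyVecEnv([make_single_env(i) for i in range(4)])
batch = venv.reset()
print(venv.num_envs, batch)
```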

GNiendorf on 28 Jan 2020

Ah right, because by default the environment works on Dicts. Am I correct to assume that this one did not work either?

```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
venv = VecExtractDictObs(venv, "rgb")
model = PPO2(CnnPolicy, venv, verbose=1)
```

If that does not work, you can still create the environments individually. You would have to create a wrapper similar to VecExtractDictObs that extracts the "rgb" item from the observation dictionary of each environment individually.
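A minimal sketch of the per-env wrapper suggested here, assuming each single env returns a dict observation such as `{"rgb": frame}`. With gym this would naturally be a `gym.ObservationWrapper` subclass that overrides `observation(self, observation)` and sets `observation_space` to the `"rgb"` sub-space. To stay dependency-free, the toy below reimplements just that pattern; `ToyDictEnv` and `ExtractDictObs` are hypothetical names.

```python
class ToyDictEnv:
    """Hypothetical single env whose observations are dicts, like raw Procgen."""

    def reset(self):
        return {"rgb": "rgb_frame", "extra": "unused"}

    def step(self, action):
        obs = {"rgb": f"rgb_after_{action}", "extra": "unused"}
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info


class ExtractDictObs:
    """Mimics a gym.ObservationWrapper that keeps a single key of a dict observation."""

    def __init__(self, env, key):
        self.env = env
        self.key = key

    def observation(self, obs):
        # The only transformation: dict -> the value stored under `key`.
        return obs[self.key]

    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), reward, done, info


env = ExtractDictObs(ToyDictEnv(), key="rgb")
print(env.reset())     # rgb_frame
print(env.step(3)[0])  # rgb_after_3
```

Each wrapped env could then be vectorized with stable-baselines' `DummyVecEnv` or `SubprocVecEnv`, so the model receives a VecEnv it recognizes.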

Miffyli on 28 Jan 2020

That's correct, I get the same error as before (when including those extra venv calls):

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```

GNiendorf on 28 Jan 2020

I see. I will attempt to create the wrapper. Thank you for the suggestion.

GNiendorf on 28 Jan 2020

Thanks for trying this out! Sounds like a bug-ish thing we should fix, or at least offer tools so people don't have to redo this work for that environment.

Miffyli on 28 Jan 2020

> You have to create a wrapper similar to VecExtractDictObs that extracts the "rgb" item from the observation dictionary of each environment individually.

This already exists: https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py

araffin on 28 Jan 2020
