Hi, I'm trying to train and evaluate an A2C model using 4 parallel environments for training and just 1 environment for evaluation. The code I'm using is the following:
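from stable_baselines import A2C
from stable_baselines.common import make_vec_env
from stable_baselines.common.callbacks import EvalCallback
from stable_baselines.common.policies import MlpLstmPolicy
from stable_baselines.common.vec_env import DummyVecEnv
# Initialize environments: 4 in parallel for training, 1 for evaluation
train_env = make_vec_env(MyTrainEnv, env_kwargs={...}, n_envs=4, vec_env_cls=DummyVecEnv)
test_env = make_vec_env(MyTestEnv, env_kwargs={...}, n_envs=1, vec_env_cls=DummyVecEnv)
# Create callback for evaluating the model during training
eval_callback = EvalCallback(test_env,
                             log_path='./logs/test/',
                             eval_freq=1000,
                             deterministic=True,
                             render=False,
                             n_eval_episodes=10)
# Recurrent (LSTM) policy trained on the 4 parallel environments
model = A2C(MlpLstmPolicy, train_env, tensorboard_log='./logs/train/')
# Start training
model.learn(total_timesteps=1000000, callback=eval_callback)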
Training proceeds fine until the eval_callback is triggered; at that point the following error occurs:
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\a2c\a2c.py", line 263, in learn
rollout = self.runner.run(callback)
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\common\runners.py", line 48, in run
return self._run()
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\a2c\a2c.py", line 368, in _run
if self.callback.on_step() is False:
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\common\callbacks.py", line 99, in on_step
return self._on_step()
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\common\callbacks.py", line 305, in _on_step
return_episode_rewards=True)
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\common\evaluation.py", line 54, in evaluate_policy
action, state = model.predict(obs, state=state, deterministic=deterministic)
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\common\base_class.py", line 819, in predict
actions, _, states, _ = self.step(observation, state, mask, deterministic=deterministic)
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\stable_baselines\common\policies.py", line 505, in step
{self.obs_ph: obs, self.states_ph: state, self.dones_ph: mask})
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "C:\Users\pesap\miniconda3\envs\DeepRL-HedgeFund\lib\site-packages\tensorflow_core\python\client\session.py", line 1156, in _run
(np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 15) for Tensor 'input/Ob:0', which has shape '(4, 15)'
In particular the last error message is:
Cannot feed value of shape (1, 15) for Tensor 'input/Ob:0', which has shape '(4, 15)'
It seems that the model still requires inputs from 4 parallel environments during evaluation (and it's not possible to use EvalCallback with n_envs > 1).
Any suggestions? Thanks in advance!
Best,
Nicola
See documentation:
One current limitation of recurrent policies is that you must test them with the same number of environments they have been trained on.
See this comment for a possible solution: https://github.com/hill-a/stable-baselines/issues/166#issuecomment-502350843
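A minimal sketch of one workaround for this limitation: zero-pad the single test observation up to the number of training environments before calling `model.predict`, then keep only the first action. The helper names `pad_observation` and `predict_single_env` are hypothetical (not part of stable-baselines), and the sketch assumes `model.predict` returns an `(actions, state)` pair with one action per environment.

```python
import numpy as np


def pad_observation(obs, n_train_envs):
    """Zero-pad a batch of shape (1, ...) to shape (n_train_envs, ...).

    Hypothetical helper: only row 0 carries the real observation.
    """
    obs = np.asarray(obs)
    padded = np.zeros((n_train_envs,) + obs.shape[1:], dtype=obs.dtype)
    padded[0] = obs[0]
    return padded


def predict_single_env(model, obs, state, n_train_envs, deterministic=True):
    """Query a recurrent model trained on n_train_envs with one test env.

    Assumes model.predict(observation, state=..., deterministic=...)
    returns (actions, state) with one action per environment.
    """
    padded = pad_observation(obs, n_train_envs)
    actions, state = model.predict(padded, state=state,
                                   deterministic=deterministic)
    # Only the first action corresponds to the real test environment.
    return actions[:1], state
```

In an evaluation loop this would be used roughly as `action, state = predict_single_env(model, obs, state, n_train_envs=4)` followed by `obs, reward, done, info = test_env.step(action)`, carrying `state` from one step to the next. Note that this bypasses `EvalCallback`, so evaluation would have to be run from a custom loop or callback.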
Edit: See post below. The easiest solution is likely to update stable-baselines with `pip install --upgrade git+https://github.com/hill-a/stable-baselines`.
Related PR and issue: https://github.com/hill-a/stable-baselines/pull/1017 and https://github.com/hill-a/stable-baselines/issues/1015
Thank you very much @Miffyli and @araffin !
I upgraded stable-baselines to the latest version and now it seems to work like a charm, great work!