Code:
[Custom made environment]
import gym
import numpy as np
from stable_baselines.sac.policies import MlpPolicy
from stable_baselines import SAC
model = SAC(MlpPolicy, env, verbose=1)
if train:
    model.learn(total_timesteps=total_timesteps, log_interval=10)
| current_lr | 0.0003 |
| episodes | 10 |
| fps | 0 |
| mean 100 episode reward | -4 |
| n_updates | 0 |
| time_elapsed | 151 |
TypeError: only size-1 arrays can be converted to Python scalars
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train_real_arm_perception_1.py", line 43, in <module>
model.learn(total_timesteps=total_timesteps, log_interval=10)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/sac/sac.py", line 464, in learn
mb_infos_vals.append(self._train_step(step, writer, current_lr))
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/sac/sac.py", line 343, in _train_step
out = self.sess.run(self.step_ops, feed_dict)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1142, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
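The two chained errors above (a TypeError about size-1 arrays, followed by "ValueError: setting an array element with a sequence" inside np.asarray) are what NumPy typically raises when a list that should hold one scalar per entry contains an element that is itself a multi-element array. The NumPy-only sketch below reproduces that pattern. The traceback does not show which feed SAC was building, so the "non-scalar reward" here is only a hypothetical example, and the exact message (and whether the TypeError is chained) depends on the NumPy version.

```python
import numpy as np

# Hypothetical feed value: a batch that should hold one scalar per sample,
# but whose second element is a 2-element array (e.g. a non-scalar reward).
bad_batch = [0.5, np.array([1.0, 2.0]), -0.25]

conversion_failed = False
try:
    # The same kind of conversion TensorFlow's session.py performs on feeds.
    np.asarray(bad_batch, dtype=np.float32)
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)  # wording varies between NumPy versions

# When every element is a plain scalar, the conversion succeeds.
good_batch = [0.5, 1.0, -0.25]
feed = np.asarray(good_batch, dtype=np.float32)
print(feed.shape)  # (3,)
```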
Use env checker to see if your environment works correctly.
PS: Please use and read the issue template (the env checker is mentioned there too)
I think we need to output a better error message in SB3 (see #707 and #712)
Currently, we cannot do that properly because of the Unvecwrapper...
EDIT: the mentioned issue is not the same, but it is related in terms of the unclear error message.
check_env(env)
Traceback (most recent call last):
File "
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 214, in check_env
_check_returned_values(env, observation_space, action_space)
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 99, in _check_returned_values
_check_obs(obs, observation_space, 'reset')
File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 89, in _check_obs
"method does not match the given observation space".format(method_name))
AssertionError: The observation returned by the reset() method does not match the given observation space
a=env.observation_space.sample()
b=env.reset()
a.shape
(6,)
b.shape
(6,)
a
array([0.30428773, 0.42360216, 0.8966984 , 0.4622259 , 0.6768906 ,
0.5416117 ], dtype=float32)
b
array([ 0.74211503, 0.34441176, 0.33516484, 0.2 , -0.25 ,
0.1 ], dtype=float32)
As the error message says, the observation returned by reset() does not match the space defined in self.observation_space.
This is not a place for technical support, though. Please close this issue if there are no further enhancements/issues related to stable-baselines.
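Since a and b above have the same shape and dtype, it can help to test shape, dtype and bounds separately to see which condition actually fails. The helper below (explain_box_mismatch) is hypothetical and not part of gym or stable-baselines; it checks the kinds of conditions a Box membership test looks at, but it is not gym's implementation. One loud assumption: the bounds low=0, high=1 are guessed only from the sampled vector a, whose entries all lie in that range; the environment's real low/high are not shown in this thread. Under that assumption, b would fail only because of its -0.25 entry.

```python
import numpy as np


def explain_box_mismatch(obs, low, high, shape, dtype=np.float32):
    """Hypothetical helper: list reasons an observation would fail a
    Box-style membership test (shape, dtype castability, bounds)."""
    obs = np.asarray(obs)
    problems = []
    if obs.shape != tuple(shape):
        problems.append("shape %s != %s" % (obs.shape, tuple(shape)))
        return problems  # bounds cannot be compared element-wise
    if not np.can_cast(obs.dtype, dtype):
        problems.append("dtype %s cannot be safely cast to %s" % (obs.dtype, np.dtype(dtype)))
    # Broadcast scalar bounds to the observation shape, then flatten everything.
    flat_obs = obs.ravel()
    flat_low = np.broadcast_to(np.asarray(low, dtype=dtype), obs.shape).ravel()
    flat_high = np.broadcast_to(np.asarray(high, dtype=dtype), obs.shape).ravel()
    for i in np.flatnonzero(flat_obs < flat_low):
        problems.append("obs[%d] = %g is below low = %g" % (i, flat_obs[i], flat_low[i]))
    for i in np.flatnonzero(flat_obs > flat_high):
        problems.append("obs[%d] = %g is above high = %g" % (i, flat_obs[i], flat_high[i]))
    return problems


# Observation returned by reset() in the session above.
b = np.array([0.74211503, 0.34441176, 0.33516484, 0.2, -0.25, 0.1], dtype=np.float32)

# ASSUMPTION: low=0, high=1 is guessed from the sampled vector `a` only;
# substitute the bounds actually passed to spaces.Box in the environment.
for problem in explain_box_mismatch(b, low=0.0, high=1.0, shape=(6,)):
    print(problem)  # prints: obs[4] = -0.25 is below low = 0
```

With the environment's real low and high plugged in, an empty list means the observation passes all three checks.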
OK, but they seem to be the same. I debugged everything and even compared with CartPole.
That is because SAC does not support discrete actions, only continuous ones (see docs).
I'm using continuous actions
That is because SAC does not support discrete actions, only continuous ones (see docs). Indeed this error should be clarified in future updates.
Edit: GitHub mangled my messages.
The action space is self.action_space = spaces.Box(low=low_action, high=high_action, dtype=np.float32),
but the complaint is about the observation space, which seems completely fine.
CartPole uses discrete actions; that's why it does not work with SAC. Your example does not work because your reset() function is wrong.
I am closing this issue as this is not a stable-baselines bug or enhancement suggestion, and the check for action spaces has already been noted.
I know it uses discrete actions. My reset function seems fine (right data types, etc.), as the results I posted above show.
Please fill in the issue template completely next time, notably by formatting your code in a markdown code block and giving a minimal working example, e.g.:
import gym
import numpy as np
class CustomEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(6,))
        self.action_space = gym.spaces.Box(low=-1, high=1, shape=(6,))

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}
from stable_baselines.common.env_checker import check_env
check_env(CustomEnv())
OK, I will do that. BTW, I looked at the code of check_env and printed the arguments, and I still don't understand what's wrong, but I suppose it is my problem now:
CartPole:
print("Checking env", check_env(envg))
Box(4,) : obs [ 0.01070996 -0.04723248 -0.02073532 -0.027894 ]
My Custom Env:
print("Checking env", check_env(env))
Box(6,) : obs [ 0.14489795 0.75911766 0.21703297 0.2 -0.25 0.1 ]
It does work in stable-baselines 2.8.0.