Stable-baselines: SAC: ValueError: setting an array element with a sequence (stable-baselines 2.10)

Created on 12 May 2020 · 17 Comments · Source: hill-a/stable-baselines

Code:
[Custom made environment]
import gym
import numpy as np
from stable_baselines.sac.policies import MlpPolicy
from stable_baselines import SAC

model = SAC(MlpPolicy, env, verbose=1)

if train:
    model.learn(total_timesteps=total_timesteps, log_interval=10)

Output:

| current_lr | 0.0003 |
| episodes | 10 |
| fps | 0 |
| mean 100 episode reward | -4 |
| n_updates | 0 |
| time_elapsed | 151 |
| total timesteps | 92 |

TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_real_arm_perception_1.py", line 43, in <module>
    model.learn(total_timesteps=total_timesteps, log_interval=10)
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/sac/sac.py", line 464, in learn
    mb_infos_vals.append(self._train_step(step, writer, current_lr))
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/sac/sac.py", line 343, in _train_step
    out = self.sess.run(self.step_ops, feed_dict)
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1142, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
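
For readers hitting the same message: NumPy raises this ValueError when it is asked to pack elements of differing shapes into one fixed-dtype array, which is what the np.asarray(subfeed_val, dtype=subfeed_dtype) call in the traceback attempts with a value from the feed dict. The thread does not establish which fed value was at fault, so the sketch below is only a generic reproduction with made-up data. is_ragged is a hypothetical helper, not part of stable-baselines or TensorFlow.

```python
import numpy as np


def is_ragged(batch, dtype=np.float32):
    """Return True if `batch` cannot be packed into one rectangular array.

    Hypothetical helper for illustration; it repeats the kind of
    np.asarray call seen in the traceback on arbitrary input.
    """
    try:
        np.asarray(batch, dtype=dtype)
    except (ValueError, TypeError):
        # NumPy reports "setting an array element with a sequence" here.
        return True
    return False


# Made-up data: every entry has shape (6,), so the conversion succeeds.
good_batch = [np.zeros(6), np.ones(6)]
# Made-up data: a scalar mixed with a length-2 array, so the conversion fails.
bad_batch = [0.0, np.zeros(2)]

print(is_ragged(good_batch), is_ragged(bad_batch))
```

Running a check like this on the values an environment returns (observations, rewards, actions) may help narrow down which one has an inconsistent shape.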

Labels: No tech support, custom gym env, more information needed, question

All 17 comments

Use env checker to see if your environment works correctly.

PS: Please use and read the issue template (the env checker is mentioned there too)

I think we need to output a better error message in SB3 (see #707 and #712)
Currently, we cannot do that properly because of the Unvecwrapper...

EDIT: the mentioned issue is not the same, but it is related in terms of the unclear error message.

check_env(env)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 214, in check_env
    _check_returned_values(env, observation_space, action_space)
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 99, in _check_returned_values
    _check_obs(obs, observation_space, 'reset')
  File "/home/ipc/open_baselines/open_base/lib/python3.6/site-packages/stable_baselines/common/env_checker.py", line 89, in _check_obs
    "method does not match the given observation space".format(method_name))
AssertionError: The observation returned by the reset() method does not match the given observation space

a = env.observation_space.sample()
b = env.reset()

a.shape
(6,)
b.shape
(6,)
a
array([0.30428773, 0.42360216, 0.8966984 , 0.4622259 , 0.6768906 ,
       0.5416117 ], dtype=float32)
b
array([ 0.74211503,  0.34441176,  0.33516484,  0.2       , -0.25      ,
        0.1       ], dtype=float32)

As the error message says, the observation returned by reset() does not match the space defined by self.observation_space.

This is not a place for technical support, though. Please close this issue if there are no further enhancements/issues related to stable-baselines.
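
For readers debugging a similar mismatch: matching shape and dtype, as in the session above, is not enough for an observation to belong to a Box space, because every element must also lie within the space's low and high bounds. The sketch below approximates that membership test with NumPy only. box_mismatch_reasons is a hypothetical helper, and the bounds of 0 and 1 are an assumption (the thread never shows the real observation_space definition), chosen only because the sampled observation above happens to fall in that range. Under that assumption, the -0.25 entry returned by reset() would be out of bounds.

```python
import numpy as np


def box_mismatch_reasons(obs, low, high):
    """List reasons why `obs` would not fit a Box-like space.

    Hypothetical helper for illustration only; it is not the
    stable-baselines env checker. `low` and `high` are per-element bounds.
    """
    obs = np.asarray(obs)
    low = np.asarray(low)
    high = np.asarray(high)
    if obs.shape != low.shape:
        # Bounds cannot be compared element-wise, so stop here.
        return ["shape {} != {}".format(obs.shape, low.shape)]
    reasons = []
    below = np.flatnonzero(obs < low)
    above = np.flatnonzero(obs > high)
    if below.size:
        reasons.append("below low at indices {}".format(below.tolist()))
    if above.size:
        reasons.append("above high at indices {}".format(above.tolist()))
    return reasons


# Observation returned by reset() in the thread.
reset_obs = np.array([0.74211503, 0.34441176, 0.33516484, 0.2, -0.25, 0.1],
                     dtype=np.float32)
# ASSUMPTION: bounds of [0, 1]; the real space definition is not shown.
assumed_low, assumed_high = np.zeros(6), np.ones(6)

print(box_mismatch_reasons(reset_obs, assumed_low, assumed_high))
# → ['below low at indices [4]']
```

With the real environment, the same question can be asked directly through env.observation_space.contains(env.reset()), then inspecting env.observation_space.low and env.observation_space.high if it returns False.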

OK, but they seem to be the same. I debugged everything and even compared with CartPole.

That is because SAC does not support discrete actions, only continuous ones (see docs).

I'm using continuous actions

That is because SAC does not support discrete actions, only continuous ones (see docs). Indeed this error should be clarified in future updates.

Edit: GitHub derped my messages.

The action space is self.action_space = spaces.Box(low=low_action, high=high_action, dtype=np.float32)

but the complaint is about the state space, which seems completely fine.

CartPole uses discrete actions; that's why it is not working. Your example does not work because the reset() function is wrong.

I am closing this issue as this is not a stable-baselines bug or enhancement suggestion, and the check for action spaces has already been noted.

I know it uses discrete actions. My reset function seems fine (right data types, etc.), as shown in the results I posted above.

Please fill in the issue template completely next time, notably by formatting your code using a markdown code block and giving a minimal working example, e.g.:

import gym
import numpy as np

class CustomEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(6,))
        self.action_space = gym.spaces.Box(low=-1, high=1, shape=(6,))

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}

from stable_baselines.common.env_checker import check_env

check_env(CustomEnv())

OK, I will do that. By the way, I looked at the check_env code and printed its arguments, and I still don't understand what's wrong, but I suppose it is my problem now:
CartPole:
print("Checking env", check_env(envg))
Box(4,) : obs [ 0.01070996 -0.04723248 -0.02073532 -0.027894 ]
: t obs

My Custom Env:
print("Checking env", check_env(env))

Box(6,) : obs [ 0.14489795 0.75911766 0.21703297 0.2 -0.25 0.1 ]
: t obs

It does work in stable-baselines 2.8.0.
