I want to train using two images from different cameras and an array of 1D data from a sensor. I'm passing these inputs as my env state. Obviously I need a CNN that can take those inputs, concatenate them, and train on them. My question is how to pass these inputs to such a custom CNN in policies.py. Also, I tried to pass two images, and apparently dummy_vec_env.py had trouble with that:
obs = env.reset()
File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 57, in reset
self._save_obs(env_idx, obs)
File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 75, in _save_obs
self.buf_obs[key][env_idx] = obs
ValueError: cannot copy sequence with size 2 to array axis with dimension 80
I appreciate any thoughts or examples.
Hello,
Could you please provide a minimal code example to reproduce the error?
@araffin So for simplicity, say I need to pass two RGB images (observations from two cameras onboard a robot) of size (80,160,4) as states, like this:
`class MyCustomEnv(gym.Env):
    def __init__(self):
        self.observation_space = spaces.Box(low=0, high=255, shape=(80,160,4), dtype=np.float32)
        self.state = (np.zeros((80, 160,4), dtype=np.uint8), np.zeros((80, 160,4), dtype=np.uint8))
        ...
    def step(self, action):
        ...
        self.state = self.rgbimage_1, self.rgbimage_2
        return self.state, reward, done, info
    def reset(self):
        ...
        self.state = self.rgbimage_1, self.rgbimage_2
        return self.state
`
I hope this is good enough. I also suspect my definition of the observation_space might not be correct, but I tried different ways to define an observation_space for two images and nothing worked. I saw that you are a contributor here, and I hope you can help with defining the observation space too.
For the record, I tried to build an observation space like this:
`self.nested_observation_space = spaces.Dict({
    'sensors': spaces.Dict({
        #'position': spaces.Box(low=-100, high=100, shape=(3,)),
        #'velocity': spaces.Box(low=-1, high=1, shape=(3,)),
        'front_cam': spaces.Tuple((
            spaces.Box(low=0, high=255, shape=(80, 160, 4)),
            spaces.Box(low=0, high=255, shape=(80, 160, 4))
        )),
    })
})
`
but that didn't work either and returned the error
env = DummyVecEnv([lambda: env]) # The algorithms require a vectorized environment to run
File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 31, in __init__
shapes[key] = box.shape
For simplicity I passed this and still got the same error:
self.nested_observation_space = spaces.Tuple((
    spaces.Box(low=0, high=255, shape=(80, 160, 4)),
    spaces.Box(low=0, high=255, shape=(80, 160, 4))
))
I can send you the complete class if you like.
Thanks
The problem appears to be with vectorizing the env.. I get
"d:stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 35, in __init__
self.buf_obs = {k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys}
File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 35, in <dictcomp>
self.buf_obs = {k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys}
TypeError: 'NoneType' object is not iterable
when defining the state like this:
self.observation_space = spaces.Tuple((
    spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8),
    spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8)
))
Hello,
Dict and Tuple spaces are not supported for observation spaces. Did you try concatenating the images along the channel axis?
I could concatenate the images and then separate them when fed to the cnn. I could also pad the signal with zeros and concatenate it as a 2x2 channel. I'm worried about the scalability of this approach.
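For reference, here is a minimal NumPy sketch of why a tuple of two images does not fit the observation buffer and what channel-wise concatenation looks like. The array `buf` only imitates a per-env buffer allocated from a single `Box` shape; `cam_1`, `cam_2` and `stack_cameras` are names made up for this illustration, not part of stable-baselines.

```python
import numpy as np

# Two hypothetical camera frames with the shape used in the question above.
cam_1 = np.zeros((80, 160, 4), dtype=np.uint8)
cam_2 = np.full((80, 160, 4), 255, dtype=np.uint8)

# Stand-in for a vectorized-env buffer built from one Box of shape (80, 160, 4).
buf = np.zeros((1, 80, 160, 4), dtype=np.uint8)
try:
    buf[0] = (cam_1, cam_2)  # a tuple of two images does not fit one slot
    tuple_fits = True
except ValueError:
    tuple_fits = False


def stack_cameras(first, second):
    """Concatenate two HxWxC images along the channel axis -> HxWx(2C)."""
    return np.concatenate((first, second), axis=-1)


obs = stack_cameras(cam_1, cam_2)  # shape (80, 160, 8)
buf8 = np.zeros((1,) + obs.shape, dtype=np.uint8)
buf8[0] = obs  # fits a buffer declared with shape (80, 160, 8)

# Inside the network the two cameras can be separated again by slicing.
recovered_1, recovered_2 = obs[..., :4], obs[..., 4:]
```

With this layout the environment would declare a single `Box` with shape `(80, 160, 8)` and return `obs` from `reset()` and `step()`.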
> Hello,
> Dict and Tuple spaces are not supported for observation spaces. Did you try concatenating the images along the channel axis?
Why aren't they supported? I would also like to pass an image plus scalars as input to the policy, and at the current stage this is not possible. I don't know whether it's more convenient to write code for this or to just append a vector of scalars to the end of the image and separate it later.
hey,
~@pulver22 Well Tuple space could be supported with some effort (IIRC you can feed tuples into the feed_dict with a tf.concat of placeholders).~
However Dict would require quite a bit of reworking for it to be compatible with all the models, as each placeholder for each tensor would be called by name, and not by sequential order.
EDIT: if anyone can see a quick hack that could work in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/input.py without changing anything else, it would be awesome to hear from you.
EDIT2: Just tried the Tuple with tf.concat and tf.stack, and it doesn't seem to want to play nice. Which makes sense when you think about concatenating an image of 90x128 integers with 4 floating-point values: the code would need to flatten the input and make it all floating-point numbers, and that would only work with MLP policies.
@hill-a there seems to be a hack proposed by @Atcold here, but it does not seem to generalize to all envs:
https://github.com/Atcold/baselines/commit/dbc329f28c968fc261f53ebafc9f53192caf967c
@AloshkaD That seems to be more of an alteration of the models, which is exactly what I would like to avoid, as changing all the models in such a way might introduce unforeseen issues and bugs.
I was hoping to be able to simply change the input parsing code (stable_baselines/common/input.py) that almost all the models use.
However, if this is unlikely to be possible, then redesigning what the input parsing code returns might be a more viable solution to this problem.
Agreed! thank you @hill-a and @araffin.
Sorry, I've been away these past two weeks...
Thanks @AloshkaD for the ping, btw.
What you found _is_ a working hack.
Currently, in order to avoid headaches while pulling the latest master, I've resorted to reshaping all my observations as 1D tensors (long vectors) and concatenating them all. Later on, in my neural net, I take apart the observation and send the different parts to different encoders. See traffic_models.py.
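The flatten-and-concatenate idea can be sketched in NumPy as below. This is only an illustration under assumed shapes: `PART_SHAPES`, `pack` and `unpack` are hypothetical names, and this is not the code from traffic_models.py.

```python
import numpy as np

# Hypothetical layout: name -> shape of each observation part.
PART_SHAPES = {"image": (80, 160, 4), "sensor": (5,)}


def pack(parts):
    """Flatten every part and join them into one 1D float32 vector."""
    return np.concatenate(
        [np.asarray(parts[name], dtype=np.float32).ravel() for name in PART_SHAPES]
    )


def unpack(flat):
    """Split the 1D vector back into parts using the known shapes."""
    parts, start = {}, 0
    for name, shape in PART_SHAPES.items():
        size = int(np.prod(shape))
        parts[name] = flat[start:start + size].reshape(shape)
        start += size
    return parts


# Deterministic example data.
example = {
    "image": (np.arange(80 * 160 * 4) % 256).reshape(80, 160, 4),
    "sensor": np.arange(5),
}
flat = pack(example)
restored = unpack(flat)
```

The environment would then expose a single 1D `Box` of length `flat.size`; inside the model the same split is done per batch with tensor slicing and reshaping.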
@Atcold thank you. Similarly, concatenating images along the channel axis worked for me, but it caused many issues with TensorBoard logging. The logging expects an image with at most 4 channels, and it fails when passed 6 channels. Even if I initialize the empty tensor to the right shape, the incoming images have 6 channels. I'm going to dedicate more time to fixing this issue over the weekend.
@AloshkaD, you can always reshape your data before logging it.
From the TensorFlow documentation we have that:
The summary has up to `max_outputs` summary values containing images. The images are built from `tensor` which must be 4-D with shape `[batch_size, height, width, channels]` and where `channels` can be:
- 1: `tensor` is interpreted as Grayscale.
- 3: `tensor` is interpreted as RGB.
- 4: `tensor` is interpreted as RGBA.
You can pass channel=1, and have width=6 * original_width. So, a simple reshape should be sufficient.
Please, let me know if you have any other issue.
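A NumPy sketch of that reshape, for illustration only (`tile_channels` is a made-up helper name). Note one assumption beyond the comment above: to lay the channels out side by side, the channel axis is moved in front of the width axis before reshaping; a bare reshape would also give a valid shape but would interleave the channels pixel by pixel.

```python
import numpy as np


def tile_channels(batch):
    """Turn (N, H, W, C) into (N, H, C * W, 1) with channels laid out left to right."""
    n, h, w, c = batch.shape
    # Move channels in front of width so each channel stays one contiguous tile.
    tiled = batch.transpose(0, 1, 3, 2).reshape(n, h, c * w)
    return tiled[..., np.newaxis]


batch = np.arange(2 * 3 * 4 * 6).reshape(2, 3, 4, 6)  # tiny 6-channel "images"
grey = tile_channels(batch)  # shape (2, 3, 24, 1)
```

The same transpose-then-reshape can be applied to the tensor before it is handed to the image summary.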
Hi,
Is the Dict space working with stable-baselines? I am confused since the documentation doesn't mention it. It seems that this PR (https://github.com/hill-a/stable-baselines/pull/207) doesn't work. I see the code changes in utils.py, but the error I am getting is in stable-baselines/common/input.py. I don't see any code that corresponds to the "Dict" space in input.py.
My requirement is similar to @AloshkaD's: I want to process multiple images and measurement vectors. I am open to trying to concatenate the images. Did you pad the 1-D vector with zeros to concatenate it with the images? Do you have reference code somewhere that I can use as a starting point?
> Is the Dict space working with stable-baselines?
Hi, it is mentioned here in the doc:
"Non-array spaces such as Dict or Tuple are not currently supported by any algorithm."
The PR you are referring to only adds support for the VecEnvs, not the algorithms.
Thanks for your quick response and clarification. I was thinking that the feature was supported but the documentation was out of date. So the workaround is basically to do what @AloshkaD did and concatenate the images along the channel axis.
@araffin Has anyone proposed a PR to implement Tuple/Dict/etc for the action space? I came across this in a project I'm working on- I need to specify both discrete values (which internally in the Env represent indexes into an array) and continuous (specifying specific new amounts to add to the array, to simplify a bit). I'm open to working on a PR if none is in the works.
Experimented a bit with a MultiMixedProbabilityDistribution: https://github.com/hill-a/stable-baselines/compare/master...bschreck:add-multi-mixed-proba?expand=1
Not tested at all yet
Hello,
for now, nobody is working on that.
However, there are two important things that need to be taken into account when creating a PR for that feature:
Small update on that topic: dict obs spaces will be supported for HER (see #273) when using gym.GoalEnv.
But for now it requires all keys to have the same type.
Apparently, there is a PR that may add support for tuples spaces in the baselines repo: https://github.com/openai/baselines/pull/914
However, I'm not sure it works when mixing images with 1D vector...
Unfortunately, I had several issues with that PR, but thanks for the ping @araffin :)
@AloshkaD I found a solution for dictionary spaces: use OpenAI Gym's environment wrappers (for gym envs).
Basically, add a FlattenDictWrapper to flatten your dictionary obs space into a vector. Your obs space is now of type Box.
env = gym.wrappers.FlattenDictWrapper(env, dict_keys=['observation', 'desired_goal'])
Source: official OpenAI blog post, see the bottom of the page
This only works for a Dict that has only Box spaces inside (so all sub-spaces have the same type).
We already support that for HER and more (MultiBinary and Discrete already supported) in fact: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/her/utils.py#L46
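To make the flattening concrete, here is a small NumPy stand-in for what such a wrapper does to each observation: ravel the selected entries and concatenate them in key order. `flatten_dict_obs` is a hypothetical helper written for this illustration, not gym's own code, and the example values are invented.

```python
import numpy as np


def flatten_dict_obs(obs, keys):
    """Concatenate the raveled values of obs in the given key order."""
    return np.concatenate([np.asarray(obs[key]).ravel() for key in keys])


# Invented goal-env style observation.
goal_obs = {
    "observation": np.array([0.1, 0.2, 0.3]),
    "achieved_goal": np.array([9.0, 9.0]),
    "desired_goal": np.array([1.0, 2.0]),
}
flat_obs = flatten_dict_obs(goal_obs, keys=["observation", "desired_goal"])
```

Because everything ends up in one array with one dtype, this only makes sense when the selected entries can share a type, which is the limitation mentioned above.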
Thanks @gautams3. As @araffin mentioned, this may not work for my case, where I have images and 1D sensor data. I'm still using the workaround in which I convert my state observations into an image with multiple channels (3 for RGB, 1 for depth, and one for each sensor) and recover the signal data before feeding it to the network. I'm using PPO2.
Hello @AloshkaD,
I think your workaround is interesting. Could you please explain how you recover the signal data before feeding it to the network?
Thanks in advance.
@nkleber1
You can use a custom policy for this. In the case of a CNN policy you can replace the cnn_extractor with a head of your liking, where you split the augmented image into the actual image and the direct features (e.g. 1D sensor data). Like so:
import numpy as np
import tensorflow as tf
# Layer helpers used by policies.py (newer 2.x releases moved them to stable_baselines.common.tf_layers)
from stable_baselines.a2c.utils import conv, linear, conv_to_fc
from stable_baselines import PPO2

num_direct_features = NUMBER_OF_DIRECT_FEATURES

def augmented_nature_cnn(scaled_images, **kwargs):
    """
    Copied from stable_baselines policies.py.
    This is the nature CNN head where the last channel of the image
    contains direct features.
    :param scaled_images: (TensorFlow Tensor) Image input placeholder
    :param kwargs: (dict) Extra keywords parameters for the convolutional layers of the CNN
    :return: (TensorFlow Tensor) The CNN output layer
    """
    activ = tf.nn.relu
    # Take last channel as direct features
    other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
    # Take known amount of direct features, rest are padding zeros
    other_features = other_features[:, :num_direct_features]
    # The remaining channels are the actual image
    scaled_images = scaled_images[..., :-1]
    layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
    layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
    layer_3 = conv_to_fc(layer_3)
    img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))
    # Append direct features to the final output of the extractor
    concat = tf.concat((img_output, other_features), axis=1)
    return concat

policy_kwargs = {
    # Pass the function itself; the policy calls it on the image input
    "cnn_extractor": augmented_nature_cnn
}
agent = PPO2(policy_kwargs=policy_kwargs, ...)
Additional remark: you should be careful regarding the automatic normalization, cf discussion https://github.com/hill-a/stable-baselines/issues/456
For most of the Atari games, the observation space is quite simple: you have either a Box or a Discrete. The problem is that real-world environments or business cases often have more complex observation spaces: single/multiple Box or a combination of Box and Discrete. Hence support for Tuple would be very nice.
The custom environment I am trying to implement with stable-baselines has a Tuple observation space of 4 different time series, each represented as a Box with a different shape. After reading the comments in this thread, I understood that one can merge all of them for the input and then split them apart in the custom policy. Can somebody give an example of how this might be achieved?
@radusl
You can append the "direct features" (non-image features) to e.g. the last channel of the image, padded with zeros to match the other dimensions. Then you can use a cnn_extractor like the one returned by this function to process the actual image with convolutions and then append the direct features to the result:
def create_augmented_nature_cnn(num_direct_features):
    """
    Create and return a function for augmented_nature_cnn
    used in stable-baselines.
    num_direct_features tells how many direct features there
    will be in the image.
    """
    def augmented_nature_cnn(scaled_images, **kwargs):
        """
        Copied from stable_baselines policies.py.
        This is nature CNN head where last channel of the image contains
        direct features.
        :param scaled_images: (TensorFlow Tensor) Image input placeholder
        :param kwargs: (dict) Extra keywords parameters for the convolutional layers of the CNN
        :return: (TensorFlow Tensor) The CNN output layer
        """
        activ = tf.nn.relu
        # Take last channel as direct features
        other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
        # Take known amount of direct features, rest are padding zeros
        other_features = other_features[:, :num_direct_features]
        scaled_images = scaled_images[..., :-1]
        layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
        layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
        layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
        layer_3 = conv_to_fc(layer_3)
        # Append direct features to the final output of extractor
        img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))
        concat = tf.concat((img_output, other_features), axis=1)
        return concat
    return augmented_nature_cnn
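On the environment side, the matching packing step could look like the NumPy sketch below. `pack_features_into_image` and `unpack_features` are hypothetical helpers; the second one only mirrors in NumPy the slicing that the extractor above performs, so the two can be checked against each other. A float32 image is assumed so that arbitrary sensor values survive the round trip.

```python
import numpy as np


def pack_features_into_image(image, features):
    """Append one extra channel holding `features`, zero-padded to H * W values."""
    height, width, _ = image.shape
    features = np.asarray(features, dtype=image.dtype).ravel()
    if features.size > height * width:
        raise ValueError("too many direct features for one channel")
    extra = np.zeros(height * width, dtype=image.dtype)
    extra[:features.size] = features
    return np.concatenate((image, extra.reshape(height, width, 1)), axis=-1)


def unpack_features(augmented, num_direct_features):
    """NumPy mirror of the split done inside the extractor."""
    features = augmented[..., -1].ravel()[:num_direct_features]
    return augmented[..., :-1], features


image = np.ones((80, 160, 4), dtype=np.float32)
sensors = [0.5, -1.0, 2.0]
augmented = pack_features_into_image(image, sensors)  # shape (80, 160, 5)
image_back, sensors_back = unpack_features(augmented, num_direct_features=3)
```

Keep the earlier remark about automatic normalization in mind: whatever scaling is applied to the image input is applied to the feature channel as well.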
I am very interested in getting mixed dictionary input spaces officially supported in stable-baselines and would be willing to pay for someone to do the work since I doubt I have the skills to do it myself. If anyone here has the skills or knows of a pay-for service where I might post the project, please let me know.
Is there any update on this?
I am trying to use a mixed dictionary space (Discrete + MultiDiscrete) as action space but rllib yields:
NotImplementedError: Dict action spaces are not supported, consider using gym.spaces.Tuple instead
@nicofirst1
No updates yet. We are focusing on transitioning to the new backend first (v3.0), after which this will be one of the high-priority updates for v3.1.
Any clue on how long it will take?
I can not give any exact times but at least a month, I would say.
Regarding your rllib problem: you could modify your space to be a Tuple, no? Just make sure you provide observations in the same order on each step. Please do not continue this discussion here; this is just food for thought.
Regarding your problem, it seems to me that Discrete is a special case of MultiDiscrete, so you could use only a MultiDiscrete space in your case.
Btw, we plan to support observation Dict first; action space Dict is an open research question.
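The MultiDiscrete suggestion can be sketched in plain Python as follows. `merge_nvec` and `split_action` are hypothetical helpers; the merged list is what one would pass as the `nvec` argument when building a single `gym.spaces.MultiDiscrete` action space.

```python
def merge_nvec(n_discrete, multi_nvec):
    """Sizes for one MultiDiscrete-style action covering Discrete(n) plus MultiDiscrete(nvec)."""
    return [n_discrete] + list(multi_nvec)


def split_action(action):
    """First entry is the former Discrete choice, the rest is the MultiDiscrete part."""
    return int(action[0]), [int(a) for a in action[1:]]


nvec = merge_nvec(3, [4, 5])  # stands for Discrete(3) + MultiDiscrete([4, 5])
choice, rest = split_action([2, 1, 4])
```

Inside `step()` the environment would call something like `split_action` on the incoming action before using the two parts.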