Stable-baselines: [question] [feature request] support for Dict and Tuple spaces

Created on 16 Dec 2018 · 36 Comments · Source: hill-a/stable-baselines

I want to train on two images from different cameras plus a 1-D array of sensor data, all passed together as my env state. That needs a CNN that can take these inputs, concatenate them, and train on them. My question is how to pass these inputs to such a custom CNN in policies.py. I also tried passing two images, and dummy_vec_env.py failed with:
```
    obs = env.reset()
  File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 57, in reset
    self._save_obs(env_idx, obs)
  File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 75, in _save_obs
    self.buf_obs[key][env_idx] = obs
ValueError: cannot copy sequence with size 2 to array axis with dimension 80
```

I appreciate any thoughts or examples.

enhancement question v3

AloshkaD

👍8

Most helpful comment

> Hello,
> Dict and Tuple spaces are not supported for observation spaces. Did you try concatenating the images along the channel axis?

Why aren't they supported? I would also like to pass an image plus scalars as input to the policy, and at the current stage this is not possible. I don't know whether it's more convenient to write code for this or to just append a vector of scalars to the end of the image and separate it later.

pulver22 on 19 Dec 2018

👍10

All 36 comments

Hello,

Could you please provide a minimal code to reproduce the error?

araffin on 16 Dec 2018

@araffin So for simplicity, say I need to pass two RGB images (observations from two cameras onboard a robot) of size (80,160,4) as states, like this:

```python
class MyCustomEnv(gym.Env):

    def __init__(self):
        self.observation_space = spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.float32)
        self.state = (np.zeros((80, 160, 4), dtype=np.uint8), np.zeros((80, 160, 4), dtype=np.uint8))
        ...

    def step(self, action):
        ...
        self.state = self.rgbimage_1, self.rgbimage_2
        return self.state, reward, done, info

    def reset(self):
        ...
        self.state = self.rgbimage_1, self.rgbimage_2
        return self.state
```
I hope this is good enough. I also suspect my definition of the observation_space might not be correct, but I tried different ways to define an observation_space for two images and nothing worked. I saw that you are a contributor here, and I hope you can help with defining the ob_space too.
For the record, I tried to build an observation space like this:

```python
self.nested_observation_space = spaces.Dict({
    'sensors': spaces.Dict({
        #'position': spaces.Box(low=-100, high=100, shape=(3,)),
        #'velocity': spaces.Box(low=-1, high=1, shape=(3,)),
        'front_cam': spaces.Tuple((
            spaces.Box(low=0, high=255, shape=(80, 160, 4)),
            spaces.Box(low=0, high=255, shape=(80, 160, 4))
        )),
    })
})
```
but that didn't work either and returned the error

```
    env = DummyVecEnv([lambda: env]) # The algorithms require a vectorized environment to run
  File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 31, in __init__
    shapes[key] = box.shape
```

For simplicity I passed this and still got the same error:

```python
self.nested_observation_space = spaces.Tuple((
    spaces.Box(low=0, high=255, shape=(80, 160, 4)),
    spaces.Box(low=0, high=255, shape=(80, 160, 4))
))
```

I can send you the complete class if you like.
Thanks

AloshkaD on 16 Dec 2018

The problem appears to be with vectorizing the env. I get

```
  File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 35, in __init__
    self.buf_obs = {k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys}
  File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 35, in <dictcomp>
    self.buf_obs = {k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys}
TypeError: 'NoneType' object is not iterable
```

when defining the state like this:

```python
self.observation_space = spaces.Tuple((
    spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8),
    spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8)
))
```

AloshkaD on 17 Dec 2018

Hello,
Dict and Tuple spaces are not supported for observations spaces. Did you try concatenating the images along the channel axis?
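
For reference, the channel-axis concatenation suggested here can be sketched in plain NumPy. The helper names `stack_cameras` and `split_cameras` are made up for illustration and the frame sizes are taken from the question; this is a sketch under those assumptions, not code from stable-baselines.

```python
import numpy as np


def stack_cameras(img_a, img_b):
    """Concatenate two HxWxC images along the channel axis."""
    return np.concatenate((img_a, img_b), axis=-1)


def split_cameras(obs, channels_a):
    """Undo stack_cameras: the first `channels_a` channels belong to camera A."""
    return obs[..., :channels_a], obs[..., channels_a:]


# Two dummy 80x160 frames with 4 channels each, as in the question
cam_1 = np.zeros((80, 160, 4), dtype=np.uint8)
cam_2 = np.full((80, 160, 4), 255, dtype=np.uint8)

obs = stack_cameras(cam_1, cam_2)
print(obs.shape)  # (80, 160, 8)
```

With this layout the environment would declare a single `spaces.Box(low=0, high=255, shape=(80, 160, 8), dtype=np.uint8)` and return `obs` from both `reset()` and `step()`.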

araffin on 17 Dec 2018

I could concatenate the images and then separate them when fed to the cnn. I could also pad the signal with zeros and concatenate it as a 2x2 channel. I'm worried about the scalability of this approach.

AloshkaD on 18 Dec 2018

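> Hello,
> Dict and Tuple spaces are not supported for observation spaces. Did you try concatenating the images along the channel axis?

Why aren't they supported? I would also like to pass an image plus scalars as input to the policy, and at the current stage this is not possible. I don't know whether it's more convenient to write code for this or to just append a vector of scalars to the end of the image and separate it later.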

pulver22 on 19 Dec 2018

👍10

hey,

~@pulver22 Well Tuple space could be supported with some effort (IIRC you can feed tuples into the feed_dict with a tf.concat of placeholders).~

However Dict would require quite a bit of reworking for it to be compatible with all the models, as each placeholder for each tensor would be called by name, and not by sequential order.

EDIT: if anyone can see a quick hack that could work in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/input.py without changing anything else, it would be awesome to hear from you.

EDIT2: Just tried the Tuple with tf.concat and tf.stack, and it doesn't seem to want to play nice. That makes sense when you think about concatenating an image of 90x128 integers with 4 floating point values: the code would need to flatten the input and make it all floating point numbers, and that would only work with MLP policies.

hill-a on 19 Dec 2018

👍1

@hill-a there seems to be a hack proposed by @Atcold here, but it does not seem to generalize to all envs
https://github.com/Atcold/baselines/commit/dbc329f28c968fc261f53ebafc9f53192caf967c

AloshkaD on 20 Dec 2018

@AloshkaD That seems to be more of an alteration of the models, which is exactly what I would like to avoid, as changing all the models in such a way might cause more unforeseen issues and bugs.

I was hoping to be able to simply change the input parsing code (stable_baselines/common/input.py), which almost all the models use.

However, if this is unlikely to be possible, then a redesign of what the input parsing code returns might be a more viable solution to this problem.

hill-a on 21 Dec 2018

Agreed! thank you @hill-a and @araffin.

AloshkaD on 29 Dec 2018

Sorry, I've been away these past two weeks...
Thanks @AloshkaD for the ping, btw.

What you found _is_ a working hack.
Currently, in order to avoid headaches while pulling the latest master, I've resorted to reshaping all my observations into 1D tensors (long vectors) and concatenating them all. Later on, in my neural net, I take the observation apart and send the different parts to different encoders. See traffic_models.py.
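
A rough NumPy sketch of that flatten-then-take-apart idea follows. The names `pack` and `unpack` and the example shapes are illustrative assumptions, not taken from traffic_models.py.

```python
import numpy as np


def pack(parts):
    """Flatten every array and join them into one float32 vector."""
    return np.concatenate([np.asarray(p, dtype=np.float32).ravel() for p in parts])


def unpack(flat, shapes):
    """Split the long vector back into arrays with the given shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    # Split points are the running totals of the part sizes
    chunks = np.split(flat, np.cumsum(sizes)[:-1])
    return [c.reshape(s) for c, s in zip(chunks, shapes)]


# Two images and a 6-value sensor reading (sizes are illustrative)
shapes = [(80, 160, 4), (80, 160, 4), (6,)]
parts = [np.random.rand(*s) for s in shapes]

flat = pack(parts)
restored = unpack(flat, shapes)
print(flat.shape)  # (102406,)
```

Inside the network the same split would be done with tensor slicing and reshapes rather than NumPy, using the same fixed sizes.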

Atcold on 3 Jan 2019

@Atcold thank you. Similarly, concatenating images on the channel axis worked for me, but it caused many issues with tensorboard logging. The logging expects an image with at most 4 channels, and it fails when passed 6 channels. Even if I initialize the empty tensor to the right shape, the incoming images have 6 channels. I'm going to dedicate more time to fixing this issue over the weekend.

AloshkaD on 3 Jan 2019

@AloshkaD, you can always reshape your data before logging it.
From the TensorFlow documentation:

> The summary has up to max_outputs summary values containing images. The images are built from tensor which must be 4-D with shape [batch_size, height, width, channels] and where channels can be:
>
> - 1: tensor is interpreted as Grayscale.
> - 3: tensor is interpreted as RGB.
> - 4: tensor is interpreted as RGBA.

You can pass channels=1 and use width = 6 * original_width, so a simple reshape should be sufficient.
Please, let me know if you have any other issue.
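
A small NumPy sketch of that reshape, with one caveat: a plain reshape from (H, W, 6) to (H, 6*W, 1) yields a valid single-channel image but interleaves the channels pixel by pixel, so the sketch moves the channel axis in front of the width first to lay the channels out side by side. The function name `tile_channels` is hypothetical.

```python
import numpy as np


def tile_channels(batch):
    """Turn (B, H, W, C) into (B, H, C*W, 1) with channels laid out left to right."""
    b, h, w, c = batch.shape
    # Move channels in front of width so each channel stays one contiguous block
    return batch.transpose(0, 1, 3, 2).reshape(b, h, c * w, 1)


batch = np.random.randint(0, 256, size=(2, 80, 160, 6), dtype=np.uint8)
tiled = tile_channels(batch)
print(tiled.shape)  # (2, 80, 960, 1)
```

In a TensorFlow graph the equivalent would be a `tf.transpose` followed by a `tf.reshape` applied to the tensor before it is handed to the image summary.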

Atcold on 16 Jan 2019

Hi ,
Is the Dict space working with stable-baselines? I am confused since the documentation doesn't mention it. It seems that this PR (https://github.com/hill-a/stable-baselines/pull/207) doesn't work. I see the code changes in utils.py, but the error I am getting is in stable-baselines/common/input.py, and I don't see any code in input.py that handles "Dict" spaces.

My requirement is similar to @AloshkaD's: I want to process multiple images and measurement vectors. I am open to trying to concatenate the images. Did you pad the 1-D vector with zeros to concatenate it with the images? Do you have reference code somewhere that I can use as a starting point?

srivatsankrishnan on 29 Mar 2019

> Is the Dict space working with stable-baselines?

Hi, it is mentioned here in the doc:
"Non-array spaces such as Dict or Tuple are not currently supported by any algorithm."

The PR you are referring to only adds support for the VecEnvs, not the algorithms.

araffin on 29 Mar 2019

Thanks for your quick response and clarification. I was thinking that the feature was supported but the documentation was out of date. So the workaround is basically to do what @AloshkaD did and concatenate the images along the channel axis.

srivatsankrishnan on 29 Mar 2019

👍1

@araffin Has anyone proposed a PR to implement Tuple/Dict/etc for the action space? I came across this in a project I'm working on- I need to specify both discrete values (which internally in the Env represent indexes into an array) and continuous (specifying specific new amounts to add to the array, to simplify a bit). I'm open to working on a PR if none is in the works.

bschreck on 14 Apr 2019

👍4

Experimented a bit with a MultiMixedProbabilityDistribution: https://github.com/hill-a/stable-baselines/compare/master...bschreck:add-multi-mixed-proba?expand=1

Not tested at all yet

bschreck on 15 Apr 2019

Hello,
for now, nobody is working on that.
However, there are two important things that need to be taken into account when creating a PR for that feature:

- it should not break previous versions
- the changes should be as minimal as possible (so the code stays readable)

araffin on 15 Apr 2019

👍1

Small update on that topic, dict obs space will be supported for HER (see #273 ), when using gym.GoalEnv.
But it requires for now all keys to have the same type.

araffin on 30 Apr 2019

👀1

Apparently, there is a PR that may add support for tuples spaces in the baselines repo: https://github.com/openai/baselines/pull/914
However, I'm not sure it works when mixing images with 1D vector...

araffin on 24 May 2019

Unfortunately, I had several issues with that PR, but thanks for the ping @araffin :)

AloshkaD on 7 Jul 2019

@AloshkaD I found a solution for dictionary spaces: use OpenAI Gym's environment wrappers (for gym envs).
Basically, add a FlattenDictWrapper to flatten your dictionary obs space into a vector. Your obs space is then of type Box:
env = gym.wrappers.FlattenDictWrapper(env, dict_keys=['observation', 'desired_goal'])

Source
Official OpenAI blog post, see the bottom of the page
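
To make the effect of such flattening concrete, here is a NumPy sketch of what it roughly amounts to: the selected dictionary entries are raveled and concatenated in key order. This is an illustration with a made-up helper `flatten_dict_obs` and made-up values, not the wrapper's actual source.

```python
import numpy as np


def flatten_dict_obs(obs, keys):
    """Concatenate the chosen entries of a dict observation into one 1-D vector."""
    return np.concatenate([np.asarray(obs[k], dtype=np.float32).ravel() for k in keys])


# Goal-style observation with illustrative values
obs = {
    "observation": np.array([0.1, 0.2, 0.3]),
    "achieved_goal": np.array([1.0, 1.0]),
    "desired_goal": np.array([0.5, 0.5]),
}

flat = flatten_dict_obs(obs, ["observation", "desired_goal"])
print(flat.shape)  # (5,)
```

Because everything ends up in one array with one dtype, this only makes sense when all selected entries can share that representation.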

gautams3 on 12 Jul 2019

👍3

This only works for Dict spaces that contain only Box spaces (so all subspaces have the same type).
In fact, we already support that for HER, and more (MultiBinary and Discrete are supported too): https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/her/utils.py#L46

araffin on 18 Jul 2019

Thanks @gautams3. As @araffin mentioned, this may not work for my case where I have images and 1d sensor data. I'm still using the workaround in which I convert my state observations into an image with multiple channels(3 for rgb, 1 depth, and one for each sensor) and recover the signal data before feeding them to the network. I'm using PPO2

AloshkaD on 18 Jul 2019

👍1

Hello @AloshkaD,
I think your workaround is interesting. Could you please explain how you recover the signal data before feeding it to the network?
Thanks in advance.

nkleber1 on 21 Sep 2019

@nkleber1

You can use a custom policy for this. With a CNN policy you can replace the cnn_extractor with a head of your liking, in which you split the augmented image into the actual image and the direct features (e.g. 1-D sensor data). Like so:

```python
import numpy as np
import tensorflow as tf

from stable_baselines import PPO2
# Layer helpers as imported by stable_baselines policies.py at the time of writing
from stable_baselines.a2c.utils import conv, linear, conv_to_fc

# Placeholder: set this to the number of direct features packed into the image
num_direct_features = NUMBER_OF_DIRECT_FEATURES


def augmented_nature_cnn(scaled_images, **kwargs):
    """
    Copied from stable_baselines policies.py.
    This is the nature CNN head where the last channel of the image
    contains the direct features.

    :param scaled_images: (TensorFlow Tensor) Image input placeholder
    :param kwargs: (dict) Extra keywords parameters for the convolutional layers of the CNN
    :return: (TensorFlow Tensor) The CNN output layer
    """
    activ = tf.nn.relu

    # Take last channel as direct features
    other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
    # Take known amount of direct features, rest are padding zeros
    other_features = other_features[:, :num_direct_features]

    # The remaining channels are the actual image
    scaled_images = scaled_images[..., :-1]

    layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
    layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
    layer_3 = conv_to_fc(layer_3)

    img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))

    # Append direct features to the final output of the extractor
    concat = tf.concat((img_output, other_features), axis=1)

    return concat


# Pass the function itself, not the result of calling it
policy_kwargs = {
    "cnn_extractor": augmented_nature_cnn
}

# Remaining arguments (policy, env, ...) are elided
agent = PPO2(..., policy_kwargs=policy_kwargs)
```

Miffyli on 21 Sep 2019

Additional remark: you should be careful regarding the automatic normalization, cf discussion https://github.com/hill-a/stable-baselines/issues/456

araffin on 21 Sep 2019

👍1

For most of the Atari games, the observation space is quite simple: you have either a Box or a Discrete. The problem is that real-world environments and business cases often have more complex observation spaces: single or multiple Box spaces, or a combination of Box and Discrete. Hence support for Tuple would be very nice.

The custom environment I am trying to implement with stable-baselines has a Tuple observation space of 4 different time series, each represented as a Box with a different shape. After reading the comments in this thread, I understood that one can merge all of them for the input and then split them apart in the custom policy. Can somebody give an example of how this might be achieved?

radusl on 4 Dec 2019

@radusl

You can append the "direct features" (non-image features) as e.g. the last channel of the image, padded with zeros to match the other dimensions. Then you can use a cnn_extractor like the one returned by this function to process the actual image with convolutions and append the direct features to the result:

```python
def create_augmented_nature_cnn(num_direct_features):
    """
    Create and return a function for augmented_nature_cnn
    used in stable-baselines.

    num_direct_features tells how many direct features there
    will be in the image.
    """

    def augmented_nature_cnn(scaled_images, **kwargs):
        """
        Copied from stable_baselines policies.py.
        This is nature CNN head where last channel of the image contains
        direct features.

        :param scaled_images: (TensorFlow Tensor) Image input placeholder
        :param kwargs: (dict) Extra keywords parameters for the convolutional layers of the CNN
        :return: (TensorFlow Tensor) The CNN output layer
        """
        activ = tf.nn.relu

        # Take last channel as direct features
        other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
        # Take known amount of direct features, rest are padding zeros
        other_features = other_features[:, :num_direct_features]

        scaled_images = scaled_images[..., :-1]

        layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
        layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
        layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
        layer_3 = conv_to_fc(layer_3)

        # Append direct features to the final output of extractor
        img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))

        concat = tf.concat((img_output, other_features), axis=1)

        return concat

    return augmented_nature_cnn
```
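
The environment side of this trick, packing the direct features into an extra zero-padded channel, could look roughly like the NumPy sketch below. The helper name `append_feature_channel` and the example sizes are assumptions for illustration. The sketch also assumes float32 observations; with uint8 images the features would first have to be encoded into the 0..255 range, and any input scaling the policy applies to images would presumably apply to this channel as well.

```python
import numpy as np


def append_feature_channel(image, features):
    """Add one extra channel holding `features` followed by zero padding."""
    h, w, _ = image.shape
    features = np.asarray(features, dtype=image.dtype).ravel()
    if features.size > h * w:
        raise ValueError("too many direct features for one channel")
    # Fill a flat channel row by row: features first, zeros afterwards
    channel = np.zeros(h * w, dtype=image.dtype)
    channel[:features.size] = features
    return np.concatenate((image, channel.reshape(h, w, 1)), axis=-1)


image = np.random.rand(80, 160, 4).astype(np.float32)
sensor = np.array([0.5, -1.0, 2.0], dtype=np.float32)

obs = append_feature_channel(image, sensor)
print(obs.shape)  # (80, 160, 5)

# Mirror of the extractor's slicing: flatten the last channel, keep the first values
recovered = obs[..., -1].reshape(-1)[:sensor.size]
```

The row-major fill order matters: it matches flattening the last channel and taking the first `num_direct_features` values, which is what the extractor does.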

Miffyli on 4 Dec 2019

👍3

I am very interested in getting mixed dictionary input spaces officially supported in stable-baselines and would be willing to pay for someone to do the work since I doubt I have the skills to do it myself. If anyone here has the skills or knows of a pay-for service where I might post the project, please let me know.

pirobot on 10 Dec 2019

Is there any update on this?
I am trying to use a mixed dictionary space (Discrete + MultiDiscrete) as action space but rllib yields:
NotImplementedError: Dict action spaces are not supported, consider using gym.spaces.Tuple instead

nicofirst1 on 15 Feb 2020

@nicofirst1
No updates yet. We are focusing on transitioning to the new backend first (v3.0), after which this will be one of the high-priority updates for v3.1.

Miffyli on 15 Feb 2020

Any clue on how long it will take?

nicofirst1 on 15 Feb 2020

I cannot give an exact time, but at least a month, I would say.

Regarding your rllib problem: you could modify your space to be a Tuple, no? Just make sure you provide observations in the same order on each step. Please do not continue this discussion here; it is just food for thought.

Miffyli on 15 Feb 2020

👍1

Regarding your problem, it seems to me that Discrete is a special case of MultiDiscrete, so you could use a single MultiDiscrete space in your case.
Btw, we plan to support Dict observation spaces first; Dict action spaces are an open research question.
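
A minimal sketch of that folding, using plain Python lists so it does not depend on any particular gym version. The helper names and the sizes are hypothetical.

```python
def merge_nvec(n_discrete, multi_nvec):
    """Sizes for one MultiDiscrete covering Discrete(n) plus MultiDiscrete(nvec)."""
    return [n_discrete] + list(multi_nvec)


def split_action(action):
    """First entry is the former Discrete choice, the rest the former MultiDiscrete."""
    return int(action[0]), [int(a) for a in action[1:]]


nvec = merge_nvec(3, [5, 2])          # sizes are illustrative
choice, rest = split_action([2, 4, 1])

print(nvec)          # [3, 5, 2]
print(choice, rest)  # 2 [4, 1]
```

The environment would then declare `spaces.MultiDiscrete(nvec)` as its action space and call something like `split_action` at the top of `step()`.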

araffin on 15 Feb 2020
