Stable-baselines: [feature request] custom transformation of observation space

Created on 6 Dec 2018 · 6 Comments · Source: hill-a/stable-baselines

Hello,

I often need to manually transform the shape of the observation space, and the corresponding observations, to match the custom policies I'm using. Would it be worth adding a pre-processing mechanism that applies a user-defined transform such as:

import numpy as np

def transform(obs):
    # reshape the observation to whatever shape the custom policy expects
    return np.reshape(obs, ...)

I see at least two ways to expose a custom transformer: pass it as a parameter to the algorithm, or register it (I think I'd prefer the second option).
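For illustration only, the "register" option could look roughly like the sketch below. None of these names exist in stable-baselines: `register_transform`, `get_transform`, `apply_transform` and the registry dict are hypothetical, and the `(8, 8, 1)` target shape is an arbitrary example.

```python
import numpy as np

# Hypothetical registry mapping a name to (transform function, output shape).
# This is NOT a stable-baselines API; it only illustrates the proposal.
_OBS_TRANSFORMS = {}


def register_transform(name, fn, out_shape):
    """Register `fn`, which maps one observation to an array of shape `out_shape`."""
    if name in _OBS_TRANSFORMS:
        raise KeyError("transform already registered: %s" % name)
    _OBS_TRANSFORMS[name] = (fn, tuple(out_shape))


def get_transform(name):
    """Return the (fn, out_shape) pair registered under `name`."""
    return _OBS_TRANSFORMS[name]


def apply_transform(name, obs):
    """Apply a registered transform and check it produced the declared shape."""
    fn, out_shape = get_transform(name)
    out = np.asarray(fn(obs))
    if out.shape != out_shape:
        raise ValueError("expected shape %s, got %s" % (out_shape, out.shape))
    return out


# Example: view a flat 64-value observation as an 8x8 single-channel array
register_transform("flat_to_8x8", lambda obs: np.reshape(obs, (8, 8, 1)), (8, 8, 1))
```

Storing the declared output shape next to the function covers both halves of the request: the observations are transformed, and the new observation shape is known up front.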

Labels: custom gym env, enhancement

I'm not sure I understand what you mean.

If you want to change the observation shape from the environment, you can use a custom environment wrapper that can transform your observation before it is used by the model.
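A rough sketch of that suggestion, assuming the classic `gym` package (which stable-baselines builds on) and a `Box` observation space: an observation wrapper that reshapes every observation and exposes the matching space. `ReshapeObservation` is a hypothetical name used only for illustration.

```python
import numpy as np
import gym
from gym import spaces


class ReshapeObservation(gym.ObservationWrapper):
    """Reshape every observation of a Box-observation env to `shape` (illustrative)."""

    def __init__(self, env, shape):
        super(ReshapeObservation, self).__init__(env)
        self.shape = tuple(shape)
        old_space = env.observation_space
        # The bounds are reshaped too, so the total number of elements must match
        self.observation_space = spaces.Box(
            low=old_space.low.reshape(self.shape),
            high=old_space.high.reshape(self.shape),
            dtype=old_space.dtype,
        )

    def observation(self, observation):
        # gym.ObservationWrapper calls this on what reset() and step() return
        return np.reshape(observation, self.shape)
```

Because the wrapper replaces `observation_space`, a model built on the wrapped env sees the new shape without any change to the policy.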

If you want to change the way the batch shapes are handled, I wouldn't mind an example, as I'm not sure how this could be used.

Or do you mean something else?

I mostly agree with @hill-a. I would add one thing: what you describe seems to belong in a custom environment.
Is there something preventing you from doing the transformation inside the environment?

Well, I think my use case is quite specific, then :)

An example of what I mean: using policies with convolutional NNs on an environment whose observations are not images. This requires transforming both the observations (I agree that can be done in the environment) and the input shape (AFAIK that requires modifying the input shape, which can't be done in the custom policy alone).

> convolutional NNs but with an environment that doesn't have images as observations

If your observation is not an image, I would recommend using MLP policies, unless it is something other than a feature vector. If your observation is a 3-dimensional tensor (i.e. it is shaped like an image), then it should work.
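To make the "tensor of dimension 3" point concrete: a CNN policy consumes a batch of image-like arrays, one `(height, width, channels)` array per environment. A minimal numpy sketch, assuming a channels-last layout and a feature vector whose length equals `height * width * channels`; the `8 x 8 x 1` shape and the helper name `to_image_batch` are illustrative, not from the thread.

```python
import numpy as np


def to_image_batch(flat_obs, height, width, channels=1):
    """Reshape a (batch, features) array into (batch, height, width, channels).

    Only valid when features == height * width * channels; nothing is padded.
    """
    flat_obs = np.asarray(flat_obs)
    batch, features = flat_obs.shape
    if features != height * width * channels:
        raise ValueError(
            "%d features cannot be viewed as %dx%dx%d" % (features, height, width, channels)
        )
    return flat_obs.reshape(batch, height, width, channels)


# 4 environments, each returning a 64-value feature vector
batch = np.random.rand(4, 64).astype(np.float32)
images = to_image_batch(batch, 8, 8)
print(images.shape)  # (4, 8, 8, 1)
```

Reshaping only changes the layout. A convolution over it is meaningful only if neighbouring features are actually related spatially, and a specific CNN architecture may need a larger spatial size than this toy example; otherwise the MLP policy suggested above is the safer choice.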

Wouldn't this work? (Granted, the documentation examples don't show that this is possible.)

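import numpy as np
from gym import spaces

from stable_baselines.common.vec_env import VecEnvWrapper

class CustomVecEnvWrapper(VecEnvWrapper):
    """
    A custom vectorized environment wrapper that reshapes observations and actions.
    Assumes both the observation and action spaces of the wrapped env are gym.spaces.Box.

    :param venv: (VecEnv) the vectorized environment to wrap
    :param obs_shape: (tuple) per-environment observation shape exposed to the model
    :param action_shape: (tuple) per-environment action shape exposed to the model
    """

    def __init__(self, venv, obs_shape, action_shape):
        self.obs_shape = tuple(obs_shape)
        self.action_shape = tuple(action_shape)
        # per-env action shape of the wrapped env, needed to undo the action reshaping
        self.orig_action_shape = venv.action_space.shape

        # reshape the per-env bounds; the total number of elements must not change
        obs_low = venv.observation_space.low.reshape(self.obs_shape)
        obs_high = venv.observation_space.high.reshape(self.obs_shape)
        observation_space = spaces.Box(low=obs_low, high=obs_high, dtype=venv.observation_space.dtype)

        action_low = venv.action_space.low.reshape(self.action_shape)
        action_high = venv.action_space.high.reshape(self.action_shape)
        action_space = spaces.Box(low=action_low, high=action_high, dtype=venv.action_space.dtype)

        # sets self.venv, self.num_envs and the new spaces
        VecEnvWrapper.__init__(self, venv, observation_space=observation_space, action_space=action_space)

    def step_async(self, actions):
        # actions arrive as (num_envs,) + action_shape; restore the wrapped env's shape
        actions = np.asarray(actions).reshape((self.num_envs,) + self.orig_action_shape)
        self.venv.step_async(actions)

    def step_wait(self):
        observations, rewards, dones, infos = self.venv.step_wait()
        # keep the leading batch dimension (one row per environment)
        return observations.reshape((self.num_envs,) + self.obs_shape), rewards, dones, infos

    def reset(self):
        """
        Reset all environments
        """
        obs = self.venv.reset()
        return obs.reshape((self.num_envs,) + self.obs_shape)

    def close(self):
        self.venv.close()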

EDIT: added action reshaping as well
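One detail to watch when reshaping inside a `VecEnvWrapper`: `reset()` and `step_wait()` return one row per sub-environment while the spaces describe a single environment, so the reshape has to keep the leading `num_envs` dimension, and actions coming from the model must be mapped back to the wrapped env's per-env action shape. A numpy-only sketch of that round trip, with hypothetical shapes (2 envs, 6-value observations viewed as `(2, 3)`, 4-value actions viewed as `(2, 2)`):

```python
import numpy as np

num_envs = 2
orig_obs_shape, new_obs_shape = (6,), (2, 3)
orig_action_shape, new_action_shape = (4,), (2, 2)

# What the wrapped VecEnv returns: one observation row per environment
batched_obs = np.arange(num_envs * 6, dtype=np.float32).reshape((num_envs,) + orig_obs_shape)

# Observations going to the model: keep the batch dimension in front
obs_for_model = batched_obs.reshape((num_envs,) + new_obs_shape)

# Actions coming from the model use the new per-env shape...
actions_from_model = np.zeros((num_envs,) + new_action_shape, dtype=np.float32)
# ...and must be turned back into what the wrapped VecEnv expects
actions_for_env = actions_from_model.reshape((num_envs,) + orig_action_shape)

print(obs_for_model.shape, actions_for_env.shape)  # (2, 2, 3) (2, 4)

# Reshaping to the per-env shape alone drops the batch dimension and fails here
try:
    batched_obs.reshape(new_obs_shape)
except ValueError as err:
    print("ValueError:", err)
```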

@hill-a thanks. I was about to write my own :)
