Gym: Create a separate repo for Wrappers?

Created on 22 Feb 2019  ·  7 Comments  ·  Source: openai/gym

There are quite a few issues/PRs for new features that could be implemented as wrappers, and the baselines repo already contains very useful wrappers for RL algorithms, e.g. Atari preprocessing/DQN, continuous action clipping, observation normalization, reward scaling, etc.

So would it be a good idea to have another repo, maintained by OpenAI, that keeps a wide collection of wrappers (plus unit tests) in one place? This would keep gym itself lightweight and give the community access to standardized, well-tested wrappers.

What do you think, @pzhokhov?

stale

All 7 comments

Having a collection of standardized, well-tested wrappers sounds very useful, I agree. I am not sure a separate repo is justified, though; we could instead put them (and the unit tests) into gym. Per discussion with @christopherhesse, env-specific standard wrappers could live in gym, whereas algorithm-specific wrappers can live closer to the algorithms, i.e. in baselines. Let's keep this issue open until we complete the migration of wrappers into gym and their testing.

Thanks @pzhokhov, that sounds exciting! I've made a small list here for further discussion:

  • Standard

    • [x] ClipAction: clip the received action with upper and lower bounds. Valid for Box action space
    • [x] SignReward: apply sign function to bin the reward
    • [x] ClipReward: clip the raw reward with upper and lower bounds.
    • [x] ScaleReward: rescale the reward by a factor
    • [ ] WarpFrame: downsample the image observation. Note: ResizeFrame or DownsampleFrame may be a better name
    • [x] FlattenObservation
    • [x] GrayScaleObservation: convert pixel observation to gray scale
    • [x] TimeAwareObservation: append the time step to the observation. See the paper "Time Limits in Reinforcement Learning"
  • Standardization: better applied to VecEnv

    • [ ] StandardizeObservation: standardize the observation by online estimation of moments
    • [ ] StandardizeReward: standardize the reward by online estimation of variance. Note: the mean is NOT subtracted
    • [ ] VecMonitor: record episodic length, returns and elapsed time.
  • Specific

    • [x] NoopResetEnv: Atari
    • [x] FireResetEnv: Atari
    • [x] EpisodicLifeEnv: Atari
    • [x] MaxAndSkipEnv: Atari
    • [x] FrameStack: DQN
    • [x] ScaledFloatFrame: DQN
    • [x] LazyFrames: DQN
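
To make the list more concrete, here is a minimal sketch of how two of the "Standard" items (ClipAction and SignReward) could look. It is only an illustration, not the code from gym or baselines: `ToyEnv` and the base `Wrapper` class are hypothetical stand-ins so the snippet runs without gym installed, actions are scalars rather than a Box, and the old 4-tuple `step` return is assumed. With gym itself, one would typically subclass `gym.ActionWrapper` / `gym.RewardWrapper` instead.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym environment with a scalar action."""

    def __init__(self):
        self.last_action = None

    def reset(self):
        self.last_action = None
        return 0.0  # dummy observation

    def step(self, action):
        self.last_action = action
        reward = 10.0 * action  # arbitrary reward, only for illustration
        return 0.0, reward, False, {}


class Wrapper:
    """Minimal base wrapper (assumption): forwards reset/step to the wrapped env."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class ClipAction(Wrapper):
    """Clip the received action to [low, high] before passing it on."""

    def __init__(self, env, low, high):
        super().__init__(env)
        self.low, self.high = low, high

    def step(self, action):
        clipped = min(max(action, self.low), self.high)
        return self.env.step(clipped)


class SignReward(Wrapper):
    """Replace the raw reward by its sign: -1.0, 0.0 or 1.0."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, float((reward > 0) - (reward < 0)), done, info


# Wrappers compose by nesting; the innermost one is applied last to actions.
base = ToyEnv()
env = SignReward(ClipAction(base, low=-1.0, high=1.0))
env.reset()
obs, reward, done, info = env.step(5.0)  # 5.0 is clipped before reaching base
```

Because each wrapper only touches one thing (the action or the reward), each one can be unit-tested in isolation, which is the main argument for collecting them in one well-tested place.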

@pzhokhov I'd be happy to implement some of these once the discussion settles which wrappers to migrate.

Hi @zuoxingdong,
I saw you opened a lot of PRs implementing the wrappers you mentioned!

Are there still any wrappers left to implement? I would be happy to help :smile:

Thank you.

Hi @AdilZouitine, thanks a lot for your interest! I've updated the checklist above. For now, the standardization wrappers for observations and rewards in vectorized environments are probably the ones still missing. Would you like to implement them?
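
For reference, the "online estimation of moments" behind those two checklist items could be sketched roughly as follows. This is a hedged illustration only: `RunningMoments`, `standardize_observation`, `standardize_reward` and the `eps` parameter are hypothetical names, values are scalars for simplicity, and Welford's update is used here as one possible estimator, not necessarily what gym or baselines implement.

```python
import math


class RunningMoments:
    """Online mean/variance via Welford's algorithm (scalar version)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, batch):
        # A batch could be one value per sub-environment of a VecEnv.
        for x in batch:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; falls back to 1.0 before any data is seen
        # (an assumption, chosen so that dividing by the std is a no-op).
        return self.m2 / self.count if self.count > 0 else 1.0


def standardize_observation(x, moments, eps=1e-8):
    """Subtract the running mean and divide by the running std."""
    return (x - moments.mean) / math.sqrt(moments.var + eps)


def standardize_reward(r, moments, eps=1e-8):
    """Divide by the running std only; the mean is NOT subtracted."""
    return r / math.sqrt(moments.var + eps)


moments = RunningMoments()
moments.update([1.0, 2.0, 3.0, 4.0])
centered = standardize_observation(2.5, moments)  # 2.5 is the running mean
scaled = standardize_reward(2.0, moments)
```

Leaving the mean in place for rewards only rescales them, so a positive reward stays positive after standardization.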

Hi @zuoxingdong, yes, I'm interested in one of them; it will be an opportunity to learn more about vectorized environments.

For my internship, I had to implement a wrapper similar to the WarpFrame you propose in your list,
so I can also take care of that one.

Best regards :smile:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
