Stable-baselines: [proposal] Public `VecNormalize._normalize_observation`

Created on 6 Dec 2019 · 5 Comments · Source: hill-a/stable-baselines

I have saved demonstrations from a particular environment, and also a PPO2 policy trained on a normalized version of the same environment.

To properly compare the unnormalized observations in the demonstration against the PPO2 policy I am using VecNormalize._normalize_observation(obs_demo). For ease of use, I'm wondering if the maintainers would be fine with:

  1. Rename _normalize_obs to normalize_obs so that it is a "public" method.
  2. Add a training parameter to normalize_obs() so that it can be called without updating the normalization statistics. VecNormalize.step() would then call self.normalize_obs(obs, training=self.training).
enhancement
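
For illustration, the two-point proposal above could look roughly like the sketch below. It is not the stable-baselines implementation: `RunningMeanStd` and `ObsNormalizerSketch` are hypothetical stand-ins written with NumPy only, and the clipping range and epsilon values are assumptions.

```python
import numpy as np


class RunningMeanStd:
    """Hypothetical running mean/variance tracker (stand-in, not the library class)."""

    def __init__(self, shape=(), epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon

    def update(self, batch):
        # Merge the batch moments into the running moments (parallel variance formula).
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean, batch_var = batch.mean(axis=0), batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean = self.mean + delta * batch_count / total
        self.var = m2 / total
        self.count = total


class ObsNormalizerSketch:
    """Hypothetical holder of observation statistics, mimicking the proposal."""

    def __init__(self, obs_shape, clip_obs=10.0, epsilon=1e-8):
        self.obs_rms = RunningMeanStd(obs_shape)
        self.clip_obs = clip_obs  # assumed clipping range
        self.epsilon = epsilon    # assumed numerical-stability constant

    def normalize_obs(self, obs, training=False):
        # Public method: statistics change only when training=True.
        obs = np.asarray(obs, dtype=np.float64)
        if training:
            self.obs_rms.update(obs)
        scaled = (obs - self.obs_rms.mean) / np.sqrt(self.obs_rms.var + self.epsilon)
        return np.clip(scaled, -self.clip_obs, self.clip_obs)
```

With this shape, saved demonstration observations could be passed through `normalize_obs(obs_demo)` with the default `training=False`, leaving the statistics gathered during training untouched.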

All 5 comments

Hello,

I'm totally for that feature; in fact, it would be good preparation for https://github.com/hill-a/stable-baselines/issues/200

I would rather decouple the update from the normalization to avoid confusion (so no training parameter is needed).

While we're at it, how about exposing the reward normalization, which currently is embedded in step_wait? And having a get_original_rew similar to get_original_obs (I think this is also needed for https://github.com/hill-a/stable-baselines/issues/200)

Combining this and @araffin's proposals, the interface would be:

  • normalize_observation and normalize_reward, both of which are side-effect free.
  • step_wait has the self.training check and updates ret_rms and obs_rms where applicable. It then calls the above two methods. It should also store self.old_rew = rews.
  • reset also has the self.training check and updates obs_rms where applicable.
  • add get_original_rew which just returns self.old_rew.

Lines 84-87 in reset() also look sketchy to me, since there is no similar logic in step_wait. I think they can just be replaced with self.old_obs = obs.
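
A rough sketch of the interface described in this comment, to make the split between updating and normalizing concrete. Everything here is an assumption for illustration: `VecNormalizeSketch`, `RunningMeanStd`, and `DummyVenv` are hypothetical stand-ins rather than the library's classes, and the default `gamma`, clipping, and epsilon values are guesses.

```python
import numpy as np


class RunningMeanStd:
    """Hypothetical running mean/variance tracker (stand-in)."""

    def __init__(self, shape=()):
        self.mean, self.var, self.count = np.zeros(shape), np.ones(shape), 1e-4

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta, total = b_mean - self.mean, self.count + b_count
        m2 = self.var * self.count + b_var * b_count + delta ** 2 * self.count * b_count / total
        self.mean, self.var, self.count = self.mean + delta * b_count / total, m2 / total, total


class DummyVenv:
    """Hypothetical vectorized env emitting random data, only for trying the sketch."""

    def __init__(self, n_envs, obs_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_envs, self.obs_dim = n_envs, obs_dim

    def reset(self):
        self.last_obs = self.rng.normal(size=(self.n_envs, self.obs_dim))
        return self.last_obs

    def step_wait(self):
        self.last_obs = self.rng.normal(size=(self.n_envs, self.obs_dim))
        rews = self.rng.normal(size=self.n_envs)
        dones = np.zeros(self.n_envs, dtype=bool)
        return self.last_obs, rews, dones, [{} for _ in range(self.n_envs)]


class VecNormalizeSketch:
    def __init__(self, venv, obs_shape, training=True, gamma=0.99,
                 clip_obs=10.0, clip_reward=10.0, epsilon=1e-8):
        self.venv, self.training, self.gamma = venv, training, gamma
        self.clip_obs, self.clip_reward, self.epsilon = clip_obs, clip_reward, epsilon
        self.obs_rms, self.ret_rms = RunningMeanStd(obs_shape), RunningMeanStd(())
        self.ret = self.old_obs = self.old_rew = None

    def normalize_observation(self, obs):
        # Side-effect free: reads obs_rms, never updates it.
        scaled = (obs - self.obs_rms.mean) / np.sqrt(self.obs_rms.var + self.epsilon)
        return np.clip(scaled, -self.clip_obs, self.clip_obs)

    def normalize_reward(self, rews):
        # Side-effect free: reads ret_rms, never updates it.
        scaled = rews / np.sqrt(self.ret_rms.var + self.epsilon)
        return np.clip(scaled, -self.clip_reward, self.clip_reward)

    def step_wait(self):
        # Call reset() first so that self.ret is initialized.
        obs, rews, dones, infos = self.venv.step_wait()
        self.old_obs, self.old_rew = obs, rews
        self.ret = self.ret * self.gamma + rews
        if self.training:  # statistics are updated only here and in reset()
            self.obs_rms.update(obs)
            self.ret_rms.update(self.ret)
        obs, rews = self.normalize_observation(obs), self.normalize_reward(rews)
        self.ret[dones] = 0.0
        return obs, rews, dones, infos

    def reset(self):
        obs = self.venv.reset()
        self.old_obs = obs
        self.ret = np.zeros(len(obs))
        if self.training:
            self.obs_rms.update(obs)
        return self.normalize_observation(obs)

    def get_original_obs(self):
        return self.old_obs.copy()

    def get_original_rew(self):
        return self.old_rew.copy()
```

Because the two normalize methods only read the running statistics, they can be applied to saved demonstrations at any time, while `training` alone decides whether `step_wait` and `reset` update `obs_rms` and `ret_rms`.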

Good point. I think this is also a good moment to rename rew to reward.

@araffin Did you mean that you want to rename self.get_original_{rew=>rewards}? (If so, I will also rename self.get_original_{obs=>observations}.)

Only the first one (the second one would break some code and is maybe too long), and it would be singular, I think (but it does not really matter).
