Gym: Env should provide reward range

Created on 30 Apr 2016 · 7 Comments · Source: openai/gym

Some algorithms, such as R-Max, need to know the maximum reward the environment can give at any _single step_. Just as continuous-state environments provide ranges for each dimension, I suggest that they provide reward ranges too. Note that this is the _single step_ reward range, not the _accumulated reward_ range.

enhancement

rafaelcp
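To illustrate why a finite single-step bound matters, here is a minimal sketch of the optimistic value an R-Max-style agent might assign to unexplored state-action pairs in the discounted setting, r_max / (1 - gamma). The function name `optimistic_value` and the `(low, high)` tuple layout are assumptions made for illustration; they are not taken from gym.

```python
import math


def optimistic_value(reward_range, gamma):
    """Upper bound on the discounted return given a single-step reward bound.

    reward_range: hypothetical (low, high) tuple of single-step reward bounds.
    gamma: discount factor, assumed to lie in [0, 1).
    """
    _, r_max = reward_range
    if not math.isfinite(r_max):
        # Without a finite maximum there is no finite optimistic value.
        raise ValueError("a finite maximum single-step reward is required")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    # Geometric series: r_max + gamma * r_max + gamma^2 * r_max + ...
    return r_max / (1.0 - gamma)
```

With an unbounded range the function raises, which is the situation the issue describes: the agent has nothing to initialise its optimistic estimates with.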

All 7 comments

Not a bad idea.

joschu on 30 Apr 2016

Would it make sense to expose that as env.reward_space, to follow suit with env.action_space and env.observation_space, which give shape and bound data for actions and observations?

danlangford on 1 May 2016

Yep, that sounds right to me.

John, what do you think of doing a 1-D box which we extend to also allow \inf and -\inf as bounds?

gdb on 1 May 2016

I'm more in favor of calling it reward_range, which defaults to (-np.inf, np.inf)

joschu on 1 May 2016

👍3
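A minimal sketch of what this suggestion could look like, assuming a class attribute that subclasses override. `Env` and `CoinFlipEnv` below are hypothetical stand-ins rather than gym's actual classes, and `math.inf` is used in place of `np.inf` only to keep the example dependency-free (both are the same float value).

```python
import math


class Env:
    """Hypothetical base environment class, for illustration only."""

    # Default: nothing is known about the single-step reward.
    reward_range = (-math.inf, math.inf)


class CoinFlipEnv(Env):
    """Hypothetical environment whose single-step reward is -1 or +1."""

    # Only the bounds are declared, not the discrete set of values.
    reward_range = (-1.0, 1.0)


default_low, default_high = Env().reward_range
coin_low, coin_high = CoinFlipEnv().reward_range
```

Keeping it a plain two-element tuple makes it clear that the reward is a scalar, with no shape to describe.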

Ok yeah, that sounds better. Don't think we want to imply that it can be more than a scalar.

Do you want those just to be bounds, or could it be useful to know that your rewards will be e.g. a discrete set like {-1, 1}?

gdb on 1 May 2016

I can't think of any reason why one would care that it's a discrete set.

joschu on 1 May 2016

Note the Atari environments (and perhaps others) are still at the default (-inf, inf) range and do not declare the specific range of each ROM.

shelhamer on 16 May 2017
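For environments that leave the range at the default, calling code may want to detect that and fall back to bounds supplied by the user. The following is a rough sketch under that assumption; `resolve_reward_range`, its `override` parameter, and `UndeclaredEnv` are hypothetical names, not part of gym.

```python
import math

DEFAULT_RANGE = (-math.inf, math.inf)


def resolve_reward_range(env, override=None):
    """Return finite (low, high) bounds, or raise if none are available.

    override: optional user-supplied (low, high) tuple; when given it
    takes precedence over whatever the environment declares.
    """
    declared = getattr(env, "reward_range", DEFAULT_RANGE)
    low, high = override if override is not None else declared
    if not (math.isfinite(low) and math.isfinite(high)):
        raise ValueError("no finite reward range; pass override=(low, high)")
    if low > high:
        raise ValueError("reward range must satisfy low <= high")
    return (low, high)


class UndeclaredEnv:
    """Hypothetical environment that keeps the default, unbounded range."""

    reward_range = DEFAULT_RANGE
```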
