Gym: Env should provide reward range

Created on 30 Apr 2016 · 7 Comments · Source: openai/gym

In some algorithms like R-Max it is necessary to know the maximum reward given at any _single step_ by the environment. Just like continuous state environments provide ranges for each dimension, I suggest that they should provide reward ranges too. Note that it is not the _accumulated reward_ range, but _single step_ reward range.
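To illustrate why the single-step bound matters: optimistic methods in the R-Max family typically value unexplored state-action pairs at the best possible discounted return, which is derived from the maximum single-step reward. A minimal sketch, assuming a discount factor below 1; `optimistic_value`, `r_max` and `gamma` are hypothetical names for illustration, not part of gym:

```python
def optimistic_value(r_max, gamma):
    """Upper bound on the discounted return when every step pays at most r_max.

    This is the geometric series r_max * (1 + gamma + gamma**2 + ...).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return r_max / (1.0 - gamma)
```

Without a finite `r_max` this bound is infinite, which is why the accumulated reward range is not a substitute.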

enhancement

All 7 comments

Not a bad idea.

Would it make sense to expose that as env.reward_space, following env.action_space and env.observation_space, which give shape and bound data for actions and observations?

Yep, that sounds right to me.

John, what do you think of doing a 1-D box which we extend to also allow \inf and -\inf as bounds?

I'm more in favor of calling it reward_range, which defaults to (-np.inf, np.inf)

Ok yeah, that sounds better. Don't think we want to imply that it can be more than a scalar.
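A rough sketch of what the agreed-upon shape could look like: a class-level `reward_range` tuple that defaults to unbounded and is overridden by environments that know their bounds. The `Env` class below is a hypothetical stand-in, not the actual gym source, and `CoinFlipEnv` and `max_step_reward` are made up for illustration. `math.inf` is used to keep the snippet dependency-free; it compares equal to `np.inf`.

```python
import math


class Env:
    """Hypothetical stand-in for a base environment class."""

    # Default: single-step rewards are unbounded unless a subclass says otherwise.
    reward_range = (-math.inf, math.inf)


class CoinFlipEnv(Env):
    """Hypothetical environment whose per-step reward lies in [-1, 1]."""

    reward_range = (-1.0, 1.0)


def max_step_reward(env):
    """Return the upper bound on the reward for any single step."""
    _, high = env.reward_range
    return high
```

An algorithm such as R-Max would then read `max_step_reward(env)` instead of requiring the user to pass the bound in by hand.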

Do you want those just to be bounds, or could it be useful to know that your rewards will be e.g. a discrete set like {-1, 1}?

I can't think of any reason why one would care that it's a discrete set.

Note the Atari environments (and perhaps others) are still at the default (-inf, inf) range and do not declare the specific range of each ROM.
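Since some environments stay at the default, an algorithm that needs a finite bound may want to check before relying on it. A minimal sketch, assuming the environment exposes `reward_range` as a `(low, high)` tuple; `has_finite_reward_range` and the two stub classes are hypothetical names for illustration:

```python
import math


def has_finite_reward_range(env):
    """Return True only if both ends of env.reward_range are finite numbers."""
    # Fall back to the unbounded default if the attribute is missing.
    low, high = getattr(env, "reward_range", (-math.inf, math.inf))
    return math.isfinite(low) and math.isfinite(high)


class UnboundedStub:
    """Stub that leaves the range at the unbounded default."""

    reward_range = (-math.inf, math.inf)


class BoundedStub:
    """Stub that declares a specific single-step range."""

    reward_range = (0.0, 10.0)
```

A caller could fall back to a user-supplied bound, or raise an error, when the check returns `False`.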
