When I set env.seed(0) (or some other seed), I expected all random elements of env to behave deterministically. However, the env.action_space.sample() function still seems to produce random output.
import gym

a1 = []
a2 = []
# First environment: seed it, reset it, then sample four actions
env1 = gym.make('FrozenLake-v0')
env1.seed(0)
s1 = env1.reset()
for _ in range(4):
    a1.append(env1.action_space.sample())
# Second environment: same seed, same procedure
env2 = gym.make('FrozenLake-v0')
env2.seed(0)
s2 = env2.reset()
for _ in range(4):
    a2.append(env2.action_space.sample())
print(a1)
print(a2)
produces different results for a1 and a2. For example:
[1, 0, 2, 2]
[0, 3, 2, 1]
Perhaps this was/is desired, but as mentioned above, I thought that setting env.seed() would override that.
Look at how spaces sample in the gym source code, e.g. https://github.com/openai/gym/blob/339415aa03a9b039a51f67798a44f8cd21464091/gym/spaces/box.py#L28-L29. They use a separate random number generator that lives in gym.spaces.prng. If you want the action / observation space to sample deterministically, you will need to seed that generator as well:
from gym.spaces.prng import seed
seed(123)
OK, thanks for that info.
I was questioning whether that should be the case, given the seemingly "overarching" nature of a simple line like env.seed(). BUT, if that is the way they want it to be done (or perhaps how it has to be done), I'm fine with that.
For newer versions, use env.action_space.np_random.seed(123). Depending on the specific environment, you might also need env.seed(123) for deterministic behavior.
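To make the reason for this behavior concrete, here is a minimal toy sketch that needs only the standard library. ToyEnv, ToySpace and sample_actions are hypothetical names made up for illustration and are not part of gym. The sketch assumes only what was described above: the space owns a random number generator that is separate from the one env.seed() touches.

```python
import random


class ToySpace:
    """Hypothetical stand-in for an action space that owns its own generator."""

    def __init__(self, n):
        self.n = n
        self.rng = random.Random()  # separate from the environment's generator

    def seed(self, seed):
        self.rng.seed(seed)

    def sample(self):
        return self.rng.randrange(self.n)


class ToyEnv:
    """Hypothetical stand-in for an environment with four discrete actions."""

    def __init__(self):
        self.rng = random.Random()  # would drive the environment dynamics only
        self.action_space = ToySpace(4)

    def seed(self, seed):
        # Seeds the environment's generator; the space's generator is untouched.
        self.rng.seed(seed)


def sample_actions(seed_space, count=32):
    """Seed a fresh toy env, optionally seed its space, and draw `count` actions."""
    env = ToyEnv()
    env.seed(0)
    if seed_space:
        env.action_space.seed(123)
    return [env.action_space.sample() for _ in range(count)]


# Only the environment is seeded: the two runs will almost surely differ.
env_only_a = sample_actions(seed_space=False)
env_only_b = sample_actions(seed_space=False)

# The space is seeded too: the two runs are identical.
both_a = sample_actions(seed_space=True)
both_b = sample_actions(seed_space=True)

print(env_only_a == env_only_b)
print(both_a == both_b)
```

In this toy model, seeding the environment alone leaves the sampled actions unrepeatable, while seeding the space's own generator makes them match across runs, which mirrors what the original snippet showed for a1 and a2.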