Description of the bug
I have been unable to get reproducible results when using the same
seed for the random number generators.
Code example
Starting from the example described at
https://stable-baselines.readthedocs.io/en/master/modules/ppo1.html
I wrote the following:
```python
import gym

from stable_baselines.common.policies import MlpPolicy, MlpLstmPolicy, MlpLnLstmPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO1

env = gym.make('CartPole-v1')
env = DummyVecEnv([lambda: env])  # wrap the single env in a vectorized env
model = PPO1(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=5000, seed=100)
model.save("ppo1_cartpole")
```
Note that I have added seed=100 to model.learn().
Running this example prints output to the screen and writes the
ppo1_cartpole.pkl file.
Running the exact same code twice (with the same seed value) produces
different screen outputs and different ppo1_cartpole.pkl files.
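To confirm whether two runs really produced identical model files, one option is a strict byte-level comparison of the saved files via cryptographic digests. A minimal standard-library sketch; the helper names `file_digest` and `same_file_contents` are hypothetical, and any paths you pass them are up to you:

```python
import hashlib


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files have byte-identical contents."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, `same_file_contents("run1/ppo1_cartpole.pkl", "run2/ppo1_cartpole.pkl")` after saving each run into its own (assumed) directory.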
System Info
My environment:
Additional context
It appears from the code that, when seed is not None in learn(), the
function set_global_seeds(seed) is called. This function initialises
the following random number generators with the specified seed:
```python
import random

import gym
import numpy as np
import tensorflow as tf


def set_global_seeds(seed):
    """
    set the seed for python random, tensorflow, numpy and gym spaces
    :param seed: (int) the seed
    """
    tf.set_random_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    gym.spaces.prng.seed(seed)
```
Because of this, I also tried adding the lines
```python
from stable_baselines.common import set_global_seeds
set_global_seeds(100)
```
before the call to gym.make() in the above example, but it did not help.
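The limitation can be illustrated without TensorFlow: seeding a global generator only fixes the draws that actually come from that generator. In this standard-library sketch, an independently created `random.Random()` stands in, by analogy only, for random state that `set_global_seeds()` does not reach; the function name `draws` is hypothetical.

```python
import random


def draws(global_seed, independent_rng):
    """Seed the global generator, then draw from it and from a separate one."""
    random.seed(global_seed)  # analogous to set_global_seeds()
    from_global = [random.random() for _ in range(3)]
    from_independent = [independent_rng.random() for _ in range(3)]
    return from_global, from_independent


# Each "run" gets its own generator that nobody seeded explicitly,
# standing in for random state that a global seed does not control.
g1, i1 = draws(100, random.Random())
g2, i2 = draws(100, random.Random())
print(g1 == g2)  # draws from the seeded global generator match
print(i1 == i2)  # draws from the unseeded generators almost surely differ
```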
Hello,
This is a known issue and it is on the roadmap. It apparently comes from TensorFlow, and any help is appreciated ;)
Among the issues I found:
The only case where I have reproducible results is when I test a learned policy with deterministic=True using the predict() method.
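One plausible reason deterministic prediction is reproducible: assuming, as is typical, that the deterministic action is the mode of the action distribution, no random number is drawn at all for a discrete action space. A standard-library sketch of that distinction; `select_action` is a hypothetical stand-in, not the stable-baselines API.

```python
import random


def select_action(probs, deterministic, rng):
    """Pick an action index from a categorical distribution.

    deterministic=True returns the most likely action (the mode), so no
    random number is consumed; otherwise an action is sampled with rng.
    """
    if deterministic:
        return max(range(len(probs)), key=lambda i: probs[i])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.2, 0.7, 0.1]
print(select_action(probs, True, random.Random()))  # always 1
```

The stochastic branch is only reproducible if `rng` itself is seeded identically in both runs.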
The problem might be related to the bug reported at https://github.com/keras-team/keras/issues/2280
However, I tried the suggestion made at https://github.com/keras-team/keras/issues/2280#issuecomment-411542012, to use
```
PYTHONHASHSEED=0 python
```
but it didn't help.
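`PYTHONHASHSEED` fixes the salt Python uses when hashing `str` and `bytes` objects (which affects, for example, iteration order over a set of strings), but it does not seed any random number generator, which would be consistent with it not helping here. A quick standard-library check; `hash_in_fresh_process` is a hypothetical helper:

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text, hashseed=None):
    """Run hash(text) in a new interpreter and return the printed value.

    If hashseed is given, PYTHONHASHSEED is set for the child process;
    otherwise the child inherits the current environment.
    """
    env = dict(os.environ)
    if hashseed is not None:
        env["PYTHONHASHSEED"] = str(hashseed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)
```

With `hashseed=0`, two calls return the same value; without it (and with the variable unset in the parent), the values usually differ between processes.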
This ultimately appears to be a TensorFlow bug; see https://github.com/tensorflow/tensorflow/issues/9171
The bug report also indicates that it won't be fixed until TensorFlow 2.0.
Basically, the only way to get around this is to set the seed at the operation level, i.e. when a random number is generated.
EDIT: They point to a set of stateless random number generators here. I don't know whether these help at all.
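The idea of seeding at the operation level can be sketched without TensorFlow: each random operation gets its own seed derived from a base seed and a stable name, so its stream no longer depends on how many other operations exist or in what order they were created. In TensorFlow 1.x the corresponding step would be passing an explicit `seed=` argument to each random op or initializer; the names `op_seed`, `make_op_rngs` and the operation names below are hypothetical.

```python
import hashlib
import random


def op_seed(base_seed, op_name):
    """Derive a stable 32-bit seed for one named random operation.

    hashlib is used instead of hash() because str hashing is salted
    per process unless PYTHONHASHSEED is fixed.
    """
    digest = hashlib.sha256(f"{base_seed}:{op_name}".encode()).digest()
    return int.from_bytes(digest[:4], "little")


def make_op_rngs(base_seed, op_names):
    """Give every named random operation its own seeded generator."""
    return {name: random.Random(op_seed(base_seed, name)) for name in op_names}


# Creating the operations in a different order yields identical streams.
a = make_op_rngs(100, ["init/w", "init/b", "sample/action"])
b = make_op_rngs(100, ["sample/action", "init/b", "init/w"])
print(a["init/w"].random() == b["init/w"].random())  # True
```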
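A stateless generator takes the seed as an explicit argument on every call and keeps no internal state, so the same seed and shape always give the same numbers, regardless of what was generated before. A standard-library sketch of that contract; `stateless_uniform` is hypothetical and only mimics the behaviour, with the seed passed as a pair of integers by analogy with TensorFlow's stateless ops:

```python
import random


def stateless_uniform(n, seed):
    """Return n uniform floats that depend only on (n, seed).

    A fresh generator is built on every call, so no hidden state carries
    over between calls; seed is a pair of ints.
    """
    rng = random.Random(f"{seed[0]}:{seed[1]}")
    return [rng.random() for _ in range(n)]


first = stateless_uniform(3, (100, 0))
_ = stateless_uniform(5, (7, 7))  # an unrelated call in between
print(first == stateless_uniform(3, (100, 0)))  # True
```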
@crobarcro thanks for pointing out that issue; that was what I was afraid of. So we need to seed the weight initialization and the random sampling done in common.distributions.
@crobarcro @pstansell I tweaked the code a bit and managed to get reproducible results for A2C, ACER, PPO1, PPO2 and TRPO (it is not working with ACKTR yet, and I did not try the others).
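Concretely, that means the two sources of randomness named here each get an explicit seed: one for weight initialization and one for action sampling. A standard-library sketch under those assumptions; `init_weights`, `ActionSampler` and `run` are hypothetical and do not correspond to the actual `common.distributions` code, and the Glorot-style scale is chosen only for illustration.

```python
import random


def init_weights(n_in, n_out, seed):
    """Gaussian weight matrix drawn from a generator seeded only for init."""
    rng = random.Random(seed)
    scale = (2.0 / (n_in + n_out)) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]


class ActionSampler:
    """Categorical sampler with its own seeded generator."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def sample(self, probs):
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run(seed, steps=5):
    weights = init_weights(4, 2, seed)  # seed for weight initialization
    sampler = ActionSampler(seed + 1)  # separate seed for action sampling
    actions = [sampler.sample([0.5, 0.5]) for _ in range(steps)]
    return weights, actions
```

Calling `run(100)` twice returns identical weights and actions, because every random draw comes from a generator with an explicit seed.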
You can find details here on the deterministic-fix branch.
Hi everyone, I have to ask: are there any plans to merge the deterministic fix into the master branch?
Hello,
If you look at the roadmap and the milestones, it is planned for the next releases. However, there is no due date, and we would appreciate contributions to help us finish it.
I would like to help and can normally contribute some time each day. Does the project have a Slack channel?
We don't have a Slack channel. However, we have a roadmap, milestones and issues ;)
If you start working on something, just comment on the appropriate issue.