Describe the bug
A simple run of PPO1 crashes. The assertion thetaroot == thetalocal fails, and it's not due to NaNs as the floats differ. This doesn't happen in baselines.
Code example
Minimal reproducible example:
import gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines import PPO1
env = gym.make("CartPole-v1")
model = PPO1(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
System Info
Stdout + Traceback
(venv) petersen33md:runs petersen33md$ mpirun -n 2 python ppo1_test.py
********** Iteration 0 ************
...
********** Iteration 6 ************
Optimizing...
pol_surr | pol_entpen | vf_loss | kl | ent
Optimizing...
pol_surr | pol_entpen | vf_loss | kl | ent
-0.00082 | -0.00627 | 117.56442 | 8.56e-05 | 0.62709
-0.00030 | -0.00630 | 128.11664 | 7.79e-05 | 0.63015
Traceback (most recent call last):
File "ppo1_test.py", line 160, in <module>
model.learn(total_timesteps=10000, callback=callback)
File "/Users/petersen33/repositories/stable-baselines/stable_baselines/ppo1/pposgd_simple.py", line 272, in learn
self.adam.update(grad, self.optim_stepsize * cur_lrmult)
File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/mpi_adam.py", line 48, in update
self.check_synced()
File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/mpi_adam.py", line 83, in check_synced
assert (thetaroot == thetalocal).all(), (thetaroot, thetalocal)
AssertionError: (array([ 0.04382617, -0.0679653 , -0.11690815, ..., 0.00065254,
0. , 0. ], dtype=float32), array([ 0.04383327, -0.06797152, -0.11691316, ..., 0.00065254,
0. , 0. ], dtype=float32))
Hello,
thanks for reporting the bug.
This doesn't happen in baselines.
Could you be more precise, what script did you try for the baselines version?
(Normally, we did not change the logic of ppo1 nor of mpi_adam)
Thanks for looking into this! I鈥檒l try a fresh install today. I鈥檓 on Mac. If it still doesn鈥檛 work, I can send you the output of pip freeze, and perhaps my MPI installation too.
I don鈥檛 have quite as simple an example for baselines (that鈥檚 the beauty of stable-baselines, right?), but I鈥檝e done many performance runs of PPO1 there and have never encountered this issue. I first noticed first on my own custom env but it persisted with CartPole-v1. TRPO (which also uses mpiadam) seems to work fine.
Re: not changing the logic of PPO1/mpiadam. This is why I was surprised I encountered the error.
EDIT: Yes, import gym was there the whole time. I formatted it incorrectly so it didn't display in my original post.
I tried reinstalling from scratch. This was my entire workflow:
$ python3 -m venv venv3
$ source venv3/bin/activate
$ pip install --no-cache-dir stable-baselines
$ mpirun -n 2 python ppo1_bug.py
Same issue. Here's my output of pip freeze:
absl-py==0.5.0
astor==0.7.1
atari-py==0.1.6
atomicwrites==1.2.1
attrs==18.2.0
certifi==2018.8.24
cffi==1.11.5
chardet==3.0.4
Click==7.0
cloudpickle==0.5.6
cycler==0.10.0
Cython==0.28.5
dill==0.2.8.2
future==0.16.0
gast==0.2.0
glfw==1.7.0
glob2==0.6
grpcio==1.15.0
gym==0.10.8
h5py==2.8.0
idna==2.7
imageio==2.4.1
joblib==0.12.5
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
kiwisolver==1.0.1
lockfile==0.12.2
Markdown==3.0.1
matplotlib==3.0.0
more-itertools==4.3.0
mpi4py==3.0.0
mujoco-py==1.50.1.59
numpy==1.15.2
opencv-python==3.4.3.18
pandas==0.23.4
Pillow==5.3.0
pluggy==0.7.1
progressbar2==3.38.0
protobuf==3.6.1
py==1.6.0
pycparser==2.19
pyglet==1.3.2
PyOpenGL==3.1.0
pyparsing==2.2.2
pytest==3.8.2
python-dateutil==2.7.3
python-utils==2.3.0
pytz==2018.5
pyzmq==17.1.2
requests==2.19.1
scipy==1.1.0
seaborn==0.9.0
six==1.11.0
stable-baselines==2.1.0
tensorboard==1.11.0
tensorflow==1.11.0
termcolor==1.1.0
tqdm==4.26.0
urllib3==1.23
Werkzeug==0.14.1
zmq==0.0.0
EDIT: Just tried on another machine, still a Mac. This one had Open MPI 2.0.0. Same setup as above, resulting in the same issue.
@araffin, Just tried again on a completely different machine (a RedHat cluster), and got the exact same issue. So it doesn't appear to be machine-specific. You posted earlier that you could not reproduce the bug, but looks like that message got deleted. Could you clarify where you're at with looking into/reproducing this?
Thanks again for all your help!
Update: I spotted the bug. I noticed that the error did not persist when the PPO1 argument schedule="constant" (the default value is "linear"). Annealing occurs based on the value of timesteps_so_far.
In baselines, timesteps_so_far is calculated by MPI-gathering episodes across all workers. Relevant baselines code here:
lrlocal = (seg["ep_lens"], seg["ep_rets"]) # local values
listoflrpairs = MPI.COMM_WORLD.allgather(lrlocal) # list of lens, rews = map(flatten_lists, zip(*listoflrpairs))
...
timesteps_so_far += sum(lens)
However, in stable-baselines, timesteps_so_far is based on the current worker only (which apparently can differ):
timesteps_so_far += seg["total_timestep"]
The "total_timesteps" key (which isn't in baselines) was added at some point to avoid the "mean of an empty slice" warning when no episodes had completed. But the local values were never MPI-gathered.
To fix the bug, I changed the previous line to:
timesteps_so_far += sum(MPI.COMM_WORLD.allgather(seg["total_timestep"]))
and everything is working fine now. Let me know if you'd like me to submit a PR.
Nice catch, thank you =)
Yes, please submit a PR, we'll review it and merge it ;)
Maybe add also a test (a running test will suffice) so further changes won t reintroduce this bug.
To answer your previous question, i have the same env as you and was able to reproduce the bug (at first, i was using only one process, so no error, then i used the right command)
I was planning to check that this week end
Code and tests updated, closing.
Most helpful comment
Update: I spotted the bug. I noticed that the error did not persist when the
PPO1argumentschedule="constant"(the default value is"linear"). Annealing occurs based on the value oftimesteps_so_far.In baselines,
timesteps_so_faris calculated by MPI-gathering episodes across all workers. Relevant baselines code here:However, in stable-baselines,
timesteps_so_faris based on the current worker only (which apparently can differ):The
"total_timesteps"key (which isn't in baselines) was added at some point to avoid the "mean of an empty slice" warning when no episodes had completed. But the local values were never MPI-gathered.To fix the bug, I changed the previous line to:
and everything is working fine now. Let me know if you'd like me to submit a PR.