Stable-baselines: MPIAdam synchronization error in PPO1

Created on 4 Oct 2018 · 7Comments · Source: hill-a/stable-baselines

Describe the bug
A simple run of PPO1 crashes. The assertion thetaroot == thetalocal fails, and it's not due to NaNs as the floats differ. This doesn't happen in baselines.

Code example
Minimal reproducible example:

import gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines import PPO1

env = gym.make("CartPole-v1")
model = PPO1(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)

System Info

Installed from source in virtual environment
No GPU
Python 3.6.5
mpi4py==3.0.0
tensorflow==1.8.0
Open MPI 3.1.1
commit 4983566292a5d3ae0ed1a6bce84a8ac8278e3de5

Stdout + Traceback

(venv) petersen33md:runs petersen33md$ mpirun -n 2 python ppo1_test.py 
********** Iteration 0 ************

...

********** Iteration 6 ************
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
Optimizing...
     pol_surr |    pol_entpen |       vf_loss |            kl |           ent
     -0.00082 |      -0.00627 |     117.56442 |      8.56e-05 |       0.62709
     -0.00030 |      -0.00630 |     128.11664 |      7.79e-05 |       0.63015
Traceback (most recent call last):
  File "ppo1_test.py", line 160, in <module>
    model.learn(total_timesteps=10000, callback=callback)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/ppo1/pposgd_simple.py", line 272, in learn
    self.adam.update(grad, self.optim_stepsize * cur_lrmult)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/mpi_adam.py", line 48, in update
    self.check_synced()
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/mpi_adam.py", line 83, in check_synced
    assert (thetaroot == thetalocal).all(), (thetaroot, thetalocal)
AssertionError: (array([ 0.04382617, -0.0679653 , -0.11690815, ...,  0.00065254,
        0.        ,  0.        ], dtype=float32), array([ 0.04383327, -0.06797152, -0.11691316, ...,  0.00065254,
        0.        ,  0.        ], dtype=float32))

bug

Source

brendenpetersen

Most helpful comment

Update: I spotted the bug. I noticed that the error did not persist when the PPO1 argument schedule="constant" (the default value is "linear"). Annealing occurs based on the value of timesteps_so_far.

In baselines, timesteps_so_far is calculated by MPI-gathering episodes across all workers. Relevant baselines code here:

lrlocal = (seg["ep_lens"], seg["ep_rets"]) # local values
listoflrpairs = MPI.COMM_WORLD.allgather(lrlocal) # list of lens, rews = map(flatten_lists, zip(*listoflrpairs))
...
timesteps_so_far += sum(lens)

However, in stable-baselines, timesteps_so_far is based on the current worker only (which apparently can differ):

timesteps_so_far += seg["total_timestep"]

The "total_timesteps" key (which isn't in baselines) was added at some point to avoid the "mean of an empty slice" warning when no episodes had completed. But the local values were never MPI-gathered.

To fix the bug, I changed the previous line to:

timesteps_so_far += sum(MPI.COMM_WORLD.allgather(seg["total_timestep"]))

and everything is working fine now. Let me know if you'd like me to submit a PR.

brendenpetersen on 5 Oct 2018

🎉1 👍1

All 7 comments

Hello,
thanks for reporting the bug.

This doesn't happen in baselines.

Could you be more precise, what script did you try for the baselines version?
(Normally, we did not change the logic of ppo1 nor of mpi_adam)

araffin on 5 Oct 2018

Thanks for looking into this! I’ll try a fresh install today. I’m on Mac. If it still doesn’t work, I can send you the output of pip freeze, and perhaps my MPI installation too.

I don’t have quite as simple an example for baselines (that’s the beauty of stable-baselines, right?), but I’ve done many performance runs of PPO1 there and have never encountered this issue. I first noticed first on my own custom env but it persisted with CartPole-v1. TRPO (which also uses mpiadam) seems to work fine.

Re: not changing the logic of PPO1/mpiadam. This is why I was surprised I encountered the error.

EDIT: Yes, import gym was there the whole time. I formatted it incorrectly so it didn't display in my original post.

brendenpetersen on 5 Oct 2018

I tried reinstalling from scratch. This was my entire workflow:

$ python3 -m venv venv3
$ source venv3/bin/activate
$ pip install --no-cache-dir stable-baselines
$ mpirun -n 2 python ppo1_bug.py

Same issue. Here's my output of pip freeze:

absl-py==0.5.0
astor==0.7.1
atari-py==0.1.6
atomicwrites==1.2.1
attrs==18.2.0
certifi==2018.8.24
cffi==1.11.5
chardet==3.0.4
Click==7.0
cloudpickle==0.5.6
cycler==0.10.0
Cython==0.28.5
dill==0.2.8.2
future==0.16.0
gast==0.2.0
glfw==1.7.0
glob2==0.6
grpcio==1.15.0
gym==0.10.8
h5py==2.8.0
idna==2.7
imageio==2.4.1
joblib==0.12.5
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
kiwisolver==1.0.1
lockfile==0.12.2
Markdown==3.0.1
matplotlib==3.0.0
more-itertools==4.3.0
mpi4py==3.0.0
mujoco-py==1.50.1.59
numpy==1.15.2
opencv-python==3.4.3.18
pandas==0.23.4
Pillow==5.3.0
pluggy==0.7.1
progressbar2==3.38.0
protobuf==3.6.1
py==1.6.0
pycparser==2.19
pyglet==1.3.2
PyOpenGL==3.1.0
pyparsing==2.2.2
pytest==3.8.2
python-dateutil==2.7.3
python-utils==2.3.0
pytz==2018.5
pyzmq==17.1.2
requests==2.19.1
scipy==1.1.0
seaborn==0.9.0
six==1.11.0
stable-baselines==2.1.0
tensorboard==1.11.0
tensorflow==1.11.0
termcolor==1.1.0
tqdm==4.26.0
urllib3==1.23
Werkzeug==0.14.1
zmq==0.0.0

EDIT: Just tried on another machine, still a Mac. This one had Open MPI 2.0.0. Same setup as above, resulting in the same issue.

brendenpetersen on 5 Oct 2018

@araffin, Just tried again on a completely different machine (a RedHat cluster), and got the exact same issue. So it doesn't appear to be machine-specific. You posted earlier that you could not reproduce the bug, but looks like that message got deleted. Could you clarify where you're at with looking into/reproducing this?

Thanks again for all your help!

brendenpetersen on 5 Oct 2018

In baselines, timesteps_so_far is calculated by MPI-gathering episodes across all workers. Relevant baselines code here:

lrlocal = (seg["ep_lens"], seg["ep_rets"]) # local values
listoflrpairs = MPI.COMM_WORLD.allgather(lrlocal) # list of lens, rews = map(flatten_lists, zip(*listoflrpairs))
...
timesteps_so_far += sum(lens)

However, in stable-baselines, timesteps_so_far is based on the current worker only (which apparently can differ):

timesteps_so_far += seg["total_timestep"]

To fix the bug, I changed the previous line to:

timesteps_so_far += sum(MPI.COMM_WORLD.allgather(seg["total_timestep"]))

and everything is working fine now. Let me know if you'd like me to submit a PR.

brendenpetersen on 5 Oct 2018

🎉1 👍1

Nice catch, thank you =)
Yes, please submit a PR, we'll review it and merge it ;)
Maybe add also a test (a running test will suffice) so further changes won t reintroduce this bug.
To answer your previous question, i have the same env as you and was able to reproduce the bug (at first, i was using only one process, so no error, then i used the right command)
I was planning to check that this week end

araffin on 5 Oct 2018

Code and tests updated, closing.

araffin on 12 Oct 2018

Was this page helpful?

0 / 5 - 0 ratings