Stable-baselines: What does the output mean? [question]

Created on 8 Dec 2018 · 5 Comments · Source: hill-a/stable-baselines

I get the following output when training a PPO model on my environment:

| approxkl | 0.00069229305 |
| clipfrac | 0.00390625 |
| ep_rewmean | nan |
| eplenmean | nan |
| explained_variance | 0.847 |
| fps | 142 |
| nupdates | 782 |
| policy_entropy | 3.3405766 |
| policy_loss | -0.011813248 |
| serial_timesteps | 100096 |
| time_elapsed | 656 |
| total_timesteps | 100096 |
| value_loss | 0.4478733 |

What do these values mean, and where can I find a description of each of them?

documentation question

JoelNiklaus

👍1

Most helpful comment

Hello,

For that, I recommend reading the PPO paper.

The parameters not related to PPO:

explained_variance: see here and Wikipedia
ep_rewmean: mean reward per episode
eplenmean: mean episode length
serial_timesteps: I think it is the same as total_timesteps (kept for legacy reasons, I suppose)
nupdates: number of gradient updates
fps: frames per second (i.e. environment steps per second)
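
To make the explained_variance entry concrete, here is a minimal sketch assuming the common definition 1 - Var[returns - predictions] / Var[returns]. The function name and sample data are illustrative and are not taken from the stable-baselines source. Under this definition, a value near 1 (such as the 0.847 in the question) suggests the value function predicts returns well, while a value near 0 suggests it does no better than a constant guess.

```python
from statistics import pvariance


def explained_variance(y_pred, y_true):
    """Return 1 - Var[y_true - y_pred] / Var[y_true] (nan if y_true is constant)."""
    var_true = pvariance(y_true)
    if var_true == 0:
        return float("nan")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / var_true


# Illustrative data, not from the thread.
returns = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(returns, returns)      # predictions match the returns exactly
constant = explained_variance([2.5] * 4, returns)   # always predicts the mean return
print(perfect, constant)
```

A perfect predictor scores 1, and a predictor that always outputs the mean return scores 0; negative values mean the predictions are worse than that constant guess.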

Stable-Baselines Documentation: https://stable-baselines.readthedocs.io/en/master/modules/ppo2.html
Additional Documentation: https://spinningup.openai.com/en/latest/algorithms/ppo.html

araffin on 8 Dec 2018

👍3 😕1

All 5 comments

Hi,

Great. Thank you very much for the pointers.

JoelNiklaus on 9 Dec 2018

serial_timesteps: I think it is the same as total_timesteps (kept for legacy reasons, I suppose)

I'm not sure that this explanation of serial_timesteps and total_timesteps is correct. If you look at where these come from, in the training loop of PPO2.learn (for update in range(1, nupdates + 1):), total_timesteps appears to be the number of gradient updates (i.e. epochs) performed on the network, and thus has nothing to do with the number of environment steps that have been taken.

By contrast, serial_timesteps is a slightly confusing count of the environment steps that have been taken, because it disregards the number of environments running in parallel. For example, with n_steps=64, serial_timesteps increments by 64 every time new data is collected to train the policy network for noptepochs epochs. Whether n_envs=1 or n_envs=100, serial_timesteps still increases by only 64. I might open an issue suggesting that serial_timesteps be renamed env_timesteps and report n_envs*n_steps each time the policy is trained, rather than n_steps. Similarly, perhaps total_timesteps should be renamed n_epochs, or removed altogether, since given n_updates and the number of epochs per update it provides somewhat redundant information.
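
The counting described above can be sketched as follows. This is an illustration of the comment's description, not the library's code: env_timesteps is the hypothetical counter proposed in the comment, and n_steps=128 is inferred from the first log in this thread (100096 / 782), not stated anywhere in it.

```python
def serial_timesteps(update, n_steps):
    """Steps taken per environment after `update` rollouts, as described above."""
    return update * n_steps


def env_timesteps(update, n_steps, n_envs):
    """Hypothetical counter from the comment: steps summed over all parallel envs."""
    return update * n_steps * n_envs


# n_steps=128 is an inferred value (100096 / 782), not a documented setting.
first_log = serial_timesteps(782, 128)             # matches serial_timesteps in the question
one_env = env_timesteps(782, 128, n_envs=1)        # same count when only one env runs
eight_envs = env_timesteps(782, 128, n_envs=8)     # grows with the number of envs
print(first_log, one_env, eight_envs)
```

In both logs posted in this thread, serial_timesteps divided by the update count gives 128 and total_timesteps equals serial_timesteps, which would be consistent with a single environment; the thread does not state the settings that were used.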

bibbygoodwin on 10 Jun 2019

@araffin you mentioned the fps, and I'm trying to figure out why it shows "0" for me when using ppo2 (the only algorithm I tested). Does that indicate an issue in the implementation, or could it be caused by a low number of steps per second? Here's a sample of my output:

| approxkl | 0.000308714 |
| clipfrac | 0.0 |
| ep_len_mean | 43.7 |
| ep_reward_mean | -162 |
| explained_variance | -1.19e-07 |
| fps | 0 |
| n_updates | 8 |
| policy_entropy | 1.79129 |
| policy_loss | -0.00336207 |
| serial_timesteps | 1024 |
| time_elapsed | 1.92e+03 |
| total_timesteps | 1024 |
| value_loss | 1433.95 |
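
One possible reading of the fps value, as a rough sketch: assuming fps is logged as steps per second truncated to an integer (an assumption about the logging, not something confirmed in this thread) and that each update took roughly the same time, the numbers in this log give a rate below one step per second, which would be displayed as 0. The function name is illustrative.

```python
def reported_fps(steps_in_update, seconds_for_update):
    """Steps per second truncated to an integer (assumed logging behaviour)."""
    return int(steps_in_update / seconds_for_update)


# Values taken from the log: 1024 total steps over 8 updates in about 1920 seconds.
steps_per_update = 1024 // 8        # 128 steps collected per update
seconds_per_update = 1.92e3 / 8     # 240.0 seconds per update, assuming equal durations
raw_rate = steps_per_update / seconds_per_update
fps = reported_fps(steps_per_update, seconds_per_update)
print(raw_rate, fps)
```

If this reading is right, the 0 would point to a very slow environment step (roughly two seconds per step) rather than to a logging bug.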