For example, I have 4 policies in my multiagent policy configuration, and after the first training iteration the timesteps_total is 4000.
Is that number per agent or overall? That is, has each agent taken 4000 steps, or have all the agents taken 4000 steps combined?
It's the number of times step() has been called on the env. So it probably means each agent has run 4000 timesteps, assuming every agent acts on every step.
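The two readings can be made concrete with a toy Python sketch. This is not RLlib's code, and `count_steps` and the agent IDs are hypothetical. It counts the same rollout in two ways: once per call to the env's step function (env steps), and once per individual agent action (agent steps). If four agents act on every step, 4000 env steps mean each agent has acted 4000 times, which is 16,000 agent actions in total. If the agents take turns, the same 4000 env steps give only 1000 actions per agent.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_steps(agents_per_step: Iterable[List[str]]) -> Dict[str, object]:
    """Toy counter (hypothetical, not RLlib code).

    Each element of `agents_per_step` lists the agent IDs that acted
    during one call to env.step().
    """
    env_steps = 0
    agent_steps = 0
    per_agent: Counter = Counter()
    for acting_agents in agents_per_step:
        env_steps += 1                     # one call to env.step()
        agent_steps += len(acting_agents)  # one step per agent that acted
        per_agent.update(acting_agents)    # per-agent tally
    return {"env_steps": env_steps, "agent_steps": agent_steps,
            "per_agent": dict(per_agent)}


agents = ["agent_0", "agent_1", "agent_2", "agent_3"]

# Case 1: every agent acts on every env step.
all_act = count_steps([agents] * 4000)
print(all_act["env_steps"], all_act["agent_steps"])  # 4000 16000

# Case 2: agents take turns, so one agent acts per env step.
turns = count_steps([[agents[i % 4]] for i in range(4000)])
print(turns["env_steps"], turns["agent_steps"])  # 4000 4000
```

Both cases report 4000 env steps. Which number a reported timesteps_total reflects depends on which of the two quantities the framework counts. The reply above says it counts calls to step().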
Thanks, makes sense!
> it means each agent has run 4000 timesteps
Wouldn't his timesteps_total be 16,000 then?