Ml-agents: Academy.Instance.MaxStepCount != Python console output

Created on 15 Jun 2020 · 4 comments · Source: Unity-Technologies/ml-agents

Describe the bug
Printing 'Academy.Instance.MaxStepCount' during training gives a value that differs from the Python console output.

When the MaxStepCount is at around 5-6k in Unity, the console prints the 12k step message.
Interestingly, changing the decision period also affects the difference. If I change the decision period from 5 to 1, the MaxStepCount is only at around 1k in Unity when the console prints the 12k step message. My understanding was that the MaxStepCount is not related to the decision period when "Take Actions Between Decisions" is ticked, and that the Python connector uses the TotalStepCount value.
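
The reported numbers are consistent with the console counting one step per agent decision rather than per academy step. A minimal arithmetic sketch (not ML-Agents code), assuming the default 3DBall scene contains 12 agents, each requesting a decision every `decision_period` academy steps:

```python
# Sketch: relate the trainer's console step count to Unity's academy step
# count. Assumes 12 agents (the default 3DBall scene; the count may differ
# between releases), each deciding every `decision_period` academy steps.

def academy_steps_at(trainer_steps: int, num_agents: int, decision_period: int) -> float:
    """Academy steps elapsed once the trainer has counted `trainer_steps` decisions."""
    decisions_per_academy_step = num_agents / decision_period
    return trainer_steps / decisions_per_academy_step

# Decision period 5: the "12000 steps" message lands near academy step 5000.
print(academy_steps_at(12000, num_agents=12, decision_period=5))  # -> 5000.0
# Decision period 1: the same message lands near academy step 1000.
print(academy_steps_at(12000, num_agents=12, decision_period=1))  # -> 1000.0
```

Under that assumption the formula reproduces both observations above (5-6k and ~1k academy steps at the 12k console message).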

To Reproduce
Steps to reproduce the behavior:

  1. loading the 3DBall example environment
  2. creating an empty game object
  3. Adding a script to the game object
  4. Adding two lines to the script
    4.1 using Unity.MLAgents;
    4.2 print(Academy.Instance.TotalStepCount); // in the update loop

Environment (please complete the following information):

  • Unity Version: 2019.3.12f1
  • OS + version: Windows 10 Version 2004
  • _ML-Agents version_: Tested with Release 1 and Release 3
  • _TensorFlow version_: Release 1: 2.1 and Release 3: 2.2
  • _Environment_: 3DBall
bug

All 4 comments

Hi @ChristianCoenen, the MaxStepCount in Unity tracks the number of steps for the environment, while the step count in the console tracks the number of steps for the trainer, so the two are different.

Hi @xiaomaogy, could you explain when the step count that is printed on the console is updated (e.g., after a set of observations, actions, and rewards has been collected)? And can I access this value in Unity via the ML-Agents runtime scripts?

@xiaomaogy thanks for your answer!

I did some more research and would like to know if I am on the right track.

  1. A DecisionPeriod of 5 means that the Agent will request a decision every 5 Academy steps.

    • _found in the decision requester script as a comment_

  2. AgentScript -> max_step == Academy steps

    • _Tested_

  3. Tensorboard -> Episode Length == Decision steps

    • _Tested_
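
If points 1-3 hold, an agent's max_step (counted in academy steps) and the Tensorboard Episode Length (counted in decision steps) should differ by a factor of the decision period. A small sketch with illustrative numbers (the values below are not verified 3DBall defaults):

```python
# Sketch: convert an agent's max_step (measured in academy steps) into the
# episode length Tensorboard would report (measured in decision steps).
# The max_step value of 5000 is illustrative, not a confirmed default.

def episode_length_in_decisions(max_step: int, decision_period: int) -> int:
    # One decision is requested every `decision_period` academy steps.
    return max_step // decision_period

print(episode_length_in_decisions(5000, decision_period=5))  # -> 1000
print(episode_length_in_decisions(5000, decision_period=1))  # -> 5000
```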

Conclusion:
When I have 30 environments (one agent in each) in one scene, I can get 0-30 trainer steps during 1 Academy step? And the trainer step counter is incremented by 1 each time one of the 30 agents requests a decision, right?

That would be the only way to explain why the Academy MaxStepCount is lower than the trainer count, and why the gap widens the lower the decision period is and the more agents there are in a scene.

Note:
I think this is quite hard for many users to grasp (I got no answer in the Forum despite quite a few views before opening this issue). The best I could find on this in the ML-Agents docs was _The Simulation and Training Process_, but it didn't answer many of my questions. I think some additional documentation would help.

@ChristianCoenen Yes, you are on the right track. When you have 30 environments, you can get 0-30 agent steps during 1 academy step (since some of the agents might not finish during that academy step). And the ratio between agent steps and trainer steps is determined by the decision period. So if the decision period is 1, you can get 0-30 trainer steps.

I do agree that this is quite hard for many users to understand, and we are working on a code refactor to make it more understandable. Hopefully that will help.
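
The explanation above can be illustrated with a deterministic toy simulation (not ML-Agents internals): 30 single-agent environments with a decision period of 5, where every agent finishes every step. Trainer steps then accrue only on academy steps where decisions are requested, so a single academy step yields either 0 or 30 of them; in the real system, agents that don't finish make any count from 0 to 30 possible.

```python
# Toy simulation of trainer-step accrual across 30 single-agent environments.
# Every agent requests a decision on each 5th academy step and always finishes,
# so each academy step contributes exactly 0 or 30 trainer steps.

NUM_AGENTS = 30
DECISION_PERIOD = 5

trainer_steps = 0
per_step_counts = []
for academy_step in range(1, 101):  # 100 academy steps
    decisions = NUM_AGENTS if academy_step % DECISION_PERIOD == 0 else 0
    per_step_counts.append(decisions)
    trainer_steps += decisions

print(trainer_steps)         # 100 academy steps * 30 agents / 5 -> 600
print(set(per_step_counts))  # {0, 30}
```

With a decision period of 1, the `decisions` term would be 30 on every academy step, which is why lowering the decision period makes the trainer counter run ahead of the academy counter faster.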
