Describe the bug
Hi, I was using the mlagents_envs API to train my own RL implementation, but this loop:
for i in range(30):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print(len(decision_steps), len(terminal_steps))
sometimes returns non-empty terminal_steps right after a reset, like:
1 0
1 0
1 0
1 0
1 0
1 0
1 1 This one
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1 This one
1 0
1 0
1 0
1 0
1 1 This one
1 0
1 0
1 0
1 0
1 0
1 0
1 0
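Until this is resolved, one possible workaround is to keep stepping after a reset until decision_steps is non-empty, discarding the stale terminal step. This is only a sketch: `reset_and_get_decisions` is a hypothetical helper of my own, and `FakeEnv` is a tiny stand-in so the control flow can be shown without a running Unity editor (the real `get_steps` returns `DecisionSteps`/`TerminalSteps` objects, not lists).

```python
def reset_and_get_decisions(env, behavior_name, max_steps=10):
    """Reset, then step past any leftover terminal steps until at
    least one agent requests a decision. Hypothetical helper."""
    env.reset()
    for _ in range(max_steps):
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        if len(decision_steps) > 0:
            return decision_steps, terminal_steps
        env.step()  # advance the simulation past the stale terminal step
    raise RuntimeError("No decision steps observed after reset")


class FakeEnv:
    """Stand-in for UnityEnvironment, used only to demonstrate the flow."""
    def __init__(self):
        self.calls = 0
    def reset(self):
        self.calls = 0
    def step(self):
        self.calls += 1
    def get_steps(self, behavior_name):
        # First query after reset: only a stale terminal step;
        # after one step: a decision step, as expected.
        if self.calls == 0:
            return [], ["agent_0"]
        return ["agent_0"], []


decisions, terminals = reset_and_get_decisions(FakeEnv(), "3DBall")
print(len(decisions), len(terminals))  # → 1 0
```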
To Reproduce
Steps to reproduce the behavior:
Use this code (note: `spec` must be looked up from `env.behavior_specs`, which the original snippet omitted):

from mlagents_envs.environment import UnityEnvironment

# This is a non-blocking call that only loads the environment.
env = UnityEnvironment(file_name=None, seed=1, side_channels=[])  # After running this statement, click Play in the Unity editor
# Start interacting with the environment.
env.reset()
behavior_names = env.behavior_specs.keys()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]  # needed below for create_random_action
for episode in range(100):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # print(list(decision_steps), list(terminal_steps), behavior_name)
    print(len(decision_steps), len(terminal_steps))
    tracked_agent = -1  # -1 indicates not yet tracking
    done = False  # For the tracked_agent
    episode_rewards = 0  # For the tracked_agent
    while not done:
        # Track the first agent we see if not tracking
        # Note: len(decision_steps) = number of agents that requested a decision
        if tracked_agent == -1 and len(decision_steps) >= 1:
            tracked_agent = decision_steps.agent_id[0]
            print(list(decision_steps), list(terminal_steps), tracked_agent)
        # Generate an action for all agents
        action = spec.create_random_action(len(decision_steps))
        # Set the actions
        env.set_actions(behavior_name, action)
        # Move the simulation forward
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        if tracked_agent in decision_steps:  # The agent requested a decision
            episode_rewards += decision_steps[tracked_agent].reward
        if tracked_agent in terminal_steps:  # The agent terminated its episode
            episode_rewards += terminal_steps[tracked_agent].reward
            done = True
    print(f"Total rewards for episode {episode} is {episode_rewards}")
Sometimes the tracked_agent variable never changes its value; it stays -1 inside the while loop.
This does not seem to cause a problem in the Unity environment, but when implementing my own algorithm, sometimes after a reset I don't get an observation in decision_steps, so I have to work around it.
At first I thought there might be a mistake in how I replicated the environment on my system, but since this only occurs sometimes, I don't think that is the case.
The code above takes longer to run, so you can use this shorter loop to see that decision_steps is sometimes empty even right after resetting the environment:

for i in range(30):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print(len(decision_steps), len(terminal_steps))
Environment :
I tried the same with multiple agents; I cannot understand why decision_steps and terminal_steps both contain all agents when only one agent should have terminated.
I have a similar problem: after calling env.reset(), sometimes there are terminal steps.
Hi @MedhaviMonish, could you please share the config file that you are using?
This does not use any config file. I am using my own implementation of A2C.
I was able to reproduce the issue you are facing and have added it to our bug tracker. We will take a closer look at this issue. Thanks for bringing this to our attention.
And can you also look into the issue with multiple agents? I was unable to keep track of which agents had terminated. Even if only one agent has terminated, the other agents give unexpected output. If one agent out of three has terminated, then env.get_steps(behavior_name) should return 2 agents in decision_steps and 1 in terminal_steps; instead, the return is unpredictable.
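One way to make multi-agent reward tracking robust to this is to key all bookkeeping by agent_id rather than assuming len(decision_steps) + len(terminal_steps) equals the number of agents in the scene. A minimal sketch of that idea, where plain `agent_id -> reward` dicts stand in for the real `DecisionSteps`/`TerminalSteps` objects (the helper name `update_rewards` is my own, not part of the API):

```python
from collections import defaultdict

def update_rewards(episode_rewards, finished, decision_rewards, terminal_rewards):
    """Accumulate per-agent rewards. Agents absent from both dicts
    simply reported nothing this step; they are not lost."""
    for agent_id, reward in decision_rewards.items():
        episode_rewards[agent_id] += reward
    for agent_id, reward in terminal_rewards.items():
        episode_rewards[agent_id] += reward
        finished.add(agent_id)

episode_rewards = defaultdict(float)
finished = set()
# Step 1: all three agents request decisions.
update_rewards(episode_rewards, finished, {0: 0.1, 1: 0.1, 2: 0.1}, {})
# Step 2: only agent 1 terminates; agents 0 and 2 report nothing this step.
update_rewards(episode_rewards, finished, {}, {1: -1.0})
print(sorted(finished), round(episode_rewards[1], 2))  # → [1] -0.9
```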
Hi, any updates regarding this?
I am experiencing the same issue discussed above when using the 3DBall sample environment. The length of decision_steps plus the length of terminal_steps is not always equal to the number of agents in the scene. I understand that the logic behind requesting decisions has changed, but now I do not know how you are supposed to step the environment in order to collect all the data.
If you do this while stepping the environment:
decision_steps, terminal_steps = env.get_steps(behavior_name)
print(len(decision_steps), len(terminal_steps))
You get an output looking like this:
>> 12, 0
>> 12, 0
...
>> 0, 1
The scene has 12 agents, but at some point one of them terminates and you get only one entry inside terminal_steps.
How do I get the data from the remaining agents? Calling get_steps() again doesn't seem to work. Do I need to set a new action for that particular agent and call step() again?
Thanks for the help
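If agents only show up in get_steps() when they request a decision or terminate, one way to still collect every agent's data is to buffer transitions per agent_id across successive step() calls. A sketch of that pattern (the `collect` helper and the string observations are illustrative stand-ins, not the real API objects):

```python
from collections import defaultdict

def collect(buffers, decision_obs, terminal_obs):
    """Append whatever each agent reported this step to its own buffer.
    Agents that reported nothing this step are simply skipped."""
    for agent_id, obs in decision_obs.items():
        buffers[agent_id].append(("decision", obs))
    for agent_id, obs in terminal_obs.items():
        buffers[agent_id].append(("terminal", obs))

buffers = defaultdict(list)
# Simulated get_steps() results over three successive env.step() calls:
collect(buffers, {0: "o0", 1: "o1"}, {})      # both agents request decisions
collect(buffers, {0: "o0b"}, {1: "o1_last"})  # agent 1 terminates alone
collect(buffers, {}, {0: "o0_last"})          # agent 0 terminates later
print(len(buffers[0]), len(buffers[1]))  # → 3 2
```

Each agent's full trajectory is preserved even though no single get_steps() call contains all agents.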
This issue has been clarified for me in the forum. I do not think this is a bug, and the issue should be updated.