Ml-agents: Using mlagents_envs API, and issue with env.get_steps(behavior_name)

Created on 6 Aug 2020  ·  8 Comments  ·  Source: Unity-Technologies/ml-agents

Describe the bug
Hi, I am using the mlagents_envs API to train my own RL implementation, but

    for i in range(30):
        env.reset()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        print(len(decision_steps), len(terminal_steps))

sometimes returns no decision_steps but does return terminal_steps, like this:

    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    1 1    <- this one
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    1 1    <- this one
    1 0
    1 0
    1 0
    1 0
    1 1    <- this one
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0

To Reproduce
Steps to reproduce the behavior:

  1. Create the environment by following https://github.com/Unity-Technologies/ml-agents/blob/release_5_docs/docs/Learning-Environment-Create-New.md
  2. Use the code:

    from mlagents_envs.environment import UnityEnvironment

    # With file_name=None this connects to the Unity editor.
    # After running this statement, click Play in the Unity editor.
    env = UnityEnvironment(file_name=None, seed=1, side_channels=[])
    # Start interacting with the environment.
    env.reset()
    behavior_name = list(env.behavior_specs)[0]
    # The spec is needed below to create random actions.
    spec = env.behavior_specs[behavior_name]
    for episode in range(100):
        env.reset()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        print(len(decision_steps), len(terminal_steps))
        tracked_agent = -1  # -1 indicates not yet tracking
        done = False  # For the tracked_agent
        episode_rewards = 0  # For the tracked_agent
        while not done:
            # Track the first agent we see if not tracking.
            # Note: len(decision_steps) = number of agents that requested a decision
            if tracked_agent == -1 and len(decision_steps) >= 1:
                tracked_agent = decision_steps.agent_id[0]
            print(list(decision_steps), list(terminal_steps), tracked_agent)

            # Generate an action for all agents that requested a decision
            action = spec.create_random_action(len(decision_steps))

            # Set the actions
            env.set_actions(behavior_name, action)

            # Move the simulation forward
            env.step()

            decision_steps, terminal_steps = env.get_steps(behavior_name)
            if tracked_agent in decision_steps:  # The agent requested a decision
                episode_rewards += decision_steps[tracked_agent].reward
            if tracked_agent in terminal_steps:  # The agent terminated its episode
                episode_rewards += terminal_steps[tracked_agent].reward
                done = True
        print(f"Total rewards for episode {episode} is {episode_rewards}")

Sometimes the tracked_agent variable never changes; it stays at -1 inside the while loop.
This does not seem to cause a problem in the Unity environment, but when implementing my own algorithm I sometimes get no observation in decision_steps after a reset, so I have to work around it.
At first I thought I had made a mistake replicating the environment on my system, but since this occurs only sometimes, I don't think that is the case.

The code above takes a while to run, so you can use this shorter loop to see that sometimes
there is nothing in decision_steps even right after resetting the environment.

    for i in range(30):
        env.reset()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        print(len(decision_steps),len(terminal_steps))
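
One possible shape for the workaround mentioned above is to keep stepping after a reset until some agent actually requests a decision. The sketch below is only an illustration: `reset_until_decision` and `FakeEnv` are hypothetical names, not part of mlagents_envs, and `FakeEnv` is a stand-in so the snippet runs without Unity. The helper assumes nothing beyond the `reset()`, `step()` and `get_steps(behavior_name)` calls already used in this issue.

```python
from typing import Any, Tuple


def reset_until_decision(env: Any, behavior_name: str, max_steps: int = 100) -> Tuple[Any, Any]:
    """Reset `env`, then step until at least one agent requests a decision.

    Hypothetical helper (not part of mlagents_envs). If `max_steps` runs out,
    the returned decision_steps may still be empty.
    """
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    for _ in range(max_steps):
        if len(decision_steps) > 0:
            break
        # No agent is waiting for an action, so just advance the simulation.
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
    return decision_steps, terminal_steps


class FakeEnv:
    """Stand-in for UnityEnvironment (assumption, for demonstration only)."""

    def __init__(self, empty_steps_after_reset: int) -> None:
        self._empty = empty_steps_after_reset
        self._remaining = 0

    def reset(self) -> None:
        self._remaining = self._empty

    def step(self) -> None:
        self._remaining = max(0, self._remaining - 1)

    def get_steps(self, behavior_name: str):
        # Plain lists stand in for DecisionSteps / TerminalSteps; only len() is used.
        decisions = [] if self._remaining > 0 else [0]
        return decisions, []


# "RollerBall" is a placeholder behavior name for the fake environment.
decision_steps, terminal_steps = reset_until_decision(FakeEnv(2), "RollerBall")
print(len(decision_steps), len(terminal_steps))  # prints "1 0"
```

With a real environment the same helper would be called as `reset_until_decision(env, behavior_name)`. Whether stepping without setting actions is acceptable depends on the training setup, so treat this as a starting point rather than a fix.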

Environment:

  • Unity Version: Unity 2019.3.0f6
  • OS + version: Windows 10
  • ML-Agents version: 0.19.0.dev0
  • TensorFlow version: 2.3.0
  • Environment: RollerBall

All 8 comments

I tried the same with multiple agents. I cannot understand why decision_steps and terminal_steps both contain all agents when only 1 agent should have terminated.

I have a similar problem: after calling env.reset(), there are sometimes terminal steps.

Hi @MedhaviMonish could you please share the config file that you are using?

This does not use any config file. I am using my own implementation of A2C.

I was able to reproduce the issue you are facing and have added it to our bug tracker. We will take a closer look at this issue. Thanks for bringing this to our attention.

And can you also look into the issue with multiple agents? I was unable to keep track of which agents have terminated. Even if only 1 agent has terminated, the other agents give unexpected output. If one agent out of three has terminated, then env.get_steps(behavior_name) should return 2 entries in decision_steps and 1 in terminal_steps; instead, what it returns is unpredictable.
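
For keeping track of which agents terminated, one option is to compare the `agent_id` fields of the two results by ID instead of by count. The single-agent output earlier in this issue (`1 1`) suggests that the same agent can show up in both results in one step, so the sketch below reports that overlap separately. `classify_agents` and `FakeSteps` are hypothetical names; `FakeSteps` only models the `agent_id` field so the snippet runs without Unity.

```python
from collections import namedtuple

# Stand-in for DecisionSteps / TerminalSteps (assumption): only agent_id is modeled.
FakeSteps = namedtuple("FakeSteps", ["agent_id"])


def classify_agents(decision_steps, terminal_steps):
    """Group agent IDs by what happened to them in the latest step."""
    deciding = {int(agent_id) for agent_id in decision_steps.agent_id}
    terminated = {int(agent_id) for agent_id in terminal_steps.agent_id}
    return {
        "needs_action": deciding,
        "finished_episode": terminated,
        # Agents in both sets ended one episode and are already asking for
        # an action again (interpretation assumed from the output above).
        "finished_and_restarted": deciding & terminated,
    }


report = classify_agents(FakeSteps([0, 1, 2]), FakeSteps([1]))
print(report["finished_and_restarted"])  # prints "{1}"
```

Matching on IDs avoids relying on the two lengths adding up to the number of agents in the scene.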

Hi, any updates regarding this?
I am experiencing the same issue discussed above when using the 3DBall sample environment. The length of decision_steps plus the length of terminal_steps is not always the same as the number of agents in the scene. I understand that the logic behind requesting decisions has changed, but now I do not know how you are supposed to step the environment in order to collect all the data.

If you do this while stepping the environment:

    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print(len(decision_steps), len(terminal_steps))

You get output like this:

    >> 12, 0
    >> 12, 0
    ...
    >> 0, 1

The scene has 12 agents, but at some point one of them terminates and you get only one entry inside terminal_steps.
How do I get the data from the remaining agents? Calling get_steps() again doesn't seem to work. Do I need to set a new action for that particular agent and call step() again?

Thanks for the help

This issue has been clarified to me in the forum. I do not think this is a bug and the issue should be updated.
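
The forum explanation is not quoted in this thread. The sketch below assumes that get_steps() only reports agents that requested a decision or terminated since the last step(), so either result can be empty on a given step and the remaining agents simply appear after later step() calls. Under that assumption, per-agent data can be collected by accumulating on agent IDs. `record_step` and `FakeSteps` are hypothetical names; `FakeSteps` models only the `agent_id` and `reward` fields so the snippet runs without Unity.

```python
from collections import namedtuple

# Stand-in for DecisionSteps / TerminalSteps (assumption): parallel sequences
# of agent IDs and rewards.
FakeSteps = namedtuple("FakeSteps", ["agent_id", "reward"])


def record_step(running, finished, decision_steps, terminal_steps):
    """Update per-agent returns from one get_steps() result.

    running:  dict agent_id -> return of the episode in progress
    finished: list of (agent_id, episode_return) for completed episodes
    """
    # Terminal rewards first: they close the episode that was in progress.
    for agent_id, reward in zip(terminal_steps.agent_id, terminal_steps.reward):
        total = running.pop(int(agent_id), 0.0) + float(reward)
        finished.append((int(agent_id), total))
    # Decision rewards belong to the episode now in progress (assumed ordering
    # when an agent appears in both results in the same step).
    for agent_id, reward in zip(decision_steps.agent_id, decision_steps.reward):
        running[int(agent_id)] = running.get(int(agent_id), 0.0) + float(reward)


running, finished = {}, []
empty = FakeSteps([], [])
record_step(running, finished, FakeSteps([0, 1], [0.5, 0.25]), empty)
record_step(running, finished, empty, FakeSteps([1], [1.0]))  # the "0, 1" case
record_step(running, finished, FakeSteps([0, 1], [0.5, 0.0]), empty)
print(running)   # prints "{0: 1.0, 1: 0.0}"
print(finished)  # prints "[(1, 1.25)]"
```

In a real loop, `record_step` would be called after every `env.get_steps(behavior_name)`, with actions set only for the agents present in decision_steps before the next `env.step()`.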
