Describe the bug
Hi, I was using the mlagents_envs API to train my own RL implementation, but this loop:
for i in range(30):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print(len(decision_steps), len(terminal_steps))
sometimes returns non-empty terminal_steps right after a reset, like:
1 0
1 0
1 0
1 0
1 0
1 0
1 1 This one
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1 This one
1 0
1 0
1 0
1 0
1 1 This one
1 0
1 0
1 0
1 0
1 0
1 0
1 0
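Until this is resolved, one possible workaround is to keep stepping after a reset until decision_steps is non-empty, discarding the stale terminal step. This is only a sketch: `reset_and_get_decisions` is a hypothetical helper of my own, and `FakeEnv` is a tiny stand-in so the control flow can be shown without a running Unity editor (the real `get_steps` returns `DecisionSteps`/`TerminalSteps` objects, not lists).

```python
def reset_and_get_decisions(env, behavior_name, max_steps=10):
    """Reset, then step past any leftover terminal steps until at
    least one agent requests a decision. Hypothetical helper."""
    env.reset()
    for _ in range(max_steps):
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        if len(decision_steps) > 0:
            return decision_steps, terminal_steps
        env.step()  # advance the simulation past the stale terminal step
    raise RuntimeError("No decision steps observed after reset")


class FakeEnv:
    """Stand-in for UnityEnvironment, used only to demonstrate the flow."""
    def __init__(self):
        self.calls = 0
    def reset(self):
        self.calls = 0
    def step(self):
        self.calls += 1
    def get_steps(self, behavior_name):
        # First query after reset: only a stale terminal step;
        # after one step: a decision step, as expected.
        if self.calls == 0:
            return [], ["agent_0"]
        return ["agent_0"], []


decisions, terminals = reset_and_get_decisions(FakeEnv(), "3DBall")
print(len(decisions), len(terminals))  # → 1 0
```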
To Reproduce
Steps to reproduce the behavior:
Use this code (note: `spec` must be looked up from `env.behavior_specs`, which the original snippet omitted):

from mlagents_envs.environment import UnityEnvironment

# This is a non-blocking call that only loads the environment.
env = UnityEnvironment(file_name=None, seed=1, side_channels=[])  # After running this statement, click Play in the Unity editor
# Start interacting with the environment.
env.reset()
behavior_names = env.behavior_specs.keys()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]  # needed below for create_random_action
for episode in range(100):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # print(list(decision_steps), list(terminal_steps), behavior_name)
    print(len(decision_steps), len(terminal_steps))
    tracked_agent = -1  # -1 indicates not yet tracking
    done = False  # For the tracked_agent
    episode_rewards = 0  # For the tracked_agent
    while not done:
        # Track the first agent we see if not tracking
        # Note: len(decision_steps) = number of agents that requested a decision
        if tracked_agent == -1 and len(decision_steps) >= 1:
            tracked_agent = decision_steps.agent_id[0]
            print(list(decision_steps), list(terminal_steps), tracked_agent)
        # Generate an action for all agents
        action = spec.create_random_action(len(decision_steps))
        # Set the actions
        env.set_actions(behavior_name, action)
        # Move the simulation forward
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        if tracked_agent in decision_steps:  # The agent requested a decision
            episode_rewards += decision_steps[tracked_agent].reward
        if tracked_agent in terminal_steps:  # The agent terminated its episode
            episode_rewards += terminal_steps[tracked_agent].reward
            done = True
    print(f"Total rewards for episode {episode} is {episode_rewards}")
Sometimes the tracked_agent variable never changes its value; it stays -1 inside the while loop.
This does not seem to cause a problem in the Unity environment, but when implementing my own algorithm, sometimes after a reset I don't get an observation in decision_steps, so I have to work around it.
At first I thought there might be a mistake in how I replicated the environment on my system, but since this only occurs sometimes, I don't think that is the case.
The code above takes longer to run, so you can use this shorter loop to see that decision_steps is sometimes empty even right after resetting the environment:

for i in range(30):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print(len(decision_steps), len(terminal_steps))
Environment :
I tried the same with multiple agents; I cannot understand why decision_steps and terminal_steps both contain all agents when only one agent should have terminated.
I have a similar problem: after calling env.reset(), sometimes there are terminal steps.
Hi @MedhaviMonish, could you please share the config file that you are using?
This does not use any config file. I am using my own implementation of A2C.
I was able to reproduce the issue you are facing and have added it to our bug tracker. We will take a closer look at this issue. Thanks for bringing this to our attention.
And can you also look into the issue with multiple agents? I was unable to keep track of which agents had terminated. Even if only one agent has terminated, the other agents give unexpected output. If one agent out of three has terminated, then env.get_steps(behavior_name) should return 2 agents in decision_steps and 1 in terminal_steps; instead, the return is unpredictable.
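One way to make multi-agent reward tracking robust to this is to key all bookkeeping by agent_id rather than assuming len(decision_steps) + len(terminal_steps) equals the number of agents in the scene. A minimal sketch of that idea, where plain `agent_id -> reward` dicts stand in for the real `DecisionSteps`/`TerminalSteps` objects (the helper name `update_rewards` is my own, not part of the API):

```python
from collections import defaultdict

def update_rewards(episode_rewards, finished, decision_rewards, terminal_rewards):
    """Accumulate per-agent rewards. Agents absent from both dicts
    simply reported nothing this step; they are not lost."""
    for agent_id, reward in decision_rewards.items():
        episode_rewards[agent_id] += reward
    for agent_id, reward in terminal_rewards.items():
        episode_rewards[agent_id] += reward
        finished.add(agent_id)

episode_rewards = defaultdict(float)
finished = set()
# Step 1: all three agents request decisions.
update_rewards(episode_rewards, finished, {0: 0.1, 1: 0.1, 2: 0.1}, {})
# Step 2: only agent 1 terminates; agents 0 and 2 report nothing this step.
update_rewards(episode_rewards, finished, {}, {1: -1.0})
print(sorted(finished), round(episode_rewards[1], 2))  # → [1] -0.9
```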
Hi, any updates regarding this?
I am experiencing the same issue discussed above when using the 3DBall sample environment. The length of decision_steps plus the length of terminal_steps is not always equal to the number of agents in the scene. I understand that the logic behind requesting decisions has changed, but now I do not know how you are supposed to step the environment in order to collect all the data.
If you do this while stepping the environment:
decision_steps, terminal_steps = env.get_steps(behavior_name)
print(len(decision_steps), len(terminal_steps))
You get an output looking like this:
>> 12, 0
>> 12, 0
...
>> 0, 1
The scene has 12 agents, but at some point one of them terminates and you get only one entry inside terminal_steps.
How do I get the data from the remaining agents? Calling get_steps() again doesn't seem to work. Do I need to set a new action for that particular agent and call step() again?
Thanks for the help
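If agents only show up in get_steps() when they request a decision or terminate, one way to still collect every agent's data is to buffer transitions per agent_id across successive step() calls. A sketch of that pattern (the `collect` helper and the string observations are illustrative stand-ins, not the real API objects):

```python
from collections import defaultdict

def collect(buffers, decision_obs, terminal_obs):
    """Append whatever each agent reported this step to its own buffer.
    Agents that reported nothing this step are simply skipped."""
    for agent_id, obs in decision_obs.items():
        buffers[agent_id].append(("decision", obs))
    for agent_id, obs in terminal_obs.items():
        buffers[agent_id].append(("terminal", obs))

buffers = defaultdict(list)
# Simulated get_steps() results over three successive env.step() calls:
collect(buffers, {0: "o0", 1: "o1"}, {})      # both agents request decisions
collect(buffers, {0: "o0b"}, {1: "o1_last"})  # agent 1 terminates alone
collect(buffers, {}, {0: "o0_last"})          # agent 0 terminates later
print(len(buffers[0]), len(buffers[1]))  # → 3 2
```

Each agent's full trajectory is preserved even though no single get_steps() call contains all agents.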
This issue has been clarified for me in the forum. I do not think this is a bug, and the issue should be updated.