For the Udacity Deep Reinforcement Learning class, students are presented with a simplified version of Banana Collectors. However, the description of the environment is incomplete, and I would like to contribute to it.
Most generally, the question is: "How do we get labels for each value returned in an environment's state vector?"
Specifically, for my needs the question is: "How do I decode (get a complete description of) the values returned as state for the Bananas environment?"
As an example, I'm thinking of a description like this (random guesses in terms of content):
Vector - 37 values
Ray Values
I've mostly figured this out by reading the source code. So the modified question is: is there any "easier" way than reading the source?
Hi @iandanforth - just to clarify what you're asking: by values, do you mean the vector observations? And by labels, do you mean actions? The step method in the Python API allows you to retrieve the observations and rewards given an action - is this what you're interested in?
@mmattar Thanks for the reply. Let me show you my best guess for the Udacity version of the Bananas environment, to give you an idea of what I was looking for. It's essentially a more detailed version of the state description provided in the docs. To figure this out I had to do a bit of searching/parsing of the unity-ml code (which was educational, but might not be everyone's cup of tea).
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.
Ray Perception (35)
7 rays projecting from the agent at the following angles (and returned in this order):
[20, 90, 160, 45, 135, 70, 110] # 90 is directly in front of the agent
Ray (5)
Each ray is projected into the scene. If it encounters one of four detectable objects, the value at that object's position in the array is set to 1. The final value is a distance measure, expressed as a fraction of the ray length.
[Banana, Wall, BadBanana, Agent, Distance]
example
[0, 1, 1, 0, 0.2]
A BadBanana is detected 20% of the way along the ray, with a wall behind it.
Velocity of Agent (2)
Left/right velocity (usually near 0)
Forward/backward velocity (0-11.2)
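To make the layout above concrete, here is a minimal sketch of how a 37-value state could be split into labeled fields. This assumes the guessed layout (7 rays × 5 values in the angle order above, followed by the 2 velocity components); the function name and the returned dict keys are my own invention, not part of the ML-Agents API.

```python
# Hypothetical decoder for the 37-value Banana state vector, assuming
# 7 rays x 5 values followed by 2 velocity components (a guess, not
# confirmed against the Unity source).
RAY_ANGLES = [20, 90, 160, 45, 135, 70, 110]  # degrees; 90 is straight ahead
RAY_LABELS = ["Banana", "Wall", "BadBanana", "Agent", "Distance"]

def decode_state(state):
    """Split a 37-element state into labeled ray readings and velocity."""
    assert len(state) == 37, "expected a 37-value state vector"
    rays = {}
    for i, angle in enumerate(RAY_ANGLES):
        chunk = state[i * 5:(i + 1) * 5]
        rays[angle] = dict(zip(RAY_LABELS, chunk))
    velocity = {"left_right": state[35], "forward_back": state[36]}
    return rays, velocity
```

For example, a state whose first five values are `[0, 1, 1, 0, 0.2]` would decode (under this guessed layout) to a wall and a BadBanana hit on the 20-degree ray at 20% of the ray length.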
@mmattar
I wonder if there is a maximum ray length? I.e., is it possible for a banana/bad banana to be present but too far away for the agent to detect along any of the angular rays?
This discussion has been inactive for a while now so I'm going to close it. Feel free to reopen or create a new issue if there's more to add.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.