For the Udacity Deep Reinforcement Learning class, students are presented with a simplified version of Banana Collectors. However, the description of the environment is incomplete, and I would like to contribute to it.
Most generally, the question is: "How do we get labels for each value returned in an environment's state vector?"
Specifically, for my needs the question is: "How do I decode (get a complete description of) the values returned as state for the Bananas environment?"
As an example, I'm thinking of a description like this (random guesses in terms of content):
Vector - 37 values
Ray Values
I've mostly figured this out by reading the source code. So the modified question is: is there any "easier" way than reading the source?
Hi @iandanforth - just to clarify what you're asking: by values, do you mean the vector observations? And by labels, do you mean actions? The step method in the Python API allows you to retrieve the observations and rewards given an action - is this what you're interested in?
@mmattar Thanks for the reply. Let me show you my best guess for the Udacity version of the Bananas environment, to give you an idea of what I was looking for. It's essentially a more detailed version of the state description provided in the docs. To figure this out I had to do a bit of searching/parsing of the unity-ml code (which was educational, but might not be everyone's cup of tea).
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.
Ray Perception (35)
7 rays projecting from the agent at the following angles (and returned in this order):
[20, 90, 160, 45, 135, 70, 110] # 90 is directly in front of the agent
Ray (5)
Each ray is projected into the scene. If it encounters one of four detectable objects, the value at that object's position in the array is set to 1. The final value is a distance measure, expressed as a fraction of the ray length.
[Banana, Wall, BadBanana, Agent, Distance]
example
[0, 1, 1, 0, 0.2]
A BadBanana is detected 20% of the way along the ray, with a wall behind it.
Velocity of Agent (2)
Left/right velocity (usually near 0)
Forward/backward velocity (0-11.2)
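To make the layout above concrete, here is a minimal sketch of how a 37-value state could be split into labeled fields. This assumes the guessed layout (7 rays × 5 values in the angle order above, followed by the 2 velocity components); the function name and the returned dict keys are my own invention, not part of the ML-Agents API.

```python
# Hypothetical decoder for the 37-value Banana state vector, assuming
# 7 rays x 5 values followed by 2 velocity components (a guess, not
# confirmed against the Unity source).
RAY_ANGLES = [20, 90, 160, 45, 135, 70, 110]  # degrees; 90 is straight ahead
RAY_LABELS = ["Banana", "Wall", "BadBanana", "Agent", "Distance"]

def decode_state(state):
    """Split a 37-element state into labeled ray readings and velocity."""
    assert len(state) == 37, "expected a 37-value state vector"
    rays = {}
    for i, angle in enumerate(RAY_ANGLES):
        chunk = state[i * 5:(i + 1) * 5]
        rays[angle] = dict(zip(RAY_LABELS, chunk))
    velocity = {"left_right": state[35], "forward_back": state[36]}
    return rays, velocity
```

For example, a state whose first five values are `[0, 1, 1, 0, 0.2]` would decode (under this guessed layout) to a wall and a BadBanana hit on the 20-degree ray at 20% of the ray length.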
@mmattar
I wonder if there is a maximum ray length? I.e., is it possible for a banana/bad banana to be present but too far away for the agent to detect along any of the angular rays?
This discussion has been inactive for a while now so I'm going to close it. Feel free to reopen or create a new issue if there's more to add.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.