I have two questions about collecting observations when using PPO
The first one is probably too basic, but I haven't found the answer anywhere so I guess I will ask it here. Let's say I have an agent trying to avoid obstacles, and one of the observations is "distance to obstacle". But what if the obstacle doesn't always exist in the scene, or isn't visible to the agent? What would be the best practice for the "distance to obstacle" observation value here?
My second question. Let's say I have successfully trained a model for the agent to avoid 1 obstacle. What if I want to continue training the agent to avoid 2 or 3 more obstacles (or even 100, I don't know 😄)? Obviously if I change the size of the vector observation, I need to train the agent from the beginning, but I don't know if there's a better solution for that?
I had to deal with this issue a couple of times. The best solution I was able to come up with is setting a fixed maximum number of obstacles in the agent's observation space. If there are more obstacles present, then I prioritize them based on their distance or size etc., whatever is most critical to the situation, and ignore the less important ones. If there are fewer obstacles, then the agent observes null values for the non-existing ones. For example, a "null value" with regard to distance could be -1, if the observed distance is normalized to measured_distance divided by max_detectable_distance. An alternative to prioritizing could be clustering, so that a group of close-together obstacles is observed as a single one.
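A minimal sketch of the padding idea above, in Python for illustration (the constants `MAX_OBSTACLES` and `MAX_DETECTABLE_DISTANCE` are made-up names, not ml-agents API; in a real agent this logic would live in your C# `CollectObservations`):

```python
# Build a fixed-size observation list from a variable number of obstacles.
# Closest obstacles are prioritized; missing slots are padded with -1.
MAX_OBSTACLES = 3
MAX_DETECTABLE_DISTANCE = 20.0

def obstacle_observations(distances):
    """distances: measured distances to the obstacles currently in the scene."""
    # Prioritize the closest obstacles if there are more than fit.
    closest = sorted(distances)[:MAX_OBSTACLES]
    # Normalize to [0, 1]; -1 is the "no obstacle" null value.
    obs = [min(d / MAX_DETECTABLE_DISTANCE, 1.0) for d in closest]
    obs += [-1.0] * (MAX_OBSTACLES - len(obs))
    return obs

print(obstacle_observations([12.0, 4.0]))  # [0.2, 0.6, -1.0]
```

The observation vector always has the same length, so the model's input size never changes regardless of how many obstacles are in the scene.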
Thank you for your reply.
For the observation value, what if the observation is not a distance value but some vector / position? I'm currently using (-1, -1), but I suppose that's not the same as "not in sight", even after normalisation, right?
Regarding the maximum number of obstacles, I'd do the same thing. I'm hoping there's some other way though, say, if I want to add more observations in the future that I don't even know about yet :D
You could split up the vector into a normalized vector and a magnitude scalar. Let's say your 2D vector describes an obstacle's position relative to your agent. The normalized vector becomes the direction and the scalar is the distance. Then you would set the vector observation to 0/0 and the scalar to -1 if there's no obstacle present.
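A sketch of that direction/distance split, in Python for illustration (function name and the `None` convention for "no obstacle" are my own, not ml-agents API):

```python
import math

def encode_relative_position(rel, max_distance):
    """rel: obstacle position relative to the agent as (x, y), or None if absent.

    Returns (dir_x, dir_y, distance): a unit direction vector plus a
    normalized distance scalar, with (0, 0, -1) as the null observation.
    """
    if rel is None:
        return (0.0, 0.0, -1.0)  # no obstacle present
    x, y = rel
    dist = math.hypot(x, y)
    if dist == 0.0:
        return (0.0, 0.0, 0.0)   # obstacle exactly at the agent's position
    return (x / dist, y / dist, min(dist / max_distance, 1.0))

print(encode_relative_position((3.0, 4.0), 10.0))  # (0.6, 0.8, 0.5)
print(encode_relative_position(None, 10.0))        # (0.0, 0.0, -1.0)
```

This way "no obstacle" is unambiguous: a zero direction vector plus a -1 distance can never be produced by a real observation, unlike a raw (-1, -1) position.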
I don't think you can get around setting a fixed number of observations though. Because adding more observations later would mean changing the model's input layer size. AFAIK that's not possible with ml-agents.
Hi @trinhthanhtrung
These are good questions. Another solution that has not been mentioned yet is to use something like ray casts to check for the presence and distance of an obstacle. This generalizes to any number of obstacles in the environment while still keeping a fixed vector size. You can find examples of this approach in many of our example environments, such as BananaCollector and Pyramids.
Thank you. I'm looking at the examples that you said, specifically the List
I just want to clarify, in case the agent doesn't see any obstacles, you will add into the observations the _max length of each raycast vector_, am I right?
For RayPerception, does the agent only detect the first object along the path of the ray? Can the agent detect multiple objects along the path of the raycast?
@muffinmiffin RayPerception uses Physics.SphereCast / Physics2D.CircleCast, so it only returns the first hit. If you want all objects, you'll need to write a detection script using Physics.SphereCastAll / Physics2D.CircleCastAll.
RayPerception returns dedicated values for distinguishing between detection and non-detection of objects.
For each ray, you get a list of values, for example:
- object type A detected: 1 (yes) / 0 (no)
- object type B detected: 1 (yes) / 0 (no)
- object type C detected: 1 (yes) / 0 (no)
- no object detected: 1 (yes) / 0 (no)
- distance: 0 if no object was detected, otherwise measured_distance divided by max_distance (the ray length) for the closest hit object; objects farther along the ray are ignored
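The per-ray layout above can be sketched like this in Python (the tag names and function are illustrative; this mirrors the output format described in the thread, not the actual ml-agents source):

```python
# One-hot over detectable tags, plus a "nothing hit" flag and a normalized
# hit distance, as described for RayPerception above.
DETECTABLE_TAGS = ["obstacle", "goal", "wall"]  # object types A, B, C

def ray_observation(hit_tag, hit_distance, ray_length):
    """hit_tag is None when the ray hit nothing within ray_length."""
    one_hot = [1.0 if tag == hit_tag else 0.0 for tag in DETECTABLE_TAGS]
    no_hit = 1.0 if hit_tag is None else 0.0
    dist = 0.0 if hit_tag is None else hit_distance / ray_length
    return one_hot + [no_hit, dist]

print(ray_observation("goal", 5.0, 20.0))  # [0.0, 1.0, 0.0, 0.0, 0.25]
print(ray_observation(None, 0.0, 20.0))    # [0.0, 0.0, 0.0, 1.0, 0.0]
```

Each ray contributes `len(DETECTABLE_TAGS) + 2` values, which is why both the number of rays and the number of detectable object types affect the observation size.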
Thanks! The explanation makes it much clearer when I look at the code again. That also explains why not only changing the number of raycasts changes the size of the vector observation, but changing the number of detectable objects does too.
I have another question related to the vector observation: what if the agent now needs to observe a type of position that isn't relative to the agent? Let's say the agent needs to plan something with a house's position in mind. Obviously in this case a closer house and a farther house aren't much different to the agent. Should I use one-hot encoding in this situation?
Hi all -- this issue has been inactive for some time so I'm going to close it. Feel free to reopen or create a new issue if you have more to discuss.