Ml-agents: Agent with Multiple Brains

Created on 23 Jan 2020 · 6 comments · Source: Unity-Technologies/ml-agents

Hello,

I have a question regarding Multiple Brained agents.

I have seen the wall jump example, so I am aware that we can use the GiveModel() function to assign a specific trained brain, and I understand that we can use curriculum learning to build up the complexity of the tasks and improve training.

However, what if my model requires an existing brain in order to learn another skill? For example, if I wanted to train a drone to seek and destroy targets from scratch, I would first need to train it to stabilize itself and learn to fly, then train it to identify targets, and finally to destroy them. I'm a bit confused about the training procedure because each task requires a previous skill: the drone would need a brain to stabilize itself while it trains to move around, and it would need the previous two brains while it trains to explore the environment and identify targets, and so on. I believe that training a drone from scratch to fly, identify targets and destroy them with a single brain would be astronomically difficult, or could it be possible?

Any help or guideline as to how I would go about conducting the training procedure would be greatly appreciated. Thank you.

discussion

All 6 comments

You could train one model to accomplish basic skills like flying forward, flying backward and rotating.
A second one could be trained afterwards that simply utilizes the already-trained movement model for higher-level tasks like navigation (a rough sketch of loading a pretrained movement model follows this comment).

So you would successively build a hierarchy of models.

(This is just a potential concept. I'm not too familiar with the most recent version of ml-agents)
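As a rough illustration of swapping in an already-trained low-level model, the snippet below assigns an exported movement policy to a low-level agent at runtime via GiveModel(), which the original question already mentions. This is only a hedged sketch: the `DronePolicyLoader` class, the behaviour name "Flyer" and the field names are hypothetical, and the exact namespaces (`MLAgents`, `Barracuda`) depend on the ML-Agents and Barracuda versions in use.

```csharp
using UnityEngine;
using MLAgents;     // namespace name varies between ML-Agents releases
using Barracuda;    // NNModel; may be Unity.Barracuda in newer versions

public class DronePolicyLoader : MonoBehaviour
{
    public Agent flyerAgent;           // low-level agent (hypothetical)
    public NNModel trainedFlyerModel;  // exported .nn file of the movement policy

    void Start()
    {
        // Freeze the low-level behaviour with its pretrained model so that only
        // the high-level behaviour keeps learning during the next training run.
        // "Flyer" must match the agent's Behavior Parameters name.
        flyerAgent.GiveModel("Flyer", trainedFlyerModel);
    }
}
```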

I imagine the switching between different brains is to observe agent behavior at each stage... although it's a bit weird that it loads a new one every time. Right now I am doing a similar task with one brain, which I imagine is the intended use for curriculum learning.

Rather than having a single agent with swappable brains, I suggest you create multiple agents, with each one being responsible for a specific task. For the drone example, you would first train an agent to stabilize and fly the drone towards a given direction. The direction vector is part of the agent's observation space; it could for instance point towards a randomized target position. After your agent has learned this behaviour, create a second one for higher-level tasks like finding targets. This agent is trained to output the direction vector, which is then fed to the first agent.
I've built a couple of example projects using this type of setup (a rough sketch of this wiring follows the links below):
https://github.com/mbaske/robot-ants
https://github.com/mbaske/angry-ai
https://github.com/mbaske/ml-drone-collection
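To make the wiring concrete, here is a minimal sketch of the two-agent setup described above: the low-level agent observes a commanded direction, and the high-level agent's action writes that direction. All class and field names are illustrative, and the exact Agent API (AddVectorObs/AgentAction in older releases vs. newer sensor-based methods) depends on the ML-Agents version, so treat this as a sketch rather than copy-paste code.

```csharp
using UnityEngine;
using MLAgents;  // namespace varies between ML-Agents releases

// Low-level agent: learns to stabilize and fly towards a commanded direction.
public class FlyerAgent : Agent
{
    public Vector3 commandedDirection;  // written by the high-level agent
    Rigidbody body;

    public override void InitializeAgent()
    {
        body = GetComponent<Rigidbody>();
    }

    public override void CollectObservations()
    {
        AddVectorObs(transform.up);        // attitude
        AddVectorObs(body.velocity);       // current velocity
        AddVectorObs(commandedDirection);  // where the planner wants to go
    }

    public override void AgentAction(float[] vectorAction)
    {
        // Map actions to rotor thrusts and reward alignment with
        // commandedDirection (details omitted).
    }
}

// High-level agent: learns to find targets and emits the direction command.
public class PlannerAgent : Agent
{
    public FlyerAgent flyer;

    public override void AgentAction(float[] vectorAction)
    {
        // Interpret the action as a direction command for the low-level agent.
        flyer.commandedDirection =
            new Vector3(vectorAction[0], vectorAction[1], vectorAction[2]).normalized;
    }
}
```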

Hello :)

Thank you all for the suggestions, I will try them all out.

@mbaske Thank you for the links to your resources, your projects are fantastic!

@mbaske Hi Mathias,

Do you know if it's possible to train multiple policies in parallel? I have a hierarchical policy structure in which it's not possible to train the policies one after another, so I need to do all the training together. I have implemented this, but when I start the training, the ML-Agents API only detects the high-level policy and does not train the rest.

@donamin - That should be possible, but it's important to have multiple agents, so that each agent can train with its own policy. The high-level agent's actions would then provide values for the low-level agent's observations.
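As far as I know, the trainer will pick up every behaviour name that requests decisions during the run, so each agent needs its own Behavior Parameters name and its own entry in the trainer configuration. A hedged sketch of what such a trainer_config.yaml could look like for the drone example (the behaviour names FlyerBehavior and PlannerBehavior are hypothetical, and the exact keys depend on the ML-Agents release):

```yaml
# trainer_config.yaml (sketch) - one section per behaviour name, so both
# policies are trained in the same run. Names and values are placeholders.
default:
  trainer: ppo
  batch_size: 1024
  buffer_size: 10240
  learning_rate: 3.0e-4
  max_steps: 5.0e5

FlyerBehavior:       # low-level drone controller
  max_steps: 1.0e6
  hidden_units: 256

PlannerBehavior:     # high-level target finder
  max_steps: 5.0e5
  hidden_units: 128
```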
