ML-Agents: Modify agent training after it's finished via custom interface

Created on 9 Apr 2018 · 11 comments · Source: Unity-Technologies/ml-agents

Hello there,

This is more a discussion than a real issue, and is a follow-up to this discussion:
https://github.com/Unity-Technologies/ml-agents/issues/565
My question is the following: once the training is completed, is there a way to modify it?
My goal is to design an interface for modifying a trained agent after the fact, by adjusting the rewards (or whatever else steers the training in the right direction). As stated in the previous issue, this isn't possible from within the game itself (C#), so would the only solution be to create a Python overlay/interface of some sort?
Thanks for contributing to the discussion! ;)

discussion

Most helpful comment

Different reward signals would require further training: the agent would need to explore the environment all over again. Also, rewards are only needed while training the agent; once put to inference, the agent no longer uses any rewards from the environment.

All 11 comments

Hi @Tlospock
here are my thoughts on your question. It shouldn't be a problem to continue updating your model. However, if you modify your state or action space, the model is no longer compatible. There are ways to reuse layers from existing models, but that goes more in the direction of transfer learning. Adjusting rewards should be fine. On the Python side, I don't think a UI makes sense, because rewards are signaled by the environment, so this would be something you add to your Unity environment. Keep in mind, though, that the Unity application is occasionally unresponsive. The one thing I can think of that could be made adjustable during training is the agent's exploration.
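To illustrate the point that rewards are signaled from the environment's C# code: a minimal sketch of environment-side reward shaping, using method names from a later ML-Agents release (the API changed across versions). `CollectorAgent`, `stepPenalty`, and `goalReward` are hypothetical names invented for this example; a designer-facing tool would ultimately be changing values like these.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical agent: reward shaping lives in the environment's C# code.
public class CollectorAgent : Agent
{
    public Transform target;             // assigned in the Inspector
    public float stepPenalty = -0.001f;  // tunable shaping value
    public float goalReward = 1.0f;      // tunable shaping value

    public override void OnActionReceived(ActionBuffers actions)
    {
        // (Movement driven by `actions` omitted for brevity.)

        // Small per-step penalty encourages reaching the target quickly.
        AddReward(stepPenalty);

        if (Vector3.Distance(transform.position, target.position) < 1.0f)
        {
            AddReward(goalReward);
            EndEpisode();
        }
    }
}
```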

Hello @MarcoMeter, thanks for the quick answer.
My goal here is to make the AI's decisions modifiable after training, along these lines: the AI is trained for a particular goal, and the training is done. But the game designer is not happy enough with it and wishes to modify some of the decision making, so he plays through an action in the game context, stops the game execution at a certain point, and gives a better reward for another action. My goal is to build a tool rather than to train a particular AI.
I'm sorry if I was not clear enough!
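A hedged sketch of how such a designer-driven reward override might look on the Unity side. `DesignerFeedback`, `preferredTarget`, and `OnTargetReached` are all hypothetical names for this example; `Agent.AddReward` is the real ML-Agents call (later-release API), and a bonus applied this way only has an effect during a continued training run, not at inference.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical hook for the designer-in-the-loop idea: while the game is
// paused, a tool records which object the agent *should* have chosen; once
// training resumes, picking that object earns an extra bonus reward.
public class DesignerFeedback : MonoBehaviour
{
    public Transform preferredTarget;   // set by the pause-menu UI
    public float bonusReward = 0.5f;    // extra signal for the preferred choice

    // Called by the agent's own code when it reaches some target.
    public void OnTargetReached(Agent agent, Transform reached)
    {
        if (preferredTarget != null && reached == preferredTarget)
        {
            // Nudge the policy toward the designer's choice during training.
            agent.AddReward(bonusReward);
        }
    }
}
```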

Different reward signals would require further training: the agent would need to explore the environment all over again. Also, rewards are only needed while training the agent; once put to inference, the agent no longer uses any rewards from the environment.

Oh, I see. So do you think it's possible to modify the decision making, for example through imitation learning, once the training is done? Or is there no way at all for a human trainer to modify the training once it's finished, without redoing it from the beginning?
Again, thanks a lot for your advice!

So do you think it's possible to modify the decision making, for example through imitation learning, once the training is done?

It would be the other way around: you can start with imitation learning to get a basic behavior in place, which can then be refined by reinforcement learning. Imitation learning would just be an aid to the reinforcement learning.

It always depends on the behavior you want to achieve. If it is about varying difficulty levels, there are some potential approaches. But if you want to majorly alter the agent's behavior so that it does things completely differently, it will always depend on what was trained for. One idea could be to introduce further properties to the state space, which can then be modified to tweak the behavior. Certainly this would make the training more complex.
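As a rough illustration of that last idea: a made-up "aggression" knob could be randomized during training and exposed as an observation, so a designer can dial it at inference time to shift the learned behavior. Method names follow a later ML-Agents release; `TunableAgent` and `aggression` are hypothetical.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Sketch: expose a tunable property as part of the observations. If the
// agent is trained with this value randomized, moving the slider later
// shifts the behavior without retraining.
public class TunableAgent : Agent
{
    [Range(0f, 1f)]
    public float aggression = 0.5f;   // designer-facing knob (made-up example)

    public override void OnEpisodeBegin()
    {
        // Randomize during training so the policy learns to condition on it.
        aggression = Random.Range(0f, 1f);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(aggression);
        // ...plus the usual observations (positions, velocities, etc.)
    }
}
```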

Ok. My goal, though, is to build the tool/interface itself, which can modify/refine training that is already done, and to run some experiments on it. So I suppose I will search for workarounds such as the one you proposed.
Thanks a lot for your help!

I recommend you think about some use cases. It might be easier to start out with specific problems.

Yes, that's what I'm doing right now. I built a quick prototype which for now works with a simple AI (which should be replaced by an ML agent) that finds the closest object and goes to it. The user/designer can "pause" the execution and select another target to go to:
(GIF of the prototype, recorded 2018-04-09 13-40-37)

My research idea is to apply that to an ML-Agent, but my background in ML is quite shaky, which is why I wanted to rely on Unity's ml-agents.
I'm currently thinking of more relevant situations, but I think that finding out whether it is feasible is more important at the beginning.
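For reference, the scripted (non-ML) baseline described above could look roughly like this. Everything here (`ClosestObjectSeeker`, the "Collectible" tag, the pause check) is a hypothetical sketch rather than the actual prototype code.

```csharp
using UnityEngine;

// Rough sketch of the described prototype: a scripted agent that walks to
// the closest collectible, with a pause-and-retarget override for the user.
public class ClosestObjectSeeker : MonoBehaviour
{
    public float speed = 3f;
    Transform _target;   // current goal; may be overridden by the designer

    void Update()
    {
        if (Time.timeScale == 0f) return;   // paused by the designer's tool

        if (_target == null)
            _target = FindClosest();

        if (_target != null)
            transform.position = Vector3.MoveTowards(
                transform.position, _target.position, speed * Time.deltaTime);
    }

    // Called by the UI while paused to force a different goal.
    public void OverrideTarget(Transform newTarget) => _target = newTarget;

    Transform FindClosest()
    {
        Transform best = null;
        float bestDist = float.MaxValue;
        foreach (var go in GameObject.FindGameObjectsWithTag("Collectible"))
        {
            float d = Vector3.Distance(transform.position, go.transform.position);
            if (d < bestDist) { bestDist = d; best = go.transform; }
        }
        return best;
    }
}
```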

The user/designer can "pause" the execution and select another target to go to:

At the very least you can train it right from the start for multiple targets by randomizing the target's position for each episode. Then, on a "higher level", you could just add the position of the desired target to the state space.
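A sketch of that suggestion, again using method names from a later ML-Agents release (the API changed across versions); `MultiTargetAgent`, `targets`, and `desired` are hypothetical names.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Sketch: randomize the target each episode and feed the *desired* target's
// position into the observations, so the trained policy generalizes to
// whichever target is selected later.
public class MultiTargetAgent : Agent
{
    public Transform[] targets;   // all candidate objects in the scene
    public Transform desired;     // random in training; designer-chosen at inference

    public override void OnEpisodeBegin()
    {
        // Pick a random goal and scatter it so the policy can't memorize positions.
        desired = targets[Random.Range(0, targets.Length)];
        desired.position = new Vector3(Random.Range(-5f, 5f), 0f, Random.Range(-5f, 5f));
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.position);
        sensor.AddObservation(desired.position);   // the "higher level" selection signal
    }
}
```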

I see what you mean, but this is just a simple prototype. The long-term goal is to choose between several types of objects: in an RPG it might be an enemy, or a chest, or anything which is of interest and is within the agent's action space. So that may be complicated. I will try to make some progress on the prototype by changing some of the collectibles to simulate different kinds of objects, and then train the agent by randomizing the target position.
But in the end the question stays the same: how do you modify the results of training and get a behaviour which is not the most optimized for rewards, but is better for the gameplay?
Right now, based on our discussion, I have several possible courses of action:
-> Do the training in two phases, without changing the action space: first let the agent train alone, then, in a second phase, do some imitation learning based on a human trainer's inputs.
-> Do the opposite: first do imitation learning, which gives a rough direction for the agent to follow, then refine the training by reinforcement learning (or another automatic learning process).
-> Continue down the current path and make the Unity GUI interact with the Python learning side during training. I can already pause the training with the minimal interface I built and click the GUI buttons; I just have to find a way to send the signal to Python, if it can be done (see the sketch after this message).
-> Follow your earlier suggestion:

train it right from the start for multiple targets by randomizing the target's position for each episode. Then on a "higher level" you could just add the position of the desired target to the state space.

The difficulty for me in each case will be understanding and modifying the machine learning code.
We can continue to discuss this by mail if that's easier for you :)
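On the third point above (sending a signal from Unity to Python): the 2018-era releases discussed in this thread had no custom messaging channel, but later ML-Agents versions added a side-channel API for exactly this. A minimal sketch follows; the `DesignerSignalChannel` name and its GUID are made up, while `SideChannel`, `IncomingMessage`, `OutgoingMessage`, and `SideChannelManager.RegisterSideChannel` come from the later ML-Agents API.

```csharp
using System;
using UnityEngine;
using Unity.MLAgents.SideChannels;

// Hypothetical channel that tells the Python side the designer paused the game.
public class DesignerSignalChannel : SideChannel
{
    public DesignerSignalChannel()
    {
        // Fixed GUID shared with the matching Python SideChannel subclass.
        ChannelId = new Guid("f3a1c2d4-0000-4000-8000-000000000001");
    }

    protected override void OnMessageReceived(IncomingMessage msg)
    {
        Debug.Log("From Python: " + msg.ReadString());
    }

    // Call this from the pause-menu GUI.
    public void NotifyPaused()
    {
        using (var msg = new OutgoingMessage())
        {
            msg.WriteString("paused");
            QueueMessageToSend(msg);
        }
    }
}

// Somewhere at startup:
//   SideChannelManager.RegisterSideChannel(new DesignerSignalChannel());
```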

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

