Describe the bug
"Sometimes", AgentAction ignores masked actions while using Discrete Vector Space and on demand Decision Requests which causes invalid agent actions and corrupts learning process. I noticed this in my own project and tried to replicate it in original Unity examples; I was able to modify "Basic" example to replicate this problem there too.
To Reproduce
Steps to reproduce the behavior:
1. In the Basic example, set the discrete action branch size to 210 and adjust the vector observation space size to match the observations below.
2. In `BasicAgent.CollectObservations()`, after `AddVectorObs(m_Position, 20);`, add:

```csharp
var maskList = new List<int>();
for (int i = 3; i < 210; ++i)
{
    maskList.Add(i);
}
SetActionMask(0, maskList);

var dummyObsList = new List<float>();
for (int i = 0; i < 52; ++i)
{
    dummyObsList.Add(i);
}
AddVectorObs(dummyObsList);
```

3. In `AgentAction()`, right after `var movement = (int)vectorAction[0];`, add:

```csharp
Debug.Assert(movement < 3, "Action was called even though it was masked: " + movement);
```

4. Train with the following trainer configuration:

```yaml
trainer: ppo
batch_size: 10
beta: 5.0e-3
buffer_size: 100
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 3.0e-4
learning_rate_schedule: linear
max_steps: 15.0e4
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 1000
use_recurrent: false
vis_encode_type: simple
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
```

5. Run training:

```
mlagents-learn config/config.yaml --run-id=basicx3 --train
```

Console logs / stack traces
You will *eventually* see the assertion you added to BasicAgent.cs fire; this should never happen, since all of those actions were masked.
Environment (please complete the following information):
UPDATE: This problem occurs starting from a branch size of 26; the extra dummy observation lines above are not needed. (I did not remove them, as they may still give ideas.) Step 2 above can be replaced with the following; the other steps remain the same, apart from the branch size (26) and the observation space size, which reverts to 20:
```csharp
public override void CollectObservations()
{
    AddVectorObs(m_Position, 20);
    SetActionMask(0, new int[] { 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 });
}
```
I also attached a debugger at that moment to check the ActionMasker's current state, and I can clearly see the masked values there. So, somehow, AgentAction is still called with an action that is explicitly marked as masked. With lower branch sizes there does not *seem* to be a problem (though I cannot guarantee it), but with a branch size >= 26 I always hit the assertion at some point.
Hi @jacobson,
Sorry for the delay on this - I'm going to look into it in more detail tomorrow.
Hi,
I was able to reproduce this behavior with the steps you provided. Using a branch size of 210, this happened 6 times in 150,000 training steps.
I think the cause of this is an epsilon that we add to the probabilities before taking the log (to avoid taking log(0)): https://github.com/Unity-Technologies/ml-agents/blob/1a1919a2cff14582da06c89df7eccd14673b78d9/ml-agents/mlagents/trainers/models.py#L445-L449
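To make the failure mode concrete, here is a rough numpy sketch (my own illustration, not the actual ml-agents code; it assumes the resulting log-probabilities feed a standard categorical sampler). Adding EPSILON before the log leaves every masked action with a tiny but nonzero probability; with 207 of 210 actions masked and EPSILON = 1e-7, that is roughly a 2e-5 chance per decision, i.e. about 3 expected hits in 150,000 steps, the same order of magnitude as the 6 occurrences above:

```python
import numpy as np

EPSILON = 1e-7  # value used at the linked models.py lines
rng = np.random.default_rng(0)

n_actions, n_valid = 210, 3
probs = np.zeros(n_actions)
probs[:n_valid] = 1.0 / n_valid  # actions 3..209 are masked to exactly 0

# Sampling from log(probs + EPSILON) is equivalent to sampling from the
# renormalized probabilities below, so masked actions keep a tiny mass.
log_probs = np.log(probs + EPSILON)
sample_probs = np.exp(log_probs) / np.exp(log_probs).sum()

p_masked = sample_probs[n_valid:].sum()
print(p_masked)               # ~2.07e-5 per decision
print(150_000 * p_masked)     # ~3.1 expected masked picks in 150k steps

draws = rng.choice(n_actions, size=150_000, p=sample_probs)
print((draws >= n_valid).sum())  # a handful of masked actions get sampled
```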
I don't have a good workaround at the moment, but I'll see if we can get a simple fix for this. I've also logged it in our internal tracker as MLA-218.
@chriselion Thank you. I checked it, and yes, it affects stability.
I changed EPSILON from 1e-7 to 1e-10 and it seems to be working: AgentAction was NOT called with an invalid action during 630,000 steps with 210 total actions (I stopped the run after 630,000 steps). (1e-10 also worked for 99,999 total actions; I stopped that run after 42,000 steps.)
I understand that 1e-7 was chosen as EPSILON with 32-bit floats in mind; but as I tested, the EPSILON in models.py already behaves as a 64-bit float by default on most modern computers (its machine epsilon, np.finfo(EPSILON).eps, is 2.220446049250313e-16), since it is not explicitly defined as float32, unlike many places in the codebase. On top of that, I see that 1e-10 is already used in models.py (inside the create_learning_rate function), so such values do not seem to be off-limits. For now, does this look like an acceptable workaround? (Probably even 2.220446049250313e-16 could be used as EPSILON "for now".)
What do you think?
@jacobson Glad the workaround works. Using any sort of epsilon here still feels bad, because there is always a chance it picks an invalid action; we're looking at rewriting this using tfp.distributions.Multinomial, which takes probabilities instead of log-probabilities, so zeroing out masked actions will be cleaner.
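As a sketch of why that is cleaner (using numpy as a stand-in for tfp.distributions.Multinomial; this illustrates the intended behavior, not the actual rewrite): a sampler that consumes probabilities directly can assign masked actions exactly 0, and a zero-probability entry can never be drawn, with no epsilon involved.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, n_valid = 210, 3
probs = np.zeros(n_actions)
probs[:n_valid] = 1.0 / n_valid  # masked actions get exactly 0 probability

# A sampler that consumes probabilities (rather than log-probabilities)
# can never draw a zero-probability entry.
draws = rng.choice(n_actions, size=1_500_000, p=probs)
assert (draws < n_valid).all()   # masked actions are never selected
```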
(Even 2.220446049250313e-16 does not work well enough after 1.5 million steps, so this workaround is no longer helpful. I am looking forward to the fix you mentioned. Thank you for your concern, time, and effort. I do not want to dig into the Python code myself, since you can modify it at any time, so I would really appreciate it if it were possible to increase the priority of this bug internally, as it directly affects training with many actions when using a discrete vector action space. I am also not able to resume training with --load under these circumstances.)
Hi @jacobson,
I'm not sure about the current prioritization of the bug, but I'll try to get someone to look into it this week.
In the meantime, I'd suggest you add a check for masked actions before taking them, and add some sort of fallback logic in case one does get selected; something like the sketch below.
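A minimal sketch of the idea (in Python for brevity; in your project the equivalent check would live in the C# AgentAction code, and all names here are hypothetical):

```python
import random

def safe_action(chosen, masked_actions, n_actions):
    """If the policy returned a masked action, fall back to a uniform
    draw over the actions that are actually allowed."""
    if chosen not in masked_actions:
        return chosen
    valid = [a for a in range(n_actions) if a not in masked_actions]
    return random.choice(valid)

masked = set(range(3, 210))
print(safe_action(5, masked, 210))  # 5 is masked -> falls back to 0, 1, or 2
print(safe_action(2, masked, 210))  # 2 is valid  -> returned unchanged
```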
First of all, thank you very much for at least considering the possibility of prioritizing this.
I have deliberately not done that, so as not to make the brain's statistics more complex: I want to return a 100% meaningful result for every incoming decision. I could add a DoNothing method for invalid actions (which would not pass the turn in my board game but re-request another decision), but then the agent would learn that those decisions lead to a (VALID) DoNothing result, when it should never even consider making a decision that results in DoNothing. Yes, those cases may become insignificant after many decisions, but they would still pollute the brain. I understand that this approach is perfectly usable in various situations; but in my case, I want to feed the brain only correct action-result pairs. If my assumptions are wrong, please correct me. I welcome any suggestions, and I would be grateful to read about your experience with similar things. Thank you!
Hi @CharlieReece,
Sorry for the long delay on this. This bug is still open but we hope to get it fixed soon.
As for what to do when an invalid action is selected, I would recommend:
Thank you for the update. To be honest, I stopped using ML-Agents right after running into this problem, because:
I have not given up on using ML-Agents at all, but I have postponed using it (in my current project) until its first major release or a release candidate. I want to start using it again in non-production environments and test projects, to adapt to the latest changes, as soon as I have time.
In the meantime, I had to go back and build the AI in my project the old-school way, hopefully to be replaced or updated with ML-Agents later.
Thank you for your concern, your time, sharing ML-Agents with us, and letting us be a part of it too.