Ray: [rllib] MADDPG no exploration options applied?

Created on 6 Feb 2020  ·  11 comments  ·  Source: ray-project/ray

Is any exploration (parameter noise, OU noise) applied to the chosen action values in the current version of contrib/MADDPG, as it is in DDPG?

Also, is it possible to save the trained model and restore it later, so that the compute_action methods can be used to query the trained MADDPG model?
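For reference, a minimal sketch of the generic Trainer save/restore/compute_action workflow the second question asks about, shown here with plain DDPG on Pendulum-v0 for brevity; the same Trainer API applies to contrib/MADDPG once a valid multi-agent config is supplied. This is an illustration, not code from the thread.

import gym
import ray
from ray.rllib.agents.ddpg import DDPGTrainer

ray.init()

# Train briefly and write a checkpoint to disk.
trainer = DDPGTrainer(env="Pendulum-v0", config={"num_workers": 0})
for _ in range(3):
    trainer.train()
checkpoint_path = trainer.save()  # returns the checkpoint path

# Later (or in a separate process): rebuild the trainer, restore the weights,
# and query the trained policy via compute_action().
restored = DDPGTrainer(env="Pendulum-v0", config={"num_workers": 0})
restored.restore(checkpoint_path)

obs = gym.make("Pendulum-v0").reset()
print(restored.compute_action(obs))  # policy output (plus exploration noise, if enabled)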

Labels: question, rllib, stale

Most helpful comment

Hey guys, my schedule just freed up. I'll fix the action space issue and add exploration myself in the next few days.

On Wed, Feb 12, 2020 at 2:27 PM Justin Terry notifications@github.com wrote:

The specifics of the bug in terms of action spaces are still unclear (dealing with it is on my list), but I don't think it should keep it from learning at all?

Thank you for your time,
Justin Terry

All 11 comments

Hey, I read through the implementation and believe you are right: no noise is added.

I tried to apply MADDPG to a MultiAgentEnv, but no learning occurs. See #6949 .

Shall we implement it?

We could probably copy over the way action noise is added on top of the policy output in the DDPG implementation here.
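For context, this is roughly the pattern meant here: an Ornstein-Uhlenbeck process sampled each step and added to the deterministic policy output. A numpy-only illustration, not the actual RLlib DDPG code.

import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process with zero mean, used as additive action noise."""

    def __init__(self, dim, theta=0.15, sigma=0.2):
        self.theta, self.sigma = theta, sigma
        self.state = np.zeros(dim)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * dW, with mu = 0 and dt = 1
        self.state += -self.theta * self.state + self.sigma * np.random.randn(len(self.state))
        return self.state

noise = OUNoise(dim=1)
deterministic_action = np.array([0.3])  # what the policy network outputs
exploratory_action = np.clip(deterministic_action + noise.sample(), 0.0, 1.0)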

There will soon be a unified way of adding any exploration type to any agent (ready for ActionNoise in a few weeks), as well as of switching it on/off on a per-call basis. We have already done this for EpsilonGreedy (the default for DQN). We are currently working on a (related) deterministic flag for controlling action sampling in compute_actions, and will add ActionNoise (Gaussian and OU) very soon.
In the meantime, yes, you could transfer parts of the DDPG code into MADDPG to make this work.
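In later RLlib releases this surfaced as the exploration_config key plus an explore argument on the compute-action methods. A hedged, config-only sketch follows; exact class names, parameter names, and defaults may differ by version.

# Hedged sketch of the unified exploration API in later RLlib releases.
config = {
    "explore": True,  # master switch for exploration during training
    "exploration_config": {
        "type": "OrnsteinUhlenbeckNoise",  # or "GaussianNoise", "EpsilonGreedy", ...
        "ou_theta": 0.15,
        "ou_sigma": 0.2,
    },
}
# Per-call override for evaluation, i.e. a deterministic (noise-free) action:
#   trainer.compute_action(obs, explore=False)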

I tried to apply MADDPG to a MultiAgentEnv, but no learning occurs. See #6949 .

I talked to @justinkterry, and there is currently still a bug concerning the use of different action spaces. The implementation was written for the OpenAI multi-agent particle environments, so using MADDPG on a generic MultiAgentEnv may cause trouble.

The specifics of the bug in terms of action spaces are still unclear (dealing with it is on my list), but I don't think it should keep it from learning at all?

Hey guys, my schedule just freed up. I'll fix the action space issue and add exploration myself in the next few days.

On Wed, Feb 12, 2020 at 2:27 PM Justin Terry notifications@github.com wrote:

The specifics of the bug in terms of action spaces are still unclear (dealing with it is on my list), but I don't think it should keep it from learning at all?

Thank you for your time,
Justin Terry

Hey @justinkterry, that's awesome. Could you look at this PR, though?
https://github.com/ray-project/ray/pull/7155
It'll probably be merged today and unifies the way we do exploration/stochasticity/deterministic action draws.
I know MADDPG doesn't really fit well into our Policy hierarchy (yet), so if you think it's better to fix this now within the existing MADDPG structure (without the new Exploration API), please go ahead.

I think you're right that it would be better to make MADDPG conform to this hierarchy instead. Plain DDPG uses that hierarchy, right?

Also, does that PR include parameter noise (https://openai.com/blog/better-exploration-with-parameter-noise/)?

Also, this really is on my list to work on, by the way; I've just been a little slammed recently.

@sven1977: I just tried out using MADDPG with the exploration API as of commit 2d97650b1e01c299eda8d973c3b7792b3ac85307 (0.9.0dev), and it works fine for me. Thanks for your efforts in creating a unified exploration API!

@justinkterry: Unfortunately, the "action space issue" you mentioned is still unsolved.
On the problem I am working on, I am not able to make MADDPG learn anything, whereas DDPG and PPO both do. The action output of MADDPG is always 1 (the action space is a 1D Box [0, 1]) for all agents.
Since I tried out different models on the environment, I am pretty confident the problem resides in the MADDPG implementation. Do you have an idea how we can inspect this better?
Also, are there any flags (besides "log_level") that let me inspect the action outputs or weight updates? What's the best approach to inspect what happens inside the networks during training?
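One hedged option, an assumption on my part rather than anything confirmed in this thread: RLlib's experience-output config can dump the collected sample batches, including the emitted actions, to JSON files for offline inspection.

# Write every collected SampleBatch (obs, actions, rewards, ...) to JSON files
# so the actual action outputs can be inspected offline.
config = {
    "output": "/tmp/maddpg-debug",   # target directory for the JSON batches
    "output_compress_columns": [],   # keep all columns human-readable (default compresses obs)
}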

Any progress on this, @justinkterry? I would like to compare MADDPG to a parameter-sharing approach.

@dissendahl and I have been working on something. It will hopefully be out in the near future.

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.
