Running https://github.com/ray-project/ray/blob/master/rllib/examples/twostep_game.py with --run contrib/MADDPG gives this error:
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 398, in __init__
Trainable.__init__(self, config, logger_creator)
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 96, in __init__
self._setup(copy.deepcopy(self.config))
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 523, in _setup
self._init(self.config, self.env_creator)
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 109, in _init
self.config["num_workers"])
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 568, in _make_workers
logdir=self.logdir)
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/evaluation/worker_set.py", line 64, in __init__
RolloutWorker, env_creator, policy, 0, self._local_config)
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/evaluation/worker_set.py", line 220, in _make_worker
_fake_sampler=config.get("_fake_sampler", False))
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 350, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 766, in _build_policy_map
policy_map[name] = cls(obs_space, act_space, merged_conf)
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/contrib/maddpg/maddpg_policy.py", line 158, in __init__
scope="actor"))
File "/afs/ece.cmu.edu/usr/charlieh/.local/lib/python3.6/site-packages/ray/rllib/contrib/maddpg/maddpg_policy.py", line 368, in _build_actor_network
sampler = tfp.distributions.RelaxedOneHotCategorical(
AttributeError: 'NoneType' object has no attribute 'distributions'
Is MADDPG not supposed to work for discrete observation spaces? I've also tried it on my own environment from #6884 (which uses a Tuple observation space) and it complains that the state space isn't valid.
Hey, I'm the maintainer of the MADDPG example code and I've used the implementation a fair bit. My first guess would be to try installing tensorflow-probability==0.7.0 and see if that fixes your error.
If that doesn't work, please run pip freeze and post the results.
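For reference, the AttributeError in the traceback means tfp was None inside maddpg_policy.py, i.e. tensorflow-probability failed to import. A minimal sanity check you can run in the same environment (just a sketch, assuming pip installs into that same environment):

# Sketch: confirm tensorflow-probability is importable where RLlib runs.
import tensorflow_probability as tfp
print(tfp.__version__)                # expect 0.7.0 after the suggested install
print(hasattr(tfp, "distributions"))  # should print True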
@justinkterry Hello! Does MADDPG support MultiDiscrete actions now?
No. The MADDPG implementation here is kinda cursed.
@justinkterry I just modified the code and it should support MultiDiscrete actions now.
Put simply, the code I added in maddpg_policy.py is:
# Sample each MultiDiscrete component with a relaxed one-hot categorical.
sampler = tf.concat([
    tfp.distributions.RelaxedOneHotCategorical(
        temperature=1.0, logits=logits_i).sample()
    for logits_i in tf.split(feature, self.nvec, axis=1)
], axis=1)
With this change, the actor network's output sums to N when the agent's MultiDiscrete action space has N components (each relaxed one-hot sample sums to 1, and there is one per component).
The whole code can be found here
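For anyone landing here later, the idea is simply to split the flat logits into one chunk per MultiDiscrete component, draw a relaxed one-hot (Gumbel-softmax) sample from each chunk, and concatenate the results. A self-contained sketch of the same idea, not the actual maddpg_policy.py code (nvec and the logits below are placeholders):

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

def sample_multidiscrete(logits, nvec, temperature=1.0):
    # Split the flat logits into one chunk per MultiDiscrete component,
    # sample a relaxed one-hot categorical from each chunk, and
    # concatenate the per-component samples back into one flat vector.
    chunks = tf.split(logits, list(nvec), axis=1)
    samples = [
        tfp.distributions.RelaxedOneHotCategorical(
            temperature=temperature, logits=chunk).sample()
        for chunk in chunks
    ]
    return tf.concat(samples, axis=1)

# Example: MultiDiscrete([3, 2]) -> logits of width 3 + 2 = 5.
nvec = [3, 2]
logits = tf.constant(np.random.randn(4, sum(nvec)), dtype=tf.float32)
actions = sample_multidiscrete(logits, nvec)
# actions has shape (4, 5); each per-component block sums to ~1,
# so the whole vector sums to ~len(nvec), as noted above.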
Hi, I'm a bot from the Ray team :)
To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.