Stable-baselines: [Question] How best to implement self-play/multiple agents in the same environment?

Created on 31 Jan 2019 · 4 Comments · Source: hill-a/stable-baselines

I'm trying to train a model using self play, and really love the work that has been done here so far. I was wondering whether anyone might have some advice about how I might adapt PPO2 to allow for multiple models to play against each other in the same environment.

The overall strategy would be to:

  • Store N models in a list
  • Generate an action from each of these models using a single observation
  • Generate a list of rewards for each of these actions from an environment
  • Update the models based on these rewards
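The four steps above can be sketched as a plain loop. This is only a minimal illustration, not stable-baselines code: `ToyMultiAgentEnv`, `ToyModel`, and `run_episode` are hypothetical names, the models act randomly, and `update()` is a placeholder where a real learner (for example a PPO policy) would take a gradient step.

```python
import random


class ToyMultiAgentEnv:
    """Hypothetical stand-in for a custom multi-agent environment."""

    def __init__(self, n_agents, episode_len=5):
        self.n_agents = n_agents
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # one observation shared by all agents

    def step(self, actions):
        # One action per agent in, one reward per agent out.
        assert len(actions) == self.n_agents
        self.t += 1
        best = max(actions)
        rewards = [1.0 if a == best else 0.0 for a in actions]
        done = self.t >= self.episode_len
        return self.t, rewards, done


class ToyModel:
    """Hypothetical placeholder for a learner; acts randomly."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.buffer = []  # (obs, action, reward) tuples

    def act(self, obs):
        return self.rng.randint(0, 2)

    def store(self, obs, action, reward):
        self.buffer.append((obs, action, reward))

    def update(self):
        # Placeholder: a real model would take a gradient step here.
        n = len(self.buffer)
        self.buffer.clear()
        return n


def run_episode(env, models):
    obs = env.reset()
    done = False
    while not done:
        # Step 2: one action per model from the same observation.
        actions = [m.act(obs) for m in models]
        # Step 3: the environment returns one reward per agent.
        next_obs, rewards, done = env.step(actions)
        for m, a, r in zip(models, actions, rewards):
            m.store(obs, a, r)
        obs = next_obs
    # Step 4: each model updates from its own experience.
    return [m.update() for m in models]


models = [ToyModel(seed=i) for i in range(3)]  # Step 1: N models in a list
env = ToyMultiAgentEnv(n_agents=len(models))
consumed = run_episode(env, models)
```

Here `consumed` holds the number of transitions each model used in its update. Keeping a separate buffer per model is one design choice; it keeps each learner's data independent of the others.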

I have written a custom environment that can take an array of actions, update the game state, and then return a list of rewards, one per agent. My main issue is prying the actual model apart from its interactions with the gym environment. I have been trying to decouple the model from the runner, but the two seem quite tightly intertwined and I'm having a difficult time. Has anyone else played around with this idea before, or could anyone point me in the right direction?
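One possible way around the coupling, offered here only as a sketch and not as something the library provides, is to leave the runner alone and hide the other agents inside the environment. A wrapper queries an opponent policy at each step and exposes an ordinary single-agent `reset`/`step` interface to the learner. `TwoPlayerEnv` and `OpponentWrapper` are hypothetical names, and the 4-tuple returned by `step` is assumed to follow the classic gym convention.

```python
class TwoPlayerEnv:
    """Hypothetical two-agent environment: a list of actions in, a list of rewards out."""

    def __init__(self, episode_len=4):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, actions):
        self.t += 1
        a, b = actions
        # Toy rule: the agent with the larger action gets reward 1.
        rewards = [float(a > b), float(b > a)]
        return self.t, rewards, self.t >= self.episode_len


class OpponentWrapper:
    """Presents the learner's side of the game as a single-agent environment."""

    def __init__(self, env, opponent_policy):
        self.env = env
        self.opponent_policy = opponent_policy  # callable: obs -> action
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        # The opponent acts on the same observation the learner saw.
        opp_action = self.opponent_policy(self._last_obs)
        obs, rewards, done = self.env.step([action, opp_action])
        self._last_obs = obs
        # Only the learner's reward is exposed to the training loop.
        return obs, rewards[0], done, {}


# Usage: a fixed opponent that always plays 0 against a learner that plays 1.
wrapped = OpponentWrapper(TwoPlayerEnv(), opponent_policy=lambda obs: 0)
obs = wrapped.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, info = wrapped.step(1)
    total += reward
```

For self-play, `opponent_policy` would presumably be swapped for a frozen copy of the learner every so often; that part is not shown.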

question

All 4 comments

Hello,

I think @AdamGleave tackled that problem in the Adversarial policies repo, you should take a look ;)

I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py

@AdamGleave I can't access the page. Is there still an available/public version of it?
