Gym: How to set up trained RL agent to play against human player?

Created on 2 Oct 2019 · 13 Comments · Source: openai/gym

Hi, taking Pong-v0 as an example, there are plenty of examples of training an RL agent to play the game against the built-in game bot. I also found that play.py is a script that lets a human player play against the game bot. However, I am wondering: is it possible to replace the game bot with a trained RL agent that plays against a human player? Once I have obtained the optimal hyperparameters and weights for my neural network, what should I do next?
In short, I have a trained RL bot, and I would like to play against it.

nfkok

Most helpful comment

The Atari gym environments may not have a multiplayer setup. The gym retro environments should have it, though: https://github.com/openai/retro
When you instantiate the RetroEnv instance you can specify the number of players: https://retro.readthedocs.io/en/latest/python.html#retro.RetroEnv

import retro
env = retro.make('Pong-Atari2600', state='Start.2P', players=2)
obs = env.reset()

The action space can be a little confusing; you'll have to figure out which keys map to which players, but it should be doable.
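One way to work out the mapping is to probe the action vector one bit at a time and watch which paddle reacts. The sketch below only generates the probe actions; it assumes the two-player action is a flat binary vector, and the helper name `single_bit_probes` is made up for illustration.

```python
def single_bit_probes(n_bits):
    """Yield (bit_index, action) pairs in which exactly one bit is set."""
    for i in range(n_bits):
        action = [0] * n_bits
        action[i] = 1
        yield i, action


# Small demonstration with a 4-bit vector.
for index, action in single_bit_probes(4):
    print(index, action)

# Intended use (not run here): take n_bits from the environment's action
# space, hold each probe action for a few dozen frames while rendering,
# and note which paddle moves and in which direction.
```

Holding each probe for several frames matters because a single frame of movement is easy to miss on screen.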

In general though, this is a sort of per-environment capability. Most environments are not 2 player, and 2 player environments may or may not support a single player mode or playing against humans.

christopherhesse on 12 Oct 2019

👍3

All 13 comments

https://github.com/koulanurag/ma-gym
This is a Multi-Agent API for Gym. Have a look at the wiki,
https://github.com/koulanurag/ma-gym/wiki/Usage#customizing-an-environment. Give it a try.

Zilch123 on 3 Oct 2019

Hi, thank you, this seems really useful for me, but after reading through the scripts and documentation I have some questions.

Previously I referred to Karpathy's code on GitHub, where he preprocesses the 210x160x3 pixel frames into an 80x80 array, flattened to 1D, for the neural network input. For the multi-agent Pong environment by koulanurag, how can I preprocess the frames into the same 80x80 = 6400 input nodes for the input layer? (Since I have the weights ready, I hope I can play the game directly using the trained RL agent.)
Correct me if I am wrong: the multi-agent environment does not have a single-player mode, so I have to train 2 RL agents to play against each other?
Thank you again.
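For reference, the preprocessing mentioned above can be sketched as follows. It follows the steps in Karpathy's script (crop, downsample by 2, keep one colour channel, erase the background, binarise), but is written with plain loops so it runs without NumPy; his original uses NumPy slicing. The crop rows and the background values 144 and 109 are specific to the Atari Pong frames from gym, so whether they carry over to another Pong implementation is an assumption that would need checking.

```python
def preprocess(frame):
    """Turn a 210x160x3 Pong frame into a flat list of 6400 floats (80x80).

    `frame` can be a nested list or any array indexable as frame[row][col][channel].
    """
    background = (144, 109)              # background values in channel 0
    flat = []
    for row in range(35, 195, 2):        # crop rows 35..194, every 2nd row -> 80 rows
        for col in range(0, 160, 2):     # every 2nd column -> 80 columns
            value = frame[row][col][0]   # keep only the first colour channel
            is_empty = value == 0 or value in background
            flat.append(0.0 if is_empty else 1.0)
    return flat


# Tiny check on a synthetic frame that is all background except one pixel.
frame = [[[144, 72, 17] for _ in range(160)] for _ in range(210)]
frame[35][0] = [236, 236, 236]
x = preprocess(frame)
print(len(x), sum(x))
```

If the other environment does not return 210x160x3 frames at all, the trained weights cannot be reused without first producing an equivalent 80x80 image.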

nfkok on 11 Oct 2019

Thanks. I have figured out which bit of the MultiBinary action space maps to which player in gym retro, but now my problem is how to get the keyboard input. In the Atari gym environment there is a function get_keys_to_action (as far as I understand from the play.py script), but there is no such function or API in gym retro. Do you have any suggestions on how I can get the player's input from the keyboard?
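A possible approach, sketched here under assumptions, is to poll the keyboard with pygame and translate the held keys into one player's button bits. The button count and the UP/DOWN indices below are placeholders, not the confirmed layout for Pong-Atari2600, and both function names are invented for this example.

```python
def buttons_from_keys(pressed, up_key, down_key,
                      n_buttons=8, up_idx=4, down_idx=5):
    """Build one player's button vector from a pressed-key lookup.

    `pressed` is anything indexable by key code that is truthy while the key
    is held. n_buttons, up_idx and down_idx are placeholder values.
    """
    buttons = [0] * n_buttons
    if pressed[up_key] and not pressed[down_key]:
        buttons[up_idx] = 1
    elif pressed[down_key] and not pressed[up_key]:
        buttons[down_idx] = 1
    return buttons


def poll_human_buttons():
    """Read the arrow keys with pygame (needs pygame.init() and an open window)."""
    import pygame                        # imported lazily; only needed for live input
    pygame.event.pump()                  # let pygame refresh its internal key state
    pressed = pygame.key.get_pressed()   # indexable by pygame.K_* constants
    return buttons_from_keys(pressed, pygame.K_UP, pygame.K_DOWN)


# The pure helper can be tried without pygame by faking the key state.
print(buttons_from_keys({"up": True, "down": False}, "up", "down"))
```

Keeping the translation separate from the pygame call makes it easy to check the bit layout without opening a window.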

nfkok on 15 Oct 2019

Hey, @nfkok, I am working on something very similar to you on the PongNoFrameskip-v4 environment (trying to play as the brown paddle against my DQN-trained bot on the green paddle) and stumbled upon this issue. Did you have any luck with this approach? Any advice would be much appreciated! :)

tioans on 26 Nov 2019

@epiicme I used pygame to map the action space to the keyboard in retro Pong-Atari2600. I migrated everything from gym to retro.
But now I have another problem: the agent that I trained is not improving even after 3000 episodes in retro. So even though I have successfully integrated the pygame control into the game, my agent is not trained. In gym Pong-v0, the agent played very well during inference (no more backprop) after a day of training, but the same approach does not work in retro Pong-Atari2600.
Did you write your own script for DQN, or are you using TensorFlow? Can you try to train your agent in retro's Pong-Atari2600 and tell me whether the agent is learning or not?
I hope we can discuss this more, thanks.

nfkok on 26 Nov 2019

@nfkok, thanks for the info. The DQN script I wrote is based on a book called "Deep Reinforcement Learning Hands-On" by Maxim Lapan, and it's completely in PyTorch.
That's a good idea, assuming the conversion from gym to retro isn't too much work. I'll try to see if it can train just as well on Pong-Atari2600. Could you let me know how difficult it was to change libraries?

tioans on 26 Nov 2019

@epiicme migrating to retro is almost the same; the only differences are the action space, which is indexed rather than Discrete as in gym, and that retro only supports Python 3. You can prepare 2 environments: a single-player one for training, and a separate 2-player one with built-in pygame keyboard input for inference. Quite straightforward.
During training in retro Pong I implemented the same thing as Andrej Karpathy did (same pre-processing, policy forward, RMSProp, etc.), but it just does not work. Not sure what's gone wrong.
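To illustrate the action-space difference, here is a rough sketch of translating the discrete gym actions used in Karpathy's script (2 for up, 3 for down) into a button vector, and of joining the agent's buttons with the human's for the 2-player environment. The button layout, and the assumption that player 1's bits come before player 2's, are guesses that should be verified; the function names are hypothetical.

```python
# Karpathy's script samples gym action 2 (up) or 3 (down); 0 is a no-op.
GYM_ACTION_TO_MOVE = {0: "stay", 2: "up", 3: "down"}

# Placeholder layout: 8 buttons per player, UP at index 4, DOWN at index 5.
N_BUTTONS, UP_IDX, DOWN_IDX = 8, 4, 5


def gym_action_to_buttons(gym_action):
    """Translate a discrete gym Pong action into one player's button vector."""
    buttons = [0] * N_BUTTONS
    move = GYM_ACTION_TO_MOVE[gym_action]
    if move == "up":
        buttons[UP_IDX] = 1
    elif move == "down":
        buttons[DOWN_IDX] = 1
    return buttons


def two_player_action(agent_gym_action, human_buttons):
    """Join agent (assumed player 1) and human (assumed player 2) buttons."""
    return gym_action_to_buttons(agent_gym_action) + list(human_buttons)


print(gym_action_to_buttons(2))
print(two_player_action(3, [0] * N_BUTTONS))
```

In the single-player training environment only `gym_action_to_buttons` would be needed; the joined vector is for the 2-player inference environment.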

nfkok on 27 Nov 2019

@nfkok, ok thanks for the update. I'll look into setting up a training environment, and if that improves things I'll let you know.
Strange that your network doesn't work though. It sounds like it should be working well.

tioans on 28 Nov 2019

Hello, @nfkok, I'm working on something very similar and am currently trying to map keyboard inputs to the action space to control one of the paddles in the game. I was wondering how you went about it, and also whether you were successful in playing against the trained AI?

AndreiCBogdan on 4 Feb 2020

Hi. In the end I did not use the gym or gym-retro environments. Instead, I wrote my own Pong game using Pygame and trained the agent on it using Andrej Karpathy's policy gradient framework (replacing the gym environment with my own Pong game, with the same number of pixels, 80x80). Each episode consists of 11 games, and the computer player follows the y-coordinate of the ball with a 75% chance. The training is not ideal, but the RL agent still manages to reach a running mean of -5.5 (winning 5.5 games per episode) after a week of training at a learning rate of 1.5e-3.
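A scripted opponent of the kind described could look roughly like the sketch below. It is not the actual code from that game: the function name, the return convention, and what the paddle does in the remaining 25% of cases (here it stays put) are all assumptions.

```python
import random


def opponent_move(paddle_y, ball_y, rng=random, follow_prob=0.75):
    """Return -1 (move up), 0 (stay) or +1 (move down) for the scripted opponent.

    With probability follow_prob the paddle moves toward the ball's
    y-coordinate; otherwise it stays where it is (an assumption).
    Screen y grows downward, as in Pygame.
    """
    if rng.random() >= follow_prob:
        return 0                 # this frame the opponent does not track the ball
    if ball_y < paddle_y:
        return -1                # ball is above the paddle
    if ball_y > paddle_y:
        return 1                 # ball is below the paddle
    return 0                     # already aligned with the ball


# Rough check of the tracking rate with a seeded generator.
rng = random.Random(0)
moves = [opponent_move(50, 10, rng) for _ in range(10000)]
print(moves.count(-1) / len(moves))
```

Lowering `follow_prob` makes the opponent easier to beat, which is one knob for adjusting how hard the training task is.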

I tried replacing the computer player with the Pygame keyboard input, and it is playable. I win most of the time, but the agent is not too bad either.

For now I am still training with different parameters and game conditions to try to improve it.

nfkok on 12 Feb 2020

Although gym retro supports multiple agents, something seems not quite right. My agent still lost every game and tended to stay still at the bottom after over 20,000 episodes of training (across several attempts, which cost me a few weeks :P). I checked pixel by pixel and checked the reward mechanism but still could not find the problem. So in the end I decided to write my own Pong game.

nfkok on 12 Feb 2020

Thank you for the info. It seems strange that even after 20,000 episodes the agent isn't playing optimally.
Would it be possible to get some details on the keyboard input? I'm a bit confused about how to implement it. I've tried using pygame, but when I change the human player's part of the action (the action space is shared by both players) by reading the keyboard input and setting the specific index to 1, my paddle goes down (or up) really fast. There's no incremental change in position; the paddle just shoots off. Maybe I'm updating the action at the wrong time, I'm unsure.
Thank you
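Two possible causes of a paddle that shoots off, offered as guesses rather than a diagnosis, are an action array that is reused between frames (so a bit stays set after the key is released) and a loop that runs far faster than real time. The sketch below rebuilds the human's buttons from scratch every frame and calls a `tick` hook once per frame; `step`, `read_keys` and `tick` are hypothetical callables standing in for the environment step, the keyboard poll, and a frame limiter, and the button indices are placeholders.

```python
def play_frames(step, read_keys, tick, n_frames,
                n_buttons=8, up_idx=4, down_idx=5):
    """Run n_frames of play, building a fresh human action on every frame."""
    sent = []
    for _ in range(n_frames):
        up_held, down_held = read_keys()   # current key state, not past events
        action = [0] * n_buttons           # new list each frame: no stale bits
        if up_held and not down_held:
            action[up_idx] = 1
        elif down_held and not up_held:
            action[down_idx] = 1
        step(action)                       # the agent's bits would be joined in here
        sent.append(action)
        tick()                             # e.g. cap the loop at a fixed frame rate
    return sent


# Dry run with fake inputs: key held for two frames, then released.
keys = iter([(True, False), (True, False), (False, False)])
history = play_frames(lambda a: None, lambda: next(keys), lambda: None, 3)
print(history)
```

With pygame, `tick` could be something like `lambda: clock.tick(60)` for a `pygame.time.Clock()` instance, which keeps the emulator from advancing many frames during a single short key press.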

AndreiCBogdan on 10 Mar 2020
