Stable-baselines: Tuple action space with stable baselines PPO2 [question]

Created on 1 Dec 2018 · 3 Comments · Source: hill-a/stable-baselines

Hi,

I am trying to train a controller using the PPO2 algorithm. The action space for my problem consists of two continuous actions and one discrete action. I tried using a Tuple action space (similar to the examples on the gym website), but PPO2 throws a not-implemented error (so does TRPO). As a workaround, I defined the action space as a Box with 3 actions; before stepping the environment, I check the value meant for the discrete action and set it to 0 if it is below a threshold, otherwise to 1. However, this simplification makes it hard for the controller to learn the task. Is there a way to use Tuple action spaces, or do you have ideas from similar problems?
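For readers following along, the workaround described above could look roughly like the sketch below. It is not the poster's actual code: the function name `split_box_action`, the choice of the third entry as the discrete one, and the default threshold of 0.0 are all assumptions made for illustration.

```python
def split_box_action(action, threshold=0.0):
    """Split a 3-element Box action into (continuous pair, binary choice).

    Hypothetical helper: the layout (two continuous entries followed by
    one entry that is thresholded) and the threshold are assumptions.
    """
    # The first two entries are passed through as continuous actions.
    continuous = (float(action[0]), float(action[1]))
    # The third entry is mapped to 0 if below the threshold, else 1.
    discrete = 0 if action[2] < threshold else 1
    return continuous, discrete


# Usage: call this on the policy output before stepping the environment.
continuous, discrete = split_box_action([0.5, -0.2, -0.1])
```

Note that the policy still treats the third entry as a continuous value, so it never sees that only the side of the threshold matters, which may be part of why learning is harder with this setup.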

enhancement help wanted

sahilgupta2105

All 3 comments

Hello,

Tuple action spaces are currently not supported, but I recommend reading @hill-a's comment on this issue: https://github.com/hill-a/stable-baselines/issues/100#issuecomment-442004538

Support is not currently planned, but we are open to PRs ;)

araffin on 1 Dec 2018

Hi,

Thanks for the prompt reply. I saw the comment. So, I am guessing just implementing a probability distribution for a tuple space will suffice. I will update you if I am able to successfully implement it.
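To make the idea of a probability distribution over a tuple space concrete, here is a minimal sketch in plain Python. It assumes the continuous part is a diagonal Gaussian, the discrete part is a single binary choice parameterised by a logit, and the two parts are independent given the policy output, so their log-probabilities add. None of these function names come from stable-baselines; they are hypothetical and only illustrate the math.

```python
import math


def gaussian_log_prob(x, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over dimensions."""
    total = 0.0
    for xi, mu, ls in zip(x, mean, log_std):
        var = math.exp(2.0 * ls)
        total += -0.5 * ((xi - mu) ** 2 / var + 2.0 * ls + math.log(2.0 * math.pi))
    return total


def bernoulli_log_prob(a, logit):
    """Log-probability of a binary action a in {0, 1} given a logit."""
    # log sigmoid(logit) when a == 1, log sigmoid(-logit) when a == 0.
    # Not numerically hardened: very large |logit| can overflow math.exp.
    z = logit if a == 1 else -logit
    return -math.log1p(math.exp(-z))


def tuple_log_prob(cont_action, disc_action, mean, log_std, logit):
    """Joint log-probability, assuming the two parts are independent."""
    return (gaussian_log_prob(cont_action, mean, log_std)
            + bernoulli_log_prob(disc_action, logit))


# Usage: two continuous actions plus one binary action.
lp = tuple_log_prob([0.0, 0.0], 1, mean=[0.0, 0.0], log_std=[0.0, 0.0], logit=0.0)
```

A full implementation inside the library would presumably also need sampling and entropy for the combined distribution; under the same independence assumption, the entropy is likewise the sum of the per-part entropies.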

sahilgupta2105 on 1 Dec 2018

Closing in favor of #133 to avoid duplicated issues ;)

araffin on 12 Feb 2019
