Hi,
I am trying to train a controller using the PPO2 algorithm. The action space for my problem consists of two continuous actions and one discrete action. I tried using a Tuple action space (similar to the examples on the gym website), but PPO2 (I also tried TRPO) raises a NotImplementedError. As a workaround, I defined the action space as a Box with 3 actions, and before stepping the environment I threshold the value for the discrete action: if it is below a threshold, I set it to 0, otherwise to 1. However, this simplification makes it hard for the controller to learn the task. Is there a way to use Tuple action spaces, or do you have ideas from similar problems?
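For reference, a minimal sketch of the thresholding workaround described above. It assumes the third Box component stands in for the discrete action and that the threshold is 0; `split_action` is a hypothetical helper name, applied to the policy output just before calling `env.step`.

```python
from typing import Sequence, Tuple


def split_action(action: Sequence[float], threshold: float = 0.0) -> Tuple[float, float, int]:
    """Map a 3-dim Box action to (continuous_1, continuous_2, discrete).

    Hypothetical helper: assumes the third component encodes a binary choice.
    """
    if len(action) != 3:
        raise ValueError(f"expected 3 action values, got {len(action)}")
    cont_1, cont_2, raw_switch = action
    # Values strictly below the threshold map to 0, everything else to 1.
    discrete = 0 if raw_switch < threshold else 1
    return float(cont_1), float(cont_2), discrete


print(split_action([0.3, -1.2, -0.5]))  # (0.3, -1.2, 0)
```

Note that with this workaround PPO still optimizes a Gaussian over the raw third value, so many different raw values map to the same discrete effect in the environment.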
Hello,
Tuple action spaces are currently not supported, but I recommend reading @hill-a's comment on that issue: https://github.com/hill-a/stable-baselines/issues/100#issuecomment-442004538
Support is not currently planned, but we are open to PRs ;)
Hi,
Thanks for the prompt reply. I saw the comment. So I guess that implementing a probability distribution for a tuple space will suffice. I will update you if I manage to implement it successfully.
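As a framework-free sketch of what such a distribution could look like for this problem (two continuous actions plus one discrete action), assuming the continuous and discrete parts are sampled independently: a diagonal Gaussian for the continuous part and a categorical for the discrete part. Under independence, the joint log-probability and entropy are just the sums of the parts, which is what PPO's probability ratio and entropy bonus need. `HybridDistribution` and its methods are hypothetical names, not the stable-baselines API; an actual PR would have to express the same math as TensorFlow ops inside the library's distribution classes.

```python
import math
import random
from typing import List, Sequence, Tuple


class HybridDistribution:
    """Hypothetical distribution over (continuous vector, discrete index).

    Continuous part: diagonal Gaussian. Discrete part: categorical from logits.
    Both parts are assumed independent, so log-probs and entropies add up.
    """

    def __init__(self, mean: Sequence[float], log_std: Sequence[float], logits: Sequence[float]):
        if len(mean) != len(log_std):
            raise ValueError("mean and log_std must have the same length")
        self.mean = list(mean)
        self.log_std = list(log_std)
        # Softmax with max-subtraction for numerical stability.
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        self.probs = [e / total for e in exps]

    def sample(self, rng: random.Random) -> Tuple[List[float], int]:
        # Draw each continuous component from its own Gaussian.
        cont = [rng.gauss(mu, math.exp(ls)) for mu, ls in zip(self.mean, self.log_std)]
        # Draw the discrete index according to the categorical probabilities.
        disc = rng.choices(range(len(self.probs)), weights=self.probs)[0]
        return cont, disc

    def log_prob(self, cont: Sequence[float], disc: int) -> float:
        # Sum of per-dimension Gaussian log-densities ...
        lp = 0.0
        for x, mu, ls in zip(cont, self.mean, self.log_std):
            std = math.exp(ls)
            lp += -0.5 * ((x - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        # ... plus the categorical log-probability of the chosen index.
        return lp + math.log(self.probs[disc])

    def entropy(self) -> float:
        # Gaussian entropy per dimension: log(std) + 0.5 * log(2 * pi * e).
        gauss_ent = sum(ls + 0.5 * math.log(2 * math.pi * math.e) for ls in self.log_std)
        cat_ent = -sum(p * math.log(p) for p in self.probs if p > 0)
        return gauss_ent + cat_ent


dist = HybridDistribution(mean=[0.0, 0.0], log_std=[0.0, 0.0], logits=[0.0, 0.0])
cont_action, disc_action = dist.sample(random.Random(0))
print(cont_action, disc_action, dist.log_prob(cont_action, disc_action), dist.entropy())
```

The independence assumption keeps the sketch simple; if the continuous actions should depend on the discrete choice, the distribution would have to be conditioned on it instead.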
Closing in favor of #133 to avoid duplicated issues ;)