Stable-baselines: DDPG and SAC for discrete action space.

Created on 26 Jul 2019 · 4 Comments · Source: hill-a/stable-baselines

[question] Is there any reason why DDPG and SAC don't have an implementation for discrete action spaces? I would also appreciate any suggestions for applying DDPG, which assumes a continuous action space, to a discrete one. Thanks!

duplicate question

soloist96

Most helpful comment

Hello,
For DDPG, you can already find an answer here: https://github.com/hill-a/stable-baselines/issues/37
For SAC, an implementation with discrete actions is not trivial, and the algorithm was developed for use on robots, which means continuous actions. Those are the main reasons. Meanwhile, if you want to work with discrete actions, plenty of other algorithms can do that (ACER, PPO, DQN, A2C, ACKTR, ...).

araffin on 26 Jul 2019

👍2
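
On the second part of the question (using a continuous-action algorithm such as DDPG on a discrete problem), one commonly used workaround is to let the agent output continuous values and convert them to a discrete action before they reach the environment. The sketch below is only an illustration of that idea: it is not taken from the answers in this thread, it is not part of the stable-baselines API, and the function names `continuous_to_discrete` and `scalar_to_bin` are hypothetical.

```python
from typing import Sequence


def continuous_to_discrete(scores: Sequence[float]) -> int:
    """Pick a discrete action from one continuous score per action (argmax).

    Hypothetical helper: assumes the agent emits a vector with one entry
    per discrete action. Ties resolve to the lowest index.
    """
    if len(scores) == 0:
        raise ValueError("scores must contain at least one entry")
    return max(range(len(scores)), key=lambda i: scores[i])


def scalar_to_bin(value: float, n_actions: int,
                  low: float = -1.0, high: float = 1.0) -> int:
    """Alternative: map a single continuous value to one of n_actions bins.

    Hypothetical helper: assumes the agent emits one scalar in [low, high],
    which is split into n_actions equal-width bins.
    """
    if n_actions < 1:
        raise ValueError("n_actions must be at least 1")
    # Clip out-of-range values so they land in the first or last bin.
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    # fraction == 1.0 would give index n_actions, so cap at the last bin.
    return min(int(fraction * n_actions), n_actions - 1)
```

In a Gym-style setup, a conversion like this would typically live in an action wrapper around the discrete environment, so the agent only ever sees a continuous (Box) action space. Whether learning works well this way depends on the task; the algorithms listed in the answer above handle discrete actions natively.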

That was more than a year ago. Is there any recent news on this topic, @araffin?
It would be nice if SAC could accept a discrete action space.

cosmir17 on 5 Dec 2020

We have an issue about that in the Stable-Baselines3 repo: https://github.com/DLR-RM/stable-baselines3/issues/157

But I would favor QR-DQN first in the contrib repo.

araffin on 5 Dec 2020

Thank you for letting me know @araffin

cosmir17 on 5 Dec 2020
