Stable-baselines: Exploration in DQN: What is the exploration_fraction?

Created on 9 Dec 2019 · 2 comments · Source: hill-a/stable-baselines

As I understand the original DQN paper, DQN uses an epsilon-greedy strategy for exploration: the probability of taking a random action is annealed from a high value (1.0) to a lower value (0.02, or whatever) over the entire training period.

Where does the exploration_fraction tie into this?

question

prabhatverma286

All 2 comments

This is explained in the docs. Essentially, epsilon starts at exploration_initial_eps and decreases linearly to exploration_final_eps by the time a fraction exploration_fraction of the total training timesteps has elapsed; after that it stays at exploration_final_eps for the rest of training (e.g. with exploration_fraction = 0.5, epsilon reaches exploration_final_eps halfway through training).

Miffyli on 9 Dec 2019
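
A minimal sketch of this schedule in plain Python, assuming epsilon is interpolated linearly over the first exploration_fraction × total_timesteps steps and then held at the final value. `linear_epsilon` is a hypothetical helper written for illustration, not the library's API, and its default values are illustrative only.

```python
def linear_epsilon(step, total_timesteps, exploration_fraction=0.1,
                   initial_eps=1.0, final_eps=0.02):
    """Return epsilon at `step` under a linear-decay schedule (hypothetical helper)."""
    # Epsilon is annealed over this many timesteps, i.e. the first
    # exploration_fraction of training (at least 1 to avoid division by zero).
    schedule_timesteps = max(1, int(exploration_fraction * total_timesteps))
    # Progress through the annealing period, clipped at 1.0 so that epsilon
    # is held at final_eps once the period is over.
    progress = min(step / schedule_timesteps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


if __name__ == "__main__":
    total = 100_000
    for step in (0, 25_000, 50_000, 75_000, 100_000):
        # With exploration_fraction=0.5, epsilon hits final_eps at step 50_000.
        print(step, round(linear_epsilon(step, total, exploration_fraction=0.5), 4))
```

With 100,000 total timesteps and exploration_fraction = 0.5, this prints epsilon values of 1.0, 0.51, 0.02, 0.02 and 0.02: epsilon is halfway between the initial and final values at step 25,000 and flat from step 50,000 onward.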

I understand, thanks for the clarification!

prabhatverma286 on 9 Dec 2019
