Stable-baselines: Exploration in DQN: What is the exploration_fraction?

Created on 9 Dec 2019 · 2 Comments · Source: hill-a/stable-baselines

From what I understand from the original DQN paper, an epsilon-greedy strategy is used for exploration in DQN. So, the random action probability is annealed from a high value (1.0) to a lower value (0.02, or whatever) over the entire training period.

Where does the exploration_fraction tie into this?

question

All 2 comments

This is explained in the docs. Essentially, epsilon starts at exploration_initial_eps and decreases linearly to exploration_final_eps by the time exploration_fraction of training is done, then stays at that value for the rest of training (e.g. if exploration_fraction = 0.5, epsilon reaches exploration_final_eps halfway through training).
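
To make the schedule concrete, here is a minimal sketch of that linear annealing in plain Python. It is an illustration of the behaviour described above, not the library's own code: `epsilon_at` is a hypothetical helper, and the numbers (10,000 total steps, epsilon going from 1.0 to 0.02) are only example values.

```python
def epsilon_at(step, total_timesteps, exploration_fraction,
               initial_eps=1.0, final_eps=0.02):
    """Hypothetical helper: epsilon after `step` of `total_timesteps` steps.

    Epsilon is interpolated linearly from initial_eps to final_eps over the
    first exploration_fraction * total_timesteps steps, then held constant.
    """
    # Number of steps over which epsilon is annealed.
    anneal_steps = int(exploration_fraction * total_timesteps)
    if anneal_steps <= 0:
        # Nothing to anneal over: epsilon is at its final value from the start.
        return final_eps
    # Progress through the annealing phase, capped at 1.0 once it is over.
    progress = min(step / anneal_steps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


if __name__ == "__main__":
    # Example values only: 10,000 training steps, exploration_fraction = 0.5.
    for step in (0, 2500, 5000, 7500, 10000):
        eps = epsilon_at(step, total_timesteps=10000, exploration_fraction=0.5)
        print(step, round(eps, 2))
```

With exploration_fraction = 0.5, epsilon is still decreasing at step 2,500, reaches its final value at step 5,000, and does not change between step 5,000 and step 10,000.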

I understand, thanks for the clarification!
