From what I understand of the original DQN paper, DQN uses an epsilon-greedy strategy for exploration: the probability of taking a random action is annealed from a high value (1.0) to a lower one (e.g. 0.02) over the entire training period.
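For reference, here is a minimal sketch of epsilon-greedy action selection. It assumes the agent already has a list of Q-values, one per discrete action. The function name `epsilon_greedy_action` is hypothetical and is not taken from any library.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Hypothetical helper: with probability epsilon pick a uniformly random
    action index, otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        # Explore: any action index is equally likely
        return rng.randrange(len(q_values))
    # Exploit: argmax over Q-values (the first maximum wins on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0.0` this always returns the greedy action. With `epsilon = 1.0` it always acts randomly, which is why epsilon is annealed downward over training.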
Where does `exploration_fraction` fit into this?
This is explained in the docs. Essentially, epsilon starts at `exploration_initial_eps` and drops linearly to `exploration_final_eps` by the time `exploration_fraction` of training is done. After that it stays at `exploration_final_eps`. For example, if `exploration_fraction = 0.5`, epsilon reaches `exploration_final_eps` halfway through training.
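The schedule described above can be sketched as a plain function. This is a minimal illustration, not the library's own code. The function name `linear_epsilon` and its argument names are hypothetical, and the default values are only illustrative. `initial_eps` and `final_eps` play the roles of the initial and final epsilon parameters, and `exploration_fraction` is the fraction of total timesteps over which epsilon decays.

```python
def linear_epsilon(step, total_timesteps, initial_eps=1.0, final_eps=0.05,
                   exploration_fraction=0.1):
    """Hypothetical sketch: linearly anneal epsilon from initial_eps to final_eps
    over the first exploration_fraction of training, then hold it constant."""
    progress = step / total_timesteps  # fraction of training completed, 0.0 .. 1.0
    if progress >= exploration_fraction:
        # Exploration phase is over: epsilon stays at its final value
        return final_eps
    # Linear interpolation within the exploration phase
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction
```

With `total_timesteps = 1000` and `exploration_fraction = 0.5`, epsilon is 1.0 at step 0, about 0.525 at step 250, and 0.05 from step 500 onward. This matches the "reaches the final value halfway through training" example.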
I understand, thanks for the clarification!