I have a simple question, the answer to which I could not find in the documentation. I am not familiar with how iterations are used in reinforcement learning; in my previous experience, the train step ends when the episode is done. In the RLlib case, it looks like the train step continues until the iteration is done? I am not clear on how iterations are used.
Also, when is the network updated: after an episode or after an iteration?
Iterations are just for metrics reporting. They have no connection with episodes; many episodes execute concurrently and may span one or more iterations. Similarly, there is no connection with SGD updates, which are executed over batches of `train_batch_size` depending on the algorithm.
You can check the `num_steps_sampled`, `episodes_total`, and `num_steps_trained` metrics to see what is happening.
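For example, a minimal sketch of reading those metrics from the result of each `train()` call (assuming the older `ray.rllib.agents`-style API; the exact location of the keys in the result dict can vary by Ray version):

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# One PPOTrainer; train_batch_size controls how many sampled steps
# go into each SGD update, independent of episode boundaries.
trainer = PPOTrainer(env="CartPole-v0", config={"train_batch_size": 4000})

for i in range(3):
    # Each train() call is one reporting "iteration".
    result = trainer.train()
    print(
        "iteration", i,
        "episodes_total:", result["episodes_total"],
        "num_steps_sampled:", result["info"]["num_steps_sampled"],
        "num_steps_trained:", result["info"]["num_steps_trained"],
    )
```

Running this, you should see `episodes_total` grow by a non-integer-looking, environment-dependent amount per iteration, while the step counters advance in multiples tied to the batch sizes, which illustrates that iterations, episodes, and SGD updates are decoupled.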