I have a simple question, the answer to which I could not find in the documentation. I am not familiar with how iterations are used in reinforcement learning; in my previous experience, the train step ends when the episode is done. In the RLlib case, it looks like the train step continues until the iteration is done? I am not clear on how iterations are used.
Also, when is the network updated: after an episode or after an iteration?
Iterations are just for metrics reporting. They have no connection with episodes; many episodes execute concurrently and may span one or more iterations. Similarly, there is no connection with SGD updates, which are executed over batches of `train_batch_size` depending on the algorithm.
You can check the `num_steps_sampled`, `episodes_total`, and `num_steps_trained` metrics to see what is happening.
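For example, a minimal sketch of reading those metrics from the result of each `train()` call (assuming the older `ray.rllib.agents`-style API; the exact location of the keys in the result dict can vary by Ray version):

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# One PPOTrainer; train_batch_size controls how many sampled steps
# go into each SGD update, independent of episode boundaries.
trainer = PPOTrainer(env="CartPole-v0", config={"train_batch_size": 4000})

for i in range(3):
    # Each train() call is one reporting "iteration".
    result = trainer.train()
    print(
        "iteration", i,
        "episodes_total:", result["episodes_total"],
        "num_steps_sampled:", result["info"]["num_steps_sampled"],
        "num_steps_trained:", result["info"]["num_steps_trained"],
    )
```

Running this, you should see `episodes_total` grow by a non-integer-looking, environment-dependent amount per iteration, while the step counters advance in multiples tied to the batch sizes, which illustrates that iterations, episodes, and SGD updates are decoupled.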