Hi,
I'd like to understand what the timesteps_per_iteration parameter means in the context of DQN-based agents. A quick glance at the code (https://github.com/ray-project/ray/blob/cff08e19ff1606ef6e718624703e8e0da19b223d/python/ray/rllib/agents/dqn/dqn.py#L257-L261)
suggests that the agent optimizes for timesteps_per_iteration steps after each env step? That doesn't quite seem right, since the default value of timesteps_per_iteration is 1000, which seems high.
Also, does timesteps_per_iteration have a different meaning in the case of distributed agents like ApeX?
This is a system-level parameter that only affects the (system) performance of your agent: it sets the minimum number of env steps to run per reporting iteration (i.e., per train() call), and does not change the learning itself. For example, if you set this value too small, then too many result logs may be produced, slowing down training.
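For concreteness, here's a rough sketch of the pattern, using the old DQNTrainer API that roughly matches the Ray version linked above (it was called DQNAgent in some older releases, so treat the import as illustrative):

```python
import ray
from ray.rllib.agents.dqn import DQNTrainer  # DQNAgent in some older Ray versions

ray.init()

trainer = DQNTrainer(
    env="CartPole-v0",
    config={
        # System-level knob: each train() call runs at least this many env
        # steps before returning a result dict; it does not change learning.
        "timesteps_per_iteration": 1000,
    },
)

for i in range(5):
    # One "iteration" here covers >= 1000 env steps, so results (and logs)
    # are produced once per 1000 steps rather than once per step.
    result = trainer.train()
    print(i, result["timesteps_total"], result["episode_reward_mean"])
```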
The hyperparameters you're probably thinking of are train_batch_size and sample_batch_size: https://ray.readthedocs.io/en/latest/rllib-training.html#specifying-resources, which determine how many steps to sample from the env per rollout, and how many steps to draw from the replay buffer per training batch, respectively.
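A minimal config sketch tying the three together (the numeric values are the DQN defaults as I remember them from that era of RLlib, so double-check against the dqn.py in your installed version):

```python
config = {
    # How many env steps each rollout worker collects per sampling pass
    # before the experiences are added to the replay buffer.
    "sample_batch_size": 4,
    # How many steps are drawn from the replay buffer for each SGD update.
    "train_batch_size": 32,
    # Reporting/iteration granularity only (see above); not a learning knob.
    "timesteps_per_iteration": 1000,
}
```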
Hope that makes sense.