In many gym environments, like MountainCarContinuous, there is a limit on the number of steps per episode. This can end an episode before the actual end of the trajectory is reached (in this case, reaching the top of the hill).
Saving these experiences to the replay buffer without changing the artificial terminal flags to False (for example, here) leads to wrong TD targets and therefore wrong TD errors. I think the agent's prediction of future rewards should still be taken into account when the real end of the trajectory has not been reached yet.
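A minimal worked form of the problem, using the generic one-step TD target with a value estimate $V$ (Q-learning variants replace $V(s')$ with $\max_{a'} Q(s', a')$, and TD3/SAC with $Q(s', a')$ for an action from the target policy):

```math
y = r + \gamma\,(1 - d)\,V(s'), \qquad \delta = y - Q(s, a)
```

When the time limit sets $d = 1$, the target collapses to $y = r$, as if $V(s') = 0$, even though the car has not reached the top yet and $V(s')$ is generally non-zero. Keeping $d = 0$ for these transitions restores the bootstrap term $\gamma V(s')$.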
This is why some implementations, like OpenAI SpinningUp, override the terminal flag before saving the experience:
From the OpenAI SpinningUp source code:
```python
# Ignore the "done" signal if it comes from hitting the time
# horizon (that is, when it's an artificial terminal signal
# that isn't based on the agent's state)
d = False if ep_len == max_ep_len else d
# Store experience to replay buffer
replay_buffer.store(o, a, r, o2, d)
```
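A self-contained sketch of the same idea in a full collection loop, with a one-step TD target to show what the stored flag changes. The toy environment, `collect` and `td_target` are hypothetical names for illustration; the environment stands in for a TimeLimit-wrapped env where the goal is never reached, so every `done` it emits is artificial.

```python
class TimeLimitedToyEnv:
    """Hypothetical env whose only 'done' comes from a time limit of `limit` steps."""

    def __init__(self, limit):
        self.limit = limit
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.limit  # artificial terminal signal
        return self.t, 1.0, done, {}


def collect(env, max_ep_len, n_steps):
    """Collect transitions SpinningUp-style, overriding time-limit terminals before storing."""
    buffer = []
    o, ep_len = env.reset(), 0
    for _ in range(n_steps):
        a = 0  # placeholder action
        o2, r, d, _ = env.step(a)
        ep_len += 1
        # Ignore "done" if it comes from hitting the time horizon
        d = False if ep_len == max_ep_len else d
        buffer.append((o, a, r, o2, d))
        o = o2
        # Still reset the environment when the episode ends for either reason
        if d or ep_len == max_ep_len:
            o, ep_len = env.reset(), 0
    return buffer


def td_target(r, d, next_value, gamma=0.99):
    """One-step TD target: bootstrap from V(s') unless s' is a true terminal state."""
    return r + gamma * (1.0 - float(d)) * next_value
```

One caveat of the step-counter check: if the environment genuinely terminates exactly on step `max_ep_len`, that real terminal is also overwritten to False.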
Hello,
thanks for pointing out that problem.
There are different ways of dealing with this problem. One easy way is to add a time feature to the observation, as is done in the zoo.
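A minimal sketch of the time-feature idea, assuming the old gym 4-tuple `step` API and list observations: the wrapper appends the fraction of remaining time, so the agent can tell a state near the time limit from the same state early in the episode. This is not the zoo's actual wrapper; `TimeFeatureWrapper`, `ToyEnv` and `max_steps` are illustrative names, and `max_steps` is assumed to match the environment's time limit.

```python
class ToyEnv:
    """Hypothetical env: observation is [current step], never terminates on its own."""

    def reset(self):
        self.t = 0
        return [float(self.t)]

    def step(self, action):
        self.t += 1
        return [float(self.t)], 0.0, False, {}


class TimeFeatureWrapper:
    """Appends 1 - (current_step / max_steps) to every observation."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.current_step = 0

    def _augment(self, obs):
        remaining = 1.0 - self.current_step / self.max_steps
        return list(obs) + [remaining]

    def reset(self):
        self.current_step = 0
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.current_step += 1
        return self._augment(obs), reward, done, info
```

With the remaining time in the observation, a timeout is no longer invisible to the agent: a state with zero time left really has no future reward, so treating it as terminal is consistent.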
Actually, the right way would be to check for the TimeLimit.truncated key in the info dict:
https://github.com/openai/gym/blob/master/gym/wrappers/time_limit.py#L19
It is a recent gym feature.
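A simplified re-implementation of what the TimeLimit wrapper did at the time (old 4-tuple `step` API), paraphrased from the linked source as we understand it; check the link for the exact code. `SimpleTimeLimit`, `ToyEnv` and `done_for_buffer` are hypothetical names. The important details are that the key is only written when the limit is hit, and that it is set to `not done`, so a real terminal on the last allowed step is not reported as a truncation.

```python
class ToyEnv:
    """Hypothetical env that terminates for real only at `goal_step` (None = never)."""

    def __init__(self, goal_step=None):
        self.goal_step = goal_step
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.goal_step is not None and self.t >= self.goal_step
        return self.t, 0.0, done, {}


class SimpleTimeLimit:
    """Sketch of the TimeLimit logic: force done=True at the limit and record why."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            # Truncated only if the env did not also terminate on its own at this step
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info


def done_for_buffer(done, info):
    """Flag to store in the replay buffer: drop time-limit terminals, keep real ones."""
    return False if info.get("TimeLimit.truncated", False) else done
```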
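So, if I understand correctly, this is the right way to filter out artificial terminal flags:
```python
# The key is only present when the time limit is reached, so use .get() to avoid a KeyError
done = False if info.get('TimeLimit.truncated', False) else done
```
Am I right?
And do you have any plans to add this to stable-baselines or stable-baselines3?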
> Am I right?

Looks good ;)

> And do you have any plans to add this to stable-baselines or stable-baselines3?

Not for now, as the time feature is sufficient and avoids adding complexity to the code (it gets a bit more complex when using multiple environments).
I created a branch on SB3, but it is in fact trickier than expected (notably because VecEnv resets automatically): https://github.com/DLR-RM/stable-baselines3/compare/feat/remove-timelimit
For A2C/PPO or any n-step method, we would need to keep track of two types of termination signals...
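A minimal sketch of what keeping two termination signals could look like when computing discounted returns over one environment's rollout: `terminated` marks true terminals (no bootstrapping) and `truncated` marks time-limit ends (bootstrap from the value of the real final observation). Because VecEnvs reset automatically, the observation returned after a truncated step already belongs to the next episode, so that value has to come from the pre-reset observation; SB3's VecEnvs keep it in `info["terminal_observation"]`. The function, its arguments and the `truncation_values` dict are hypothetical, not SB3 API; the same masking would apply inside GAE.

```python
from typing import Dict, List


def n_step_returns(
    rewards: List[float],
    terminated: List[bool],
    truncated: List[bool],
    truncation_values: Dict[int, float],
    last_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Discounted returns over a rollout, bootstrapping where an episode was cut by a time limit.

    truncation_values[t] is V(final observation) for each step t where truncated[t] is True.
    last_value is V(observation after the last step), used if the rollout ends mid-episode.
    """
    returns = [0.0] * len(rewards)
    next_return = last_value
    for t in reversed(range(len(rewards))):
        if terminated[t]:
            # True terminal: nothing to bootstrap from
            next_return = 0.0
        elif truncated[t]:
            # Time limit: the episode would have continued, so bootstrap from V(final obs)
            next_return = truncation_values[t]
        returns[t] = rewards[t] + gamma * next_return
        next_return = returns[t]
    return returns
```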