I like using the Taxi environment for educational purposes, but I was kind of upset to see that Taxi-v1 was removed completely and that an episode in Taxi-v2 is considered "done" after 200 steps. Is there any workaround for this?
Hello @wagonhelm. You never said why this is a problem.
It's kind of nice to show that a completely random policy will eventually solve the environment in a varying number of steps. It will sometimes solve it in fewer than 200 steps using random actions, but not often.
So it sometimes solves the problem within 200 steps, right? Then you can calculate the mean score.
That's not really my main concern. I'm guessing there is no way to use Taxi-v1 on the master branch, and no workaround for v2 marking the episode as done after 200 steps? Ultimately it would be nice to see v1 in the master branch, as it's still on the gym website as well.
My workaround for this was creating a loop that finishes when reward == 20 rather than when done == True.
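For anyone who wants to try this, here is a minimal sketch of that kind of loop. It is not taken from the original comment: the function name `run_until_delivery` and the `max_steps` safety cap are made up for illustration, and it assumes the environment keeps accepting `step()` calls after the 200-step limit has set `done`, which may depend on your gym version.

```python
def run_until_delivery(env, max_steps=100_000):
    """Take random actions until the +20 drop-off reward is seen.

    Returns the number of steps taken, or None if max_steps is reached
    first. `max_steps` is a hypothetical safety cap, not part of gym.
    """
    env.reset()
    for step in range(1, max_steps + 1):
        action = env.action_space.sample()  # uniformly random action
        # The reward is the second element of the step result in both the
        # older 4-tuple and the newer 5-tuple step APIs.
        reward = env.step(action)[1]
        # Stop on the successful drop-off reward and ignore `done`,
        # which the step limit sets after 200 steps.
        if reward == 20:
            return step
    return None
```

With gym installed, this would be called as `run_until_delivery(gym.make("Taxi-v2"))`. Running it several times should show the varying step counts mentioned above.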