While the monitor is running, you can only call env.reset() after env.step() has returned done as True. So even if MAX_STEPS has been reached and the for loop exits, you cannot call env.reset(), because the monitor will throw an error.
Even the example code breaks if an episode doesn't return done within MAX_STEPS:
import gym

MAX_STEPS = 20

env = gym.make('CartPole-v0')
env.monitor.start('/tmp/cartpole-experiment-1', force=True)
for i_episode in range(200):
    observation = env.reset()
    for t in range(MAX_STEPS):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.monitor.close()
Can we please have a fix for this issue? If we don't reset after MAX_STEPS has been reached, the new episode will just continue from the previous environment state.
The example code is wrong -- it's invalid to use a monitor when ending episodes early. The purpose of the monitor is to generate accurate performance statistics, which would be wrong if the environment is externally reset.
Best to replace for t in range(MAX_STEPS): with while True:
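To illustrate why an external reset conflicts with the monitor, here is a minimal sketch. It does not use gym at all: ToyEnv, ToyMonitor and ToyMonitorError are hypothetical stand-ins, and the real monitor's internals may differ. The point is only that a wrapper which records per-episode statistics has to reject a reset() that arrives before done, otherwise it would log a truncated episode.

```python
class ToyMonitorError(Exception):
    """Hypothetical stand-in for the error the real monitor raises."""


class ToyEnv:
    """Toy environment: every episode ends after `length` steps."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class ToyMonitor:
    """Toy wrapper that records episode lengths and guards reset()/step()."""

    def __init__(self, env):
        self.env = env
        self.done = None  # None: no episode started yet
        self.steps = 0
        self.episode_lengths = []

    def reset(self):
        # An episode is still running: refusing keeps the statistics honest.
        if self.done is False:
            raise ToyMonitorError("reset() called before the episode was done")
        self.done = False
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        # Stepping is only valid between reset() and done.
        if self.done is not False:
            raise ToyMonitorError("call reset() before stepping")
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        self.done = done
        if done:
            self.episode_lengths.append(self.steps)
        return obs, reward, done, info


def run_episode(env):
    """Step until the environment itself reports done (the `while True:` pattern)."""
    env.reset()
    while True:
        _, _, done, _ = env.step(0)
        if done:
            break
```

With this pattern, run_episode can be called repeatedly without ever triggering the guard, because every reset() follows a done.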
@tlbtlbtlb If I use while True:, there is a chance that the simulation will never end. This could easily happen in BipedalWalker-v2, where the legs get stuck and the walker doesn't move forward.
Also, I solve these problems using genetic algorithms. For me, one episode has 100 creatures, and I have to loop through each creature and reset after the creature has died or has reached the max steps.
@DollarAkshay
env.spec.timestep_limit
if done: break
@olegklimov Ahhh, now I see. Thanks for that. I guess this issue is solved now. It would be nice if the example code was updated to mention that the episode exits on its own if it takes too long :+1:
Thanks :)
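For the genetic-algorithm case discussed above, a rough sketch of the loop might look like the following. It assumes the environment itself returns done once its step limit is reached, so the only exit condition needed is `if done`. ToyWalker, evaluate and the "creature is a constant action" encoding are hypothetical and only stand in for a real env and a real policy.

```python
import random


class ToyWalker:
    """Hypothetical stand-in env: the walker may 'die' at random, or hit a step limit."""

    def __init__(self, max_steps=50, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        died = self.rng.random() < 0.02
        timed_out = self.t >= self.max_steps
        # done is True on death OR when the built-in limit is reached
        return float(self.t), action, died or timed_out, {"timed_out": timed_out}


def evaluate(env, creature):
    """Run exactly one full episode for one creature; return (total reward, steps)."""
    env.reset()
    total, steps = 0.0, 0
    while True:
        _, reward, done, _ = env.step(creature)
        total += reward
        steps += 1
        if done:  # death or time limit, both reported by the env
            return total, steps


env = ToyWalker(max_steps=50)
population = [0.1 * i for i in range(10)]  # each creature is just a constant action here
results = [evaluate(env, creature) for creature in population]
```

Each creature gets its own reset(), and no reset happens mid-episode, which is the property the monitor cares about.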
I'm still facing this issue. Here's the code I am using:
import gym
from gym import wrappers
env = gym.make('CartPole-v0')
env = wrappers.Monitor(env, path)
for i_episode in range(2000):
observation = env.reset()
t = 0
while True:
env.render()
print(observation)
action = env.action_space.sample()
observation, reward, done, info = env.step(action)
t += 1
if done:
print("Done after {} steps".format(t+1))
break
Anything I should be doing different?
Oh wow. Figured it out, bad indentation.
import gym
from gym import wrappers

env = gym.make('CartPole-v0')
env = wrappers.Monitor(env, path)  # path: output directory, defined elsewhere
for i_episode in range(2000):
    observation = env.reset()
    t = 0
    while True:
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        t += 1
        if done:
            # t was already incremented for this step, so it is the step count
            print("Done after {} steps".format(t))
            break
I'm hitting this error as well. BipedalWalker can easily get into a state where the env never ends, so for that env I can't use Monitor and can't submit to Gym.
I've tried setting timestep_limit:
env = gym.make('BipedalWalker-v2')
env.spec.timestep_limit = 10
print(env.spec.timestep_limit)
if submit:
    env = gym.wrappers.Monitor(env, 'run', force=True)
But it just blasts past those timesteps, not returning done. What's the correct solution here? Can we just let monitor deal with reset being called? Or can I set a timestep limit in another way?
@tjacobs
you can set it with, for example:
env = gym.make('Pong-v0')
env._max_episode_steps = your_value
Sweet, of course, thanks!
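As a rough picture of why setting _max_episode_steps on the env object can take effect, here is a toy sketch of a step-counting wrapper. It assumes the limit is enforced by a wrapper that reads that attribute on every step; EndlessEnv and ToyTimeLimit are hypothetical stand-ins, not gym's actual classes, and whether a given real env is wrapped this way is not something this sketch can confirm.

```python
class EndlessEnv:
    """Toy env that never reports done on its own (like a stuck walker)."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, False, {}


class ToyTimeLimit:
    """Toy wrapper: forces done=True once the step budget is used up."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        # The attribute is read on every step, so changing it later takes effect.
        if self._elapsed_steps >= self._max_episode_steps:
            done = True
        return obs, reward, done, info


def episode_length(env):
    """Run one episode with `while True:` and return how many steps it took."""
    env.reset()
    steps = 0
    while True:
        _, _, done, _ = env.step(0)
        steps += 1
        if done:
            return steps
```

In this toy version, a limit set on the wrapper is what ends the episode. If the attribute were set on an object that never checks it, nothing would change, which may be one reason the same assignment behaves differently across envs.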
Hi, I am getting the same _gym.error.ResetNeeded: Trying to step environment which is currently done. While the monitor is active for CarRacing-v0, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode._ error although I use _while True_ to step through an episode and _force=True_ when I create the env. What else could I try?
Thanks!
@tjacobs @4SkyNet - this doesn't work for me, I can set the variable, but the env doesn't return 'done = True' for when I hit the steps limit.
It seems to work very well with CartPole, but now that I am using AntBulletEnv, it only records one trajectory and then quits.
Does anyone have a workaround for that?
You can manually save the video and set done to True with a for/else clause:
import gym
from gym import wrappers

MAX_STEPS = 20

env = gym.make('CartPole-v0')
env = wrappers.Monitor(env, directory='/tmp/cartpole-experiment', force=True)
for i_episode in range(200):
    observation = env.reset()
    for t in range(MAX_STEPS):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
    else:
        # Runs only if the loop hit MAX_STEPS without a break
        env.stats_recorder.save_complete()
        env.stats_recorder.done = True
env.close()  # with wrappers.Monitor, close the wrapped env instead of env.monitor
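For anyone unfamiliar with for/else: the else branch runs only when the loop finishes without hitting break. A small gym-free sketch of just that control flow follows (the function name and arguments are made up for illustration):

```python
def run(steps_until_done, max_steps):
    """Return 'done' if the loop broke early, 'cut off' if it ran out of steps."""
    for t in range(max_steps):
        done = (t + 1) >= steps_until_done  # pretend the env ends after this many steps
        if done:
            outcome = "done"
            break
    else:
        # Reached only when the for loop was exhausted without a break
        outcome = "cut off"
    return outcome
```

So in the snippet above, the stats recorder is only touched for episodes that were cut off at MAX_STEPS, not for episodes that ended on their own.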