Gym: Monitor unable to reset if done is false

Created on 21 Dec 2016 · 14 Comments · Source: openai/gym

While the monitor is running, env.reset() is only allowed after env.step() has returned done as True. So if MAX_STEPS is reached and the inner for loop exits without done, the next call to env.reset() makes the monitor throw an error.

Even the example code breaks if the episode doesn't return done before the loop ends:

import gym
MAX_STEPS = 20
env = gym.make('CartPole-v0')
env.monitor.start('/tmp/cartpole-experiment-1', force=True)

for i_episode in range(200):
    observation = env.reset()
    for t in range(MAX_STEPS):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.monitor.close()

Can we please have a fix for this issue? If we don't reset after MAX_STEPS has been reached, the next episode continues in the previous environment state.

DollarAkshay

👍1

All 14 comments

The example code is wrong -- it's invalid to use a monitor when ending episodes early. The purpose of the monitor is to generate accurate performance statistics, which would be wrong if the environment is externally reset.

Best to replace for t in range(MAX_STEPS): with while True:

tlbtlbtlb on 21 Dec 2016

@tlbtlbtlb If I use while True: there is a chance that the simulation will never end. That could easily happen in BipedalWalker-v2, where the legs get stuck and the walker doesn't move forward.

DollarAkshay on 22 Dec 2016

Also, I solve these environments using genetic algorithms. So for me one episode has 100 creatures, and I have to loop through each creature and reset after the creature has died or has reached the max steps.

DollarAkshay on 22 Dec 2016

@DollarAkshay

The monitor respects env.spec.timestep_limit, so the episode still ends once that limit is reached.
You can leave the infinite loop with if done: break

olegklimov on 22 Dec 2016
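
The point in the comment above can be sketched without gym. This is a minimal illustration, assuming the step cap is enforced by a wrapper that sets done to True once the cap is reached. StuckEnv and StepLimit are hypothetical toy classes, not gym's own, and they only mimic the reset/step interface used in this thread.

```python
class StuckEnv:
    """Toy environment that never ends on its own (hypothetical stand-in)."""

    def reset(self):
        return 0.0

    def step(self, action):
        # Returns observation, reward, done, info; done is always False here.
        return 0.0, 0.0, False, {}


class StepLimit:
    """Toy wrapper that forces done=True after max_steps steps (hypothetical)."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            done = True  # the cap ends the episode from inside the environment
        return observation, reward, done, info


def run_episode(env):
    """Step until the environment reports done; return the number of steps."""
    env.reset()
    steps = 0
    while True:
        _, _, done, _ = env.step(0)
        steps += 1
        if done:
            break
    return steps


env = StepLimit(StuckEnv(), max_steps=200)
print(run_episode(env))  # 200
```

Because the cap is reported through done, the caller never has to reset early, which is the case the monitor rejects.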

@olegklimov Ahhh now I see. Thanks for that. I guess this issue is solved now. Would be nice if the example code was updated and mentioned that the episode exits on its own if it takes too long :+1:

Thanks :)

DollarAkshay on 22 Dec 2016

I'm still facing this issue. Here's the code I am using:

import gym
from gym import wrappers
env = gym.make('CartPole-v0')
env = wrappers.Monitor(env, path)

for i_episode in range(2000):
        observation = env.reset()
        t = 0
        while True:
                env.render()
                print(observation)
                action = env.action_space.sample()
                observation, reward, done, info = env.step(action)
                t += 1
        if done:
                print("Done after {} steps".format(t+1))
                break

Anything I should be doing different?

iceman121 on 6 Mar 2017

Oh wow. Figured it out, bad indentation.

import gym
from gym import wrappers
env = gym.make('CartPole-v0')
env = wrappers.Monitor(env, path)  # path: monitor output directory, defined elsewhere

for i_episode in range(2000):
    observation = env.reset()
    t = 0
    while True:
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        t += 1
        # The done check has to sit inside the while loop
        if done:
            print("Done after {} steps".format(t))
            break

iceman121 on 6 Mar 2017

I'm hitting this error as well. BipedalWalker can easily get into a state where the episode never ends, so for that env I can't use Monitor and can't submit to Gym.

I've tried setting timestep_limit:

env = gym.make('BipedalWalker-v2')
env.spec.timestep_limit = 10
print( env.spec.timestep_limit )
if submit:
    env = gym.wrappers.Monitor(env, 'run', force=True)

But it just blasts past those timesteps, not returning done. What's the correct solution here? Can we just let monitor deal with reset being called? Or can I set a timestep limit in another way?

tjacobs on 29 May 2017

@tjacobs
you can set it with, for example:

env = gym.make('Pong-v0')
env._max_episode_steps = your_value

4SkyNet on 31 May 2017

👍10 🎉2 👎1
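
One plausible reading of the two snippets above, sketched with toy classes rather than gym's own code: if the step cap is copied into a wrapper when the environment is built, editing the spec afterwards changes nothing, while editing the wrapper's own attribute does. ToySpec, ToyEnv, ToyLimit and toy_make are hypothetical names, and whether a given gym version behaves this way is an assumption to check against its source.

```python
class ToySpec:
    """Hypothetical stand-in for an environment spec."""

    def __init__(self, timestep_limit):
        self.timestep_limit = timestep_limit


class ToyEnv:
    """Hypothetical environment that never finishes on its own."""

    def __init__(self, spec):
        self.spec = spec

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, 0.0, False, {}


class ToyLimit:
    """Hypothetical wrapper that copies the cap once, at construction time."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.spec = env.spec
        self._max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self._max_episode_steps:
            done = True
        return obs, reward, done, info


def toy_make(limit=1600):
    """Build the toy env; the wrapper reads the spec's limit only here."""
    spec = ToySpec(limit)
    return ToyLimit(ToyEnv(spec), max_episode_steps=spec.timestep_limit)


def steps_until_done(env):
    env.reset()
    steps = 0
    done = False
    while not done:
        _, _, done, _ = env.step(0)
        steps += 1
    return steps


env = toy_make()
env.spec.timestep_limit = 10   # too late: the wrapper already copied 1600
print(steps_until_done(env))   # 1600

env._max_episode_steps = 10    # changes the value the wrapper actually checks
print(steps_until_done(env))   # 10
```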

Sweet, of course, thanks!

tjacobs on 1 Jun 2017

👎1

Hi, I am getting the same error, "gym.error.ResetNeeded: Trying to step environment which is currently done. While the monitor is active for CarRacing-v0, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.", although I use while True to step through an episode and force=True when I create the env. What else could I try?

Thanks!

iulialexandra on 15 Nov 2017

@tjacobs @4SkyNet - this doesn't work for me. I can set the variable, but the env doesn't return done = True when I hit the step limit.

mkhansenbot on 21 Apr 2018

Seems like it works very well with CartPole, but now that I am using AntBulletEnv it only records one trajectory and then quits.
Does anyone have a workaround for that?

amini2nt on 10 Oct 2018

You can manually save the video and set done to True with a for/else clause:

import gym
from gym import wrappers
MAX_STEPS = 20
env = gym.make('CartPole-v0')
env = wrappers.Monitor(env, directory='/tmp/cartpole-experiment', force=True)

for i_episode in range(200):
    observation = env.reset()
    for t in range(MAX_STEPS):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
    else:
        # Runs only when MAX_STEPS was reached without done:
        # close out the episode by hand so the next reset() is accepted
        env.stats_recorder.save_complete()
        env.stats_recorder.done = True

# With wrappers.Monitor, close the wrapped env (there is no env.monitor to close)
env.close()

Check for/else clause
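
For readers who have not used it, a small standalone sketch of the for/else behaviour this workaround relies on: the else branch runs only when the loop finishes without hitting break. The function name run and the list of done flags are made up for illustration.

```python
def run(done_flags):
    """Report how the loop ended for a sequence of per-step done flags."""
    for t, done in enumerate(done_flags):
        if done:
            outcome = "broke out at step {}".format(t + 1)
            break
    else:
        # Reached only when the loop was exhausted without break.
        outcome = "hit the step cap without done"
    return outcome


print(run([False, False, True]))   # broke out at step 3
print(run([False, False, False]))  # hit the step cap without done
```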