How do you guys currently manage the case of "I want to load up a saved checkpoint of that model from last week and play an episode with it?"
What is the easiest way to find the directory of the latest checkpoint for a given trainable class? There are two different directories where things get stored, and I can never remember where anything is.
Right now I have to open TensorBoard, look up a good experiment, e.g. CartPoleTrainable/CartPoleTrainable_86edf4bf_2020-02-02_13-41-50a8gnqyok, and then use that in restore(dir).
The workflow I'm thinking of is something like this (pseudocode):
t = MyTrainable()
t.load_latest_best_checkpoint_for_this_trainable()
t.play() # my custom function that will play/render an episode
Is there something equivalent to tf.train.latest_checkpoint from TensorFlow?
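For reference, the TensorFlow call I have in mind (checkpoint_dir is just a placeholder here):

import tensorflow as tf

checkpoint_dir = "path/to/checkpoints"  # placeholder
# Returns the full path of the most recent checkpoint file in the
# directory, or None if no checkpoint is found.
ckpt_path = tf.train.latest_checkpoint(checkpoint_dir)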
P.S.: Awesome library, the Trainable interface is great.
Hey, this should help you. It's the simplest solution I found for loading a checkpoint and running evaluations:
from ray.tune import Analysis
import ray.rllib.agents.ppo as ppo

analysis = Analysis(path_to_results)
# Note: get_best_config() only returns the config dict; get_best_logdir()
# returns the trial directory that scored best on the given metric.
best_logdir = analysis.get_best_logdir(metric=self._config["metric"])
checkpoint_path = <your code to extract latest checkpoint file from the best logdir>
model = ppo.PPOTrainer(env="NameOfYourEnv", config=test_config)
model.restore(checkpoint_path)
env = <create your env>
obs = env.reset()
done = False
while not done:
    action = model.compute_action(obs, prev_action=0, prev_reward=0)
    obs, reward, done, info = env.step(action)
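For the "extract latest checkpoint" step, here's a minimal sketch, assuming Tune's default layout where each checkpoint lives in a checkpoint_<iteration> subdirectory of the trial logdir and the file inside is named checkpoint-<iteration> (latest_checkpoint_in is just a hypothetical helper name):

import os

def latest_checkpoint_in(logdir):
    # Assumes Tune's default layout: <logdir>/checkpoint_<iter>/checkpoint-<iter>
    ckpt_dirs = [d for d in os.listdir(logdir) if d.startswith("checkpoint_")]
    latest = max(ckpt_dirs, key=lambda d: int(d.split("_")[-1]))
    iteration = int(latest.split("_")[-1])
    return os.path.join(logdir, latest, "checkpoint-{}".format(iteration))

checkpoint_path = latest_checkpoint_in(best_logdir)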
Alternatively, you can create a PolicyServer, though I could not get this to work with a different number of workers than what my model was originally trained with, and the server wouldn't support multiple workers because it listens on a single address. Here's more info on that; they used CartPole as well. Not sure if it's even supported as of right now.
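For completeness, the client side of that setup looks roughly like this. This is a sketch modeled on RLlib's CartPole serving example; the PolicyClient import path has moved between Ray versions, so treat the exact module as an assumption:

from ray.rllib.utils.policy_client import PolicyClient  # path varies by Ray version

client = PolicyClient("http://localhost:9900")  # address the PolicyServer listens on
episode_id = client.start_episode(training_enabled=False)
obs = env.reset()
done = False
while not done:
    # The server's policy picks the action; the client just steps the env
    # and reports rewards back.
    action = client.get_action(episode_id, obs)
    obs, reward, done, info = env.step(action)
    client.log_returns(episode_id, reward)
client.end_episode(episode_id, obs)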
Edit: Also, if you'd prefer to use the Tune wrapper, just pass checkpoint_path to the restore argument of tune.run(..., restore=checkpoint_path) and set your stopping criteria to 1 episode.
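A minimal sketch of that, assuming an RLlib trainable so that episodes_total is available as a stopping criterion:

from ray import tune

tune.run(
    "PPO",
    config=test_config,          # same config the model was trained with
    restore=checkpoint_path,     # start the trainable from this checkpoint
    stop={"episodes_total": 1},  # stop after a single episode
)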
Interesting! How would I access the model?
But this is nondeterministic, right? How do you get it in a deterministic way?
Closing until I try this out.