Before my question, let me describe my understanding of the checkpoint file system (you can skip this and go straight to my question).
The code in example/multiagent_cartpole.py produces a file like experiment_state-2019-04-03_00-47-28.json and a directory like PPO_experiment_name containing a few .pkl, .json, and .csv files.
The file system looks like:
- local_dir (say: "~/ray_results")
  - exp_name (say: "PPO")
    - experiment checkpoint (say: experiment_state-2019-04-05_17-59-00.json)
    - trial directory (named like: PPO_cartpole_0_2019-04-05_18-28-0296h2tknq)
      - xxx.log
      - params.json
      - params.pkl (this is the file that stores the trained parameters, I guess?)
      - progress.csv
      - result.json
After one successful training run, we have a trained agent (I used one shared policy for all agents). We set local_dir to exactly the same value as during training, and likewise exp_name, namely PPO.
Now to my problem. The tune.run function takes two arguments that look helpful for restoring: resume and restore.
The resume argument, once set to True, automatically searches local_dir/exp_name/ for the most recent experiment_state-<date_time>.json.
resume does find the experiment: after setting it to True, the restore appears to succeed, but the program terminates immediately, as if it inherits the TERMINATED state from the checkpoint.
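For concreteness, here is roughly the call I am making (a sketch; the config below is only a placeholder for my real multi-agent setup):

from ray import tune

tune.run(
    "PPO",
    name="PPO",                         # same as exp_name above
    local_dir="~/ray_results",          # same as local_dir above
    resume=True,                        # picks up the most recent experiment_state-*.json
    config={"env": "CartPole-v0"},      # placeholder; my real config is the multi-agent one
    stop={"training_iteration": 600},
)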
Here's the log:
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/1 GPUs
Memory usage on this node: 4.3/16.7 GB
Result logdir: /home/SENSETIME/pengzhenghao/ray_results/PPO
Number of trials: 1 ({'TERMINATED': 1})
TERMINATED trials:
- PPO_tollgate_0: TERMINATED, [12 CPUs, 1 GPUs], [pid=9214], 4846 s, 300 iter, 1320000 ts, 1.1e+03 rew
The printed reward is exactly what the trained agent is able to achieve, but I cannot continue training this agent, even if I set num_iters greater than the number of iterations of the last training run (namely 300).
What's more, it seems impossible to use the resume argument to specify a checkpoint by its exact filename.
In a nutshell, my question on the resume argument is: how can I continue training beyond the previous termination, or point resume at a specific checkpoint?
I also tried the restore argument. After setting restore=<log_dir>, namely restore="./experiments" (which is my log_dir), I get the following error:
Traceback (most recent call last):
File "xxx/anaconda3/envs/dev/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 499, in restore
ray.get(trial.runner.restore.remote(value))
File "xxx/anaconda3/envs/dev/lib/python3.6/site-packages/ray/worker.py", line 2316, in get
raise value
ray.exceptions.RayTaskError: ray_PPOAgent:restore() (pid=28099, host=g114e1900387)
File "xxx/anaconda3/envs/dev/lib/python3.6/site-packages/ray/tune/trainable.py", line 304, in restore
with open(checkpoint_path + ".tune_metadata", "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './experiments.tune_metadata'
I have searched everywhere on this computer and there is no file ending with .tune_metadata. I am really confused.
In short, what I am trying to do is:
1. Restore the trained agent and continue its training with the same config.
2. Restore the trained agent, retrieve the policy network, and use it in the same environment with rendering, in order to visualize its performance.
3. Restore the trained agent as a pre-trained agent and modify the config, e.g. using more workers and GPUs to train on a cluster.
Could you please tell me what I should do?
(By the way, the documentation is really insufficient for thoroughly understanding the whole RLlib workflow. Nevertheless, I still appreciate you guys for this excellent project; I hope some day I can make a contribution too~)
I think there is some confusion here about tune's checkpointing of experiment state, vs RLlib's checkpointing of trial state.
To enable RLlib checkpointing, you have to specify --checkpoint-freq. For example: rllib train --run=PG --checkpoint-freq=1 --env=CartPole-v0
Then this will create checkpoints in ~/ray_results that include the .tune_metadata file. To restore, you can specify one of those paths, for example rllib train --run=PG --env=CartPole-v0 --restore=$HOME/ray_results/default/PG_CartPole-v0_0_2019-04-05_16-43-02s_gcpmkl/checkpoint_9/checkpoint-9.
This path can also be passed to agent.restore() in the Python API, which supports more advanced use cases like (2). For (1) and (3) I think the --restore flag for Tune should work.
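For use case (2), a minimal sketch might look like the following (assuming a CartPole-v0 PPO run; the checkpoint path is hypothetical, and the trainer class is named PPOTrainer in newer Ray releases and PPOAgent in older ones):

import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer  # PPOAgent in older Ray releases

ray.init()
# Build the trainer with the same env and config used during training, then load the weights.
agent = PPOTrainer(env="CartPole-v0", config={"num_workers": 0})
agent.restore("/path/to/checkpoint_9/checkpoint-9")  # hypothetical checkpoint path

# Roll the restored policy out in a rendered environment.
env = gym.make("CartPole-v0")
obs = env.reset()
done = False
while not done:
    env.render()
    action = agent.compute_action(obs)
    obs, reward, done, _ = env.step(action)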
By the way, the document is really insufficient for thoroughly understanding the whole process of rllib.
Agree! This part happens to be documented half in RLlib and half in Tune. Some of it is here: https://ray.readthedocs.io/en/latest/rllib-training.html Any suggestions on how to improve this would be helpful.
Thanks for your reply! I find that if tune.run is executed without any changes, the parameters of the trained agent are not saved... The .pkl file that is saved automatically simply records the trial, not anything related to the neural network.
Unfortunately the training of the last few days was in vain. I suggest making checkpoint_at_end=True the default...
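For anyone hitting the same issue, a sketch of how checkpointing can be enabled explicitly in tune.run (the config and stopping criteria here are placeholders):

from ray import tune

tune.run(
    "PPO",
    config={"env": "CartPole-v0"},    # placeholder for your real training config
    stop={"training_iteration": 300},
    checkpoint_freq=10,               # write a checkpoint every 10 training iterations
    checkpoint_at_end=True,           # also write a final checkpoint when the trial ends
)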
The only blocker for enabling checkpointing by default is https://github.com/ray-project/ray/pull/4490.
That will avoid out-of-disk-space errors during long training runs.
For potential readers:
- The resume argument does nothing but continue the last unfinished experiment; in this mode, you are not allowed to change num_iters.
- The restore argument takes the path of a checkpoint file as input. Concretely, the file looks like ~/ray_results/expname/envname_date_someothercodes/checkpoint_10/checkpoint-10. Note that checkpoint files only exist for tune.run() executions with checkpoint_at_end=True or checkpoint_freq set to a non-zero value.
- Using the restore argument with the checkpoint from which you want to continue is the only way to extend the number of iterations of a finished or unfinished experiment (see the sketch below).
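A minimal sketch of continuing training from a checkpoint with a larger iteration budget (all paths and numbers here are placeholders):

import os
from ray import tune

checkpoint = os.path.expanduser(
    "~/ray_results/PPO/PPO_cartpole_0_2019-04-05_18-28-0296h2tknq/checkpoint_300/checkpoint-300"
)  # placeholder path: expname/trialdir/checkpoint_K/checkpoint-K

tune.run(
    "PPO",
    name="PPO",
    local_dir="~/ray_results",
    config={"env": "CartPole-v0"},     # same config as before, or a modified one (more workers, GPUs, ...)
    restore=checkpoint,
    stop={"training_iteration": 600},  # larger than the 300 iterations already trained
    checkpoint_freq=10,
    checkpoint_at_end=True,
)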
Thanks to Eric for offering quick and kind responses!
For me, restore does not seem to work no matter what I try.
tune.run(
    "PPO",
    # name="PPO_discrete5",
    local_dir="/content/drive/My Drive/Colab Notebooks/rltrader/Experiments",
    checkpoint_freq=10,  # iterations
    checkpoint_at_end=True,
    max_failures=100,
    # resume=True,
    restore='content/drive/My Drive/Colab Notebooks/rltrader/Experiments/PPO_discrete3/PPO_ContTradingEnv_0_2019-05-03_04-51-02zykryvgl/checkpoint_218/checkpoint-218',
    # search_alg=algo,
    # scheduler=ahb,
    # 2 if testing, 50 or more for real
    # num_samples=50,
    stop={
        # "episode_reward_mean": 0,
        # "training_iteration": 1,
        # "timesteps_total": 1000,
        "episodes_total": 1000,
    },
)
What combination of the above do I need to restore a checkpoint using tune.run, or is restore not working? I have run 1000 episodes, and wish to run 1000 more.
@evanatyourservice Are you using the latest Ray? Ray failed to restore due to a bug that had not been fixed at the time; see https://github.com/ray-project/ray/pull/4733
@evanatyourservice Please re-run your code with the latest Ray and see if everything works well.
My view of resume/restore:
If the run stops for some reason, you add resume=True. Note that if any of the trials ended with an error, resume won't restart them.
Here comes the nice part: to locate the checkpoint to restore from, run
find "[local_dir]/[name]" -iname checkpoint-[K]
where K is the last checkpoint created (or the last iteration), i.e. the checkpoint from which you want to retry something different.
Other ideas:
LR: the learning rate can be annealed with a schedule like the one below, where lr_batch_size is your number of timesteps per iteration:
'lr_schedule': [[0 * lr_batch_size, 5e-5],
                [75 * lr_batch_size, 5e-5],
                [110 * lr_batch_size, 1e-5],
                [110 * lr_batch_size, 1e-5],
                [120 * lr_batch_size, 5e-6],
                [140 * lr_batch_size, 5e-7],
                [200 * lr_batch_size, 1e-10],
                [300 * lr_batch_size, 1e-12],
                ]
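As a hedged sketch of how such a schedule might be plugged into a PPO config (the environment and all numbers below are placeholders), lr_batch_size can be set to the train_batch_size, and each entry is a [timestep, learning_rate] pair:

lr_batch_size = 4000  # placeholder: set this to the train_batch_size you train with
config = {
    "env": "CartPole-v0",              # placeholder environment
    "train_batch_size": lr_batch_size,
    "lr_schedule": [
        [0 * lr_batch_size, 5e-5],
        [75 * lr_batch_size, 5e-5],
        [110 * lr_batch_size, 1e-5],
        [120 * lr_batch_size, 5e-6],
    ],
}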
Since I was searching for a simple way to load a trained agent and continue training with RLlib, and I only found this issue, here's what I found & what's the easiest way in my opinion:
ray.tune.run(PPOTrainer, config=myconfig, restore=path_to_trained_agent_checkpoint)
I.e., just set the path in the restore argument; that's it! No need for a custom train function.
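A slightly fuller, self-contained sketch of that call (the config and checkpoint path below are placeholders):

import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
myconfig = {"env": "CartPole-v0", "num_workers": 2}  # placeholder config

tune.run(
    PPOTrainer,
    config=myconfig,
    restore="/path/to/checkpoint_300/checkpoint-300",  # placeholder checkpoint path
    stop={"training_iteration": 600},
    checkpoint_at_end=True,
)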