I am working with the OpenSim prosthetics environment and Ray seems to work well.
Since I SSH into the machine for training, it is not viable to render the environment there (i.e., visualize the interaction).
I want to download the TF checkpoint files to my PC and restore the model to analyze the learned policy.
I found that Ray's checkpoint doesn't include the TF checkpoint files. Is there any way to save the TF model as TF checkpoint files? Thanks!
You can set `checkpoint_freq` in the Tune config, and after https://github.com/ray-project/ray/pull/2754/files you can set `checkpoint_at_end`.
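For reference, a minimal sketch of where these options go in a Tune experiment spec (equivalent keys work in the .yaml file passed to the training script). The experiment name and environment below are placeholders, and availability of `checkpoint_at_end` depends on the Ray version installed:

```python
# Sketch: periodic checkpoints plus a final checkpoint in Ray Tune.
# "Pendulum-v0" stands in for the registered osim wrapper.
import ray
from ray.tune import run_experiments

ray.init()
run_experiments({
    "prosthetics_apex_ddpg": {
        "run": "APEX_DDPG",
        "env": "Pendulum-v0",
        "checkpoint_freq": 10,      # checkpoint every 10 training iterations
        "checkpoint_at_end": True,  # option added by the PR linked above
        "stop": {"training_iteration": 100},
        "config": {},
    },
})
```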
Thanks for your suggestions. I have specified `checkpoint_freq` in my .yaml file, and there are .extra_data and .tune_data files as expected. However, my question is that I cannot find the .meta, .index, etc. files. How about instantiating an agent, calling its restore(), then instantiating a tf.train.Saver object and calling its save() to get those TF checkpoint files?
Oh I see, we don't use TF checkpoints for serialization. You can however restore the agent, and then inspect the model as you want afterwards (or save it): https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-agents
The policy graph class can be accessed at agent.local_evaluator.policy_map["default"] after restoring the agent from a checkpoint.
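Building on that, here is a rough sketch (not an official API) of restoring an agent and then dumping its TF variables as standard TF checkpoint files. The env name is a placeholder, import paths may differ slightly across Ray versions, and the `tf_sess` attribute is an assumption about this RLlib version's evaluator internals:

```python
# Sketch: restore an RLlib agent from a Tune checkpoint and write its TF
# variables out as regular TF checkpoint files (.meta/.index/.data).
import ray
import tensorflow as tf
from ray.rllib.agents.registry import get_agent_class

ray.init()
cls = get_agent_class("APEX_DDPG")                     # match the --run used for training
agent = cls(env="Pendulum-v0", config={})              # placeholder env name
agent.restore("/path/to/checkpoint-100")               # Tune checkpoint path

policy = agent.local_evaluator.policy_map["default"]   # the restored policy graph
sess = agent.local_evaluator.tf_sess                   # assumed session handle
with sess.graph.as_default():
    saver = tf.train.Saver()
    saver.save(sess, "/tmp/prosthetics_policy")        # writes TF checkpoint files
```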
rollout.py is very clear. Now, I can restore the Ray agent and call compute_actions() to interact with the osim client.
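For completeness, a minimal interaction loop along those lines could look like the sketch below, assuming `agent` was restored as in the earlier snippet. `Pendulum-v0` stands in for the osim client/wrapper, and `compute_action` is the per-observation helper rollout.py uses:

```python
# Minimal rollout sketch: step a gym-style env with the restored agent.
import gym

env = gym.make("Pendulum-v0")               # stand-in for the osim client/wrapper
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = agent.compute_action(obs)      # per-observation inference
    obs, reward, done, _ = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
```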
Another question: if I want to restore the Ray agent from a checkpoint but continue training with a different setting, what's the best solution? I am considering hacking the restore method, e.g., modifying some class member variables after the restore.
I don't think restoring from a checkpoint actually copies the config over,
so you can give a different config at agent initialization time.
Btw, note this bug fix for rollout of Apex DDPG:
https://github.com/ray-project/ray/pull/2791
You can give a different config when the agent is instantiated:

```python
agent = cls(env=args.env, config=args.config)
agent.restore(args.checkpoint)
```
However, considering that both the evaluators (including the policy graphs) and the optimizer are restored from the checkpoint:
```python
def _restore(self, checkpoint_path):
    extra_data = pickle.load(open(checkpoint_path + ".extra_data", "rb"))
    self.local_evaluator.restore(extra_data[0])
    ray.get([
        e.restore.remote(d)
        for (d, e) in zip(extra_data[1], self.remote_evaluators)
    ])
    self.optimizer.restore(extra_data[2])
    self.num_target_updates = extra_data[3]
    self.last_target_update_ts = extra_data[4]
```
and both are specified by your config, I am a little confused. For example, if your new config uses a different replay buffer capacity, it will crash when the optimizer is restored from the checkpoint. If your new config specifies a different exploration setting, since the `num_train_step` and `num_sample_step` of the optimizer are restored from the checkpoint, how could the new setting take effect? A weirder case is when I need to modify the computation graph, e.g., add a layer at the end. If I directly specify a new `hiddens` in the config, I don't think the evaluators can be successfully restored. I would need to de-finalize the graph and modify it after the evaluators have been restored, inside the `restore()` method, right?
Right, it doesn't make sense to change some types of configs. However, you can change things like the learning rate and batch size easily. This is in fact how PBT works in Ray Tune.
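To make that concrete, here is a hedged sketch of the PBT pattern: only "safe to change" settings get perturbed between checkpoint restores. The import path and constructor arguments vary across Ray versions, and the mutated keys below are only illustrative (DDPG-family agents use keys like `actor_lr`/`critic_lr`):

```python
# Sketch: PBT perturbs learning rates / batch size while everything else
# is carried over from checkpoints.
import ray
from ray.tune import run_experiments
from ray.tune.pbt import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    reward_attr="episode_reward_mean",
    perturbation_interval=10,
    hyperparam_mutations={
        "actor_lr": [1e-3, 5e-4, 1e-4],
        "critic_lr": [1e-3, 5e-4, 1e-4],
        "train_batch_size": [256, 512, 1024],
    })

ray.init()
run_experiments({
    "pbt_apex_ddpg": {
        "run": "APEX_DDPG",
        "env": "Pendulum-v0",        # stand-in for the osim wrapper
        "num_samples": 4,            # population size
        "config": {"actor_lr": 1e-3, "critic_lr": 1e-3},
    },
}, scheduler=pbt)
```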
@ericl Hi Eric, I have spent two days training an agent for osim. It can be successfully restored from checkpoints. However, some analysis shows that I need to change the environment wrapper and use a larger replay buffer. Is it viable to continue training from a checkpoint with these modifications? How do I determine which parts of an agent can be modified? I tried the former (changing the wrapper) and it runs normally; however, I am not sure whether the modification really takes effect. As for the replay buffer capacity, is there any solution?
I think both are potentially fine, though it depends on whether the environment wrapper change invalidates the previous model.
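One hedged workaround for config changes that would break the optimizer restore (such as a different replay buffer capacity): restore into a throwaway agent built with the old config, then copy only the policy weights into a fresh agent built with the new config. Here `cls`, `old_config`, and `new_config` are placeholders in the spirit of the earlier snippets, and the `get_weights`/`set_weights` calls on the local evaluator are an assumption about this RLlib version:

```python
# Sketch: carry over only the policy weights when the new config is not
# compatible with the saved optimizer state (e.g. a larger buffer size).
old_agent = cls(env="Pendulum-v0", config=old_config)   # placeholder env/config
old_agent.restore("/path/to/checkpoint-100")
weights = old_agent.local_evaluator.get_weights()

new_agent = cls(env="Pendulum-v0", config=new_config)   # e.g. larger replay buffer
new_agent.local_evaluator.set_weights(weights)
new_agent.train()   # remote evaluators pick up the weights on the next sync
```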