I set up a cluster using two PCs. I ran main.py, which imports ppo_driver2.py and calls ray.init(), and it crashed:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py35/lib/python3.5/site-packages/ray/actor.py", line 114, in fetch_and_register_actor
    unpickled_class = pickle.loads(pickled_class)
ImportError: No module named 'ppo_driver2'
You can inspect errors by running
ray.error_info()
If this driver is hanging, start a new one with
ray.init(redis_address="192.168.1.137:6379")
Remote function __init__ failed with:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py35/lib/python3.5/site-packages/ray/worker.py", line 727, in _process_task
    self.actors[task.actor_id().id()], *arguments)
  File "/home/ubuntu/anaconda3/envs/py35/lib/python3.5/site-packages/ray/actor.py", line 100, in temporary_actor_method
    "cannot execute this method".format(actor_name))
I found a similar issue in #274. How did you deal with it then?
You probably need the file ppo_driver2.py to be on both machines in the same location. Can you try copying the file to both machines and see if that works?
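A minimal, stdlib-only illustration of why the copy is needed, assuming Ray's serializer behaves like the standard pickle module here: a class defined in an importable module is pickled by reference (its module name plus its class name), not by value, so whichever process unpickles it must be able to import that module itself. The names `fake_driver` and `Agent` are hypothetical stand-ins for ppo_driver2.py and its actor class, and the "worker" is simulated in the same process.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for ppo_driver2.py.
MODULE = "fake_driver"

with tempfile.TemporaryDirectory() as tmpdir:
    # Driver side: the module file exists and is importable.
    with open(os.path.join(tmpdir, MODULE + ".py"), "w") as f:
        f.write("class Agent:\n    pass\n")
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    agent_cls = importlib.import_module(MODULE).Agent

    # pickle records only the names "fake_driver" and "Agent",
    # not the class body itself.
    payload = pickle.dumps(agent_cls)

    # Simulated worker without the file: drop the directory from
    # sys.path and forget the already-imported module.
    sys.path.remove(tmpdir)
    del sys.modules[MODULE]

try:
    pickle.loads(payload)  # must re-import "fake_driver" to rebuild the class
    error = None
except ImportError as exc:  # ModuleNotFoundError (Python 3.6+) is a subclass
    error = exc

print(type(error).__name__ + ":", error)
```

Running it prints an ImportError (ModuleNotFoundError on Python 3.6+) naming `fake_driver`, the same kind of failure as the ppo_driver2 traceback in the original report.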
@robertnishihara It works, thanks. I then started 20 actors from the head node and ran top on both nodes to see how the actors were assigned. On the head node the actors work fine. On the other node, some Python processes are active at first, but after a while the actors there stop being active even though their memory is still in use. Any ideas?
Also, how can I assign a particular actor to the head node so that I can see its rendering?
Can you describe the problem in a bit more detail? For example, is the job hanging? Or crashing? Or does it run successfully to completion? Can you share code showing what you are running?
@robertnishihara Sorry, it turns out there was a problem with the environment on the other node. It works now, thanks.
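Since the remaining failure came down to the other node's environment, one hedged way to catch this kind of mismatch is to run the same small script on every node and compare the output: the Python version and interpreter path (the traceback above shows an Anaconda py35 environment), plus where ppo_driver2 resolves and a SHA-256 of that file, so a missing, misplaced, or stale copy shows up immediately. This is a generic stdlib sketch, not a Ray tool; `environment_fingerprint` is a hypothetical helper, and it only reflects the `sys.path` of the process that runs it, which may differ from that of Ray's worker processes.

```python
import hashlib
import importlib.util
import platform
import sys

# Hypothetical default: the module the driver imports in this thread.
MODULE_NAME = "ppo_driver2"


def environment_fingerprint(module_name):
    """Describe this interpreter and the file `module_name` would load from."""
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "module_path": None,    # stays None if the module cannot be found
        "module_sha256": None,  # hash lets you spot stale or edited copies
    }
    spec = importlib.util.find_spec(module_name)
    # Built-in and namespace modules have no single source file to hash.
    if spec is not None and spec.has_location and spec.origin:
        info["module_path"] = spec.origin
        with open(spec.origin, "rb") as f:
            info["module_sha256"] = hashlib.sha256(f.read()).hexdigest()
    return info


if __name__ == "__main__":
    # Run on every node and diff the printed lines.
    for key, value in environment_fingerprint(MODULE_NAME).items():
        print(key + ":", value)
```

If the printed module_path or module_sha256 differs between nodes (or is None on one of them), the file is not in the same place, or not the same file, on both machines.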
Ok great! I'll close this issue for now then, but feel free to reopen it.