Creating a Ray cluster using the RAY_ADDRESS environment variable causes a second copy of monitor.py to launch. Two competing autoscalers lead to some strange issues.
Run ray up config.yaml with any cluster config file.
Observe that everything is normal.
Run ray submit config.yaml test.py, where test.py is:
import ray

ray.init()  # Connects to the existing cluster because ray submit sets RAY_ADDRESS automatically.

# Busy-loop so the job keeps running until it is killed.
while True:
    pass
Now attach to the session.
Kill the job.
Observe that the raylet and the rest of the cluster are also torn down.
What if we made this print a warning (telling you to set address="auto" to pick up the address)?
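A minimal sketch of one reading of the warning proposal, written as a standalone helper rather than Ray's actual ray.init logic. The function name resolve_address is hypothetical, and the precedence rules are assumptions: any explicit address other than "auto" wins, address="auto" opts in to RAY_ADDRESS, and a bare ray.init() still honors RAY_ADDRESS but warns that the connection happened implicitly.

```python
import warnings
from typing import Mapping, Optional


def resolve_address(address: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: pick the address ray.init would connect to.

    Assumed rules (not Ray's real behavior):
    - address="auto" explicitly opts in to RAY_ADDRESS, falling back to "auto".
    - Any other explicit address wins over the environment.
    - No argument: RAY_ADDRESS is still used, but a warning is emitted.
    """
    env_address = env.get("RAY_ADDRESS")
    if address == "auto":
        # Explicit opt-in: prefer the env var, else leave "auto" for discovery.
        return env_address or "auto"
    if address is not None:
        # An explicit, concrete address always wins.
        return address
    if env_address:
        # Implicit pickup of the env var: warn and point at the explicit form.
        warnings.warn(
            "RAY_ADDRESS is set, so ray.init() is connecting to it implicitly. "
            'Pass address="auto" to make this explicit.'
        )
        return env_address
    # Neither an argument nor the env var: start a new local instance.
    return None
```

In a real script the environment would be passed as resolve_address(None, os.environ); taking a mapping instead of reading os.environ directly keeps the sketch easy to test.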
Why not just make address= work as expected?
I would be comfortable with that if we also renamed RAY_ADDRESS to RAY_OVERRIDE_ADDRESS or something like that.
So, just to be clear, you're advocating for the environment variable taking precedence over the argument to ray.init, even if "auto" is specified?
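The alternative discussed here, where a renamed override variable beats any argument, could be sketched as follows. RAY_OVERRIDE_ADDRESS is only the name floated in this thread, not an existing Ray variable, and resolve_address_with_override is a hypothetical helper; the sketch assumes that without the override the argument is passed through unchanged.

```python
from typing import Mapping, Optional

# Name proposed in the discussion; assumed, not an existing Ray variable.
OVERRIDE_VAR = "RAY_OVERRIDE_ADDRESS"


def resolve_address_with_override(
    address: Optional[str], env: Mapping[str, str]
) -> Optional[str]:
    """Hypothetical helper: the override env var wins over any argument."""
    override = env.get(OVERRIDE_VAR)
    if override:
        # The override takes precedence even when address="auto" is passed.
        return override
    # Without the override, ray.init's argument is used exactly as given.
    return address
```

The design choice is that the stronger name signals the stronger semantics: a variable called *OVERRIDE* is expected to beat explicit arguments, whereas a plain RAY_ADDRESS silently doing so is the surprise the thread is trying to avoid.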
Seems to have fixed itself... closing since it can't be reproduced anymore.