cluster_name: default
min_workers: 0
max_workers: 0
docker:
    image: ""
    container_name: ""
target_utilization_fraction: 0.8
idle_timeout_minutes: 5
provider:
    type: local
    head_ip: MILLEN_c71
    worker_ips: [MILLEN_c72_IP] # subsequent changes to this field throw errors
auth:
    ssh_user: USERNAME
    ssh_private_key: ~/.ssh/id_rsa
file_mounts: {}
#    "/tmp/ray_sha": "/YOUR/LOCAL/RAY/REPO/.git/refs/heads/YOUR_BRANCH"
setup_commands: []
head_setup_commands: []
worker_setup_commands: []
setup_commands:
    - conda activate ray && echo "hello werld"
head_start_ray_commands:
    - conda activate ray && ray stop
    - conda activate ray && ulimit -c unlimited && ray start --head --redis-port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
    - conda activate ray && ray stop
    - conda activate ray && ray start --redis-address=$RAY_HEAD_IP:6379
Changing the worker IPs in the local autoscaler config throws an assertion error, and I don't know how to get around it.
The cluster state file also isn't cleaned up after ray down, so the problem persists even across different clusters.
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/node_provider.py", line 100, in get_node_provider
    return provider_cls(provider_config, cluster_name)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/local/node_provider.py", line 77, in __init__
    provider_config)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/local/node_provider.py", line 51, in __init__
    assert len(workers) == len(provider_config["worker_ips"]) + 1
AssertionError
You can remove the local state file in /tmp/cluster-NAME (or change the cluster name).
The error message should probably make this more clear.
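For anyone who wants to script that cleanup, here is a minimal sketch. It assumes the leftover files live directly in /tmp and are named either cluster-NAME or cluster-NAME.<suffix>; the function name remove_local_cluster_files and the tmp_dir parameter are made up for this example and are not part of Ray.

```python
import os


def remove_local_cluster_files(cluster_name, tmp_dir="/tmp"):
    """Delete leftover files for one cluster and return the removed paths.

    Assumption: the files are named "cluster-<name>" or
    "cluster-<name>.<suffix>" and sit directly inside tmp_dir.
    """
    prefix = "cluster-{}".format(cluster_name)
    removed = []
    for name in sorted(os.listdir(tmp_dir)):
        # Require an exact match or a "." right after the name, so that
        # cleaning up "default" does not touch a cluster called "default2".
        if name == prefix or name.startswith(prefix + "."):
            path = os.path.join(tmp_dir, name)
            if os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed
```

With the config above, remove_local_cluster_files("default") would be called after ray down and before starting the cluster again with the new worker_ips. It returns an empty list if nothing matched.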
@ericl is right. The assertion error was caused by a conflict between the new config file and the old cluster state. Removing the cluster's state and lock files from the /tmp directory resolved the issue.
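To make the conflict concrete, here is a simplified model of how a stale state file could trip that assertion. Only the assert line comes from the traceback above; the merge step, the function name merge_cluster_state, the "terminated" placeholder value and the 10.0.0.x addresses are assumptions for illustration, not Ray's actual implementation.

```python
def merge_cluster_state(saved_workers, provider_config):
    """Merge the configured IPs into previously saved state, then check the count.

    saved_workers stands in for whatever was loaded from the old state file.
    """
    workers = dict(saved_workers)
    # Add the head node and every configured worker that is not already known.
    workers.setdefault(provider_config["head_ip"], {"state": "terminated"})
    for ip in provider_config["worker_ips"]:
        workers.setdefault(ip, {"state": "terminated"})
    # Same check as in the traceback: one entry per worker plus the head.
    assert len(workers) == len(provider_config["worker_ips"]) + 1
    return workers


old_config = {"head_ip": "10.0.0.1", "worker_ips": ["10.0.0.2"]}
new_config = {"head_ip": "10.0.0.1", "worker_ips": ["10.0.0.3"]}

# First launch: no saved state, so the counts line up.
saved = merge_cluster_state({}, old_config)

# Worker IP changed, old state still present: the old worker entry is kept,
# the new one is added, and the count no longer matches.
try:
    merge_cluster_state(saved, new_config)
except AssertionError:
    print("stale state conflicts with the new worker_ips")

# Starting from empty state (state file removed) works again.
merge_cluster_state({}, new_config)
```

Under this model, the old worker's entry never leaves the saved state, which would explain why deleting the files or picking a new cluster name clears the error.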