ray create-or-update cluster.yaml
I followed the documentation and modified example-full.yaml to fill in username, node IP addresses, and custom setup commands.
Traceback:
ray create-or-update cluster.yaml
/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py:38: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(open(config_file).read())
/tmp/env/lib/python3.6/site-packages/ray/autoscaler/node_provider.py:115: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
defaults = yaml.load(f)
2019-04-04 03:39:25,901 INFO node_provider.py:34 -- ClusterState: Loaded cluster state: {'c79.millennium.berkeley.edu': {'tags': {'ray-node-type': 'worker'}, 'state': 'terminated'}, 'c80.millennium.berkeley.edu': {'tags': {'ray-node-type': 'head', 'ray-launch-config': '6c51b8169c9469f0fa2568e5d238af2585d302a7', 'ray-node-name': 'ray-default-head'}, 'state': 'running'}}
2019-04-04 03:39:25,902 INFO node_provider.py:59 -- ClusterState: Writing cluster state: {'c79.millennium.berkeley.edu': {'tags': {'ray-node-type': 'worker'}, 'state': 'terminated'}, 'c80.millennium.berkeley.edu': {'tags': {'ray-node-type': 'head', 'ray-launch-config': '6c51b8169c9469f0fa2568e5d238af2585d302a7', 'ray-node-name': 'ray-default-head'}, 'state': 'running'}}
This will restart cluster services [y/N]: y
2019-04-04 03:39:29,888 INFO commands.py:202 -- get_or_create_head_node: Updating files on head node...
Traceback (most recent call last):
File "/tmp/env/bin/ray", line 11, in <module>
sys.exit(main())
File "/tmp/env/lib/python3.6/site-packages/ray/scripts/scripts.py", line 766, in main
return cli()
File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/tmp/env/lib/python3.6/site-packages/ray/scripts/scripts.py", line 460, in create_or_update
no_restart, restart_only, yes, cluster_name)
File "/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 47, in create_or_update_cluster
override_cluster_name)
File "/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 243, in get_or_create_head_node
initialization_commands=config["initialization_commands"],
KeyError: 'initialization_commands'
Can you try installing the latests master?
We should probably make all of our configs backward compatible from now on...
I'm getting the same error on the latest master. This line is causing the error.
Looks like the local example is out of date.
Same here, 0.6.6
Am having the same issue.. So, what's the fix??
Fresh install, 0.6.6. - also updated to 0.7 dev2, same error.
In my case the error is:
ray up settings.yaml
/home/jake/py/tf2/lib/python3.6/site-packages/ray/autoscaler/commands.py:38: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(open(config_file).read())
/home/jake/py/tf2/lib/python3.6/site-packages/ray/autoscaler/node_provider.py:115: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
defaults = yaml.load(f)
2019-05-16 03:24:21,128 INFO node_provider.py:34 -- ClusterState: Loaded cluster state: {}
Traceback (most recent call last):
File "/home/jake/py/tf2/bin/ray", line 10, in <module>
sys.exit(main())
File "/home/jake/py/tf2/lib/python3.6/site-packages/ray/scripts/scripts.py", line 766, in main
return cli()
File "/home/jake/py/tf2/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/jake/py/tf2/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/jake/py/tf2/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/jake/py/tf2/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/jake/py/tf2/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/jake/py/tf2/lib/python3.6/site-packages/ray/scripts/scripts.py", line 460, in create_or_update
no_restart, restart_only, yes, cluster_name)
File "/home/jake/py/tf2/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 47, in create_or_update_cluster
override_cluster_name)
File "/home/jake/py/tf2/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 163, in get_or_create_head_node
provider = get_node_provider(config["provider"], config["cluster_name"])
File "/home/jake/py/tf2/lib/python3.6/site-packages/ray/autoscaler/node_provider.py", line 103, in get_node_provider
return provider_cls(provider_config, cluster_name)
File "/home/jake/py/tf2/lib/python3.6/site-packages/ray/autoscaler/local/node_provider.py", line 86, in __init__
provider_config)
File "/home/jake/py/tf2/lib/python3.6/site-packages/ray/autoscaler/local/node_provider.py", line 55, in __init__
TAG_RAY_NODE_TYPE] == "head"
AssertionError
YAML file taken from online ray/python/ray/autoscaler/gcp/example-full.yaml. Just added IP's:
cluster_name: default
min_workers: 0
max_workers: 0
initial_workers: 0
autoscaling_mode: default
docker:
image: ""
container_name: ""
target_utilization_fraction: 0.8
idle_timeout_minutes: 5
provider:
type: local
head_ip: 10.0.0.2
worker_ips: [10.0.0.2]
auth:
ssh_user: YOUR_USERNAME
ssh_private_key: ~/.ssh/id_rsa
head_node: {}
worker_nodes: {}
file_mounts: {}
setup_commands: []
head_setup_commands: []
worker_setup_commands: []
setup_commands:
I am facing the same issue. However, a workaround is to manually ssh in to the head and worker nodes of the cluster and run the respective commands mentioned in the example-full.yaml file OR use gnu-parallel command to run a script which does the same.
It seems a bug. Adding initialization_commands: [] to the yaml file seems to bypasses this error (though there are other errors generated).
See 4811.
55, in __init__ TAG_RAY_NODE_TYPE] == "head" AssertionError
This particular assertion appears to be a separate problem, as I discovered the hard way. It seems to occur when the address of head_ip is also included in worker_ips. Removing the head_ip from the worker list eliminated the error for me. I also found it necessary to delete the tmp/cluster-\