The HyperBand scheduler fails intermittently: some experiments succeed, but about 50% fail.
I tried Ray versions 0.7.2 and 0.9.0.dev0 with Python 3.7 on Ubuntu 18.04.
The failure occurs in update_trial_stats(self, trial, result), with this error:
Failure # 1 (occurred at 2020-02-02_14-45-58)
Traceback (most recent call last):
  File "/home/kaleab/anaconda3/envs/research/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 461, in _process_trial
    self, trial, flat_result)
  File "/home/kaleab/anaconda3/envs/research/lib/python3.7/site-packages/ray/tune/schedulers/hyperband.py", line 172, in on_trial_result
    bracket.update_trial_stats(trial, result)
  File "/home/kaleab/anaconda3/envs/research/lib/python3.7/site-packages/ray/tune/schedulers/hyperband.py", line 382, in update_trial_stats
    assert delta >= 0
AssertionError
I have run validate_save_restore(trainable_class, config=validate_save_config) and
validate_save_restore(trainable_class, config=validate_save_config, use_object_store=True) and they both succeed.
To reproduce, run HyperBand on CIFAR-10 with the following scheduler config:
sched = HyperBandScheduler(
    time_attr="training_iteration",
    metric="accuracy",
    mode="max",
    max_t=ray_config['epochs'])
The Trainable class is similar to https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/mnist_pytorch_trainable.py, with the exact same save and restore methods.
I am getting the same result on OSX. It appears as if the _restore method is not being called.
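To illustrate why a skipped restore could trip `assert delta >= 0`, here is a minimal sketch in plain Python. It is not Ray's code: `ToyBracket`, `ToyTrial`, and `run_paused_trial` are hypothetical stand-ins, and the sketch assumes the bracket tracks the last reported `training_iteration` per trial and expects it never to decrease. Under that assumption, a trial that is paused and resumed without its state being restored starts counting from 1 again, which yields a negative delta.

```python
class ToyBracket:
    """Hypothetical stand-in for a bracket's progress bookkeeping (not Ray's class)."""

    def __init__(self, time_attr="training_iteration"):
        self._time_attr = time_attr
        self._last_time = {}  # trial name -> last reported time value
        self.completed_progress = 0

    def add_trial(self, trial):
        self._last_time[trial] = 0

    def update_trial_stats(self, trial, result):
        delta = result[self._time_attr] - self._last_time[trial]
        # Assumption: reported time must be monotonic for each trial.
        assert delta >= 0
        self.completed_progress += delta
        self._last_time[trial] = result[self._time_attr]


class ToyTrial:
    """Hypothetical trial whose iteration counter lives only in memory."""

    def __init__(self):
        self.iteration = 0

    def train(self):
        self.iteration += 1
        return {"training_iteration": self.iteration}

    def save(self):
        return {"iteration": self.iteration}

    def restore(self, checkpoint):
        self.iteration = checkpoint["iteration"]


def run_paused_trial(call_restore):
    """Train 3 steps, pause, resume on a fresh object, then train 1 more step."""
    bracket = ToyBracket()
    bracket.add_trial("trial_0")
    trial = ToyTrial()
    for _ in range(3):
        bracket.update_trial_stats("trial_0", trial.train())
    checkpoint = trial.save()
    trial = ToyTrial()  # simulate the pause: a fresh object is created on resume
    if call_restore:
        trial.restore(checkpoint)
    bracket.update_trial_stats("trial_0", trial.train())
    return bracket.completed_progress
```

In this sketch, `run_paused_trial(call_restore=True)` completes normally, while `run_paused_trial(call_restore=False)` raises an `AssertionError` because the resumed trial reports iteration 1 after the bracket has already recorded iteration 3. Whether this is the actual mechanism behind the failure above is an assumption, not something the traceback confirms.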
Same result on Ray 0.8.1, Python 3.7, and Ubuntu 18.04.
Same here on macOS, Python 3.7, and Ray 0.8.2.
BOHB is not working either due to the same reason.
@semin-park currently investigating this!
This should be addressed in https://github.com/ray-project/ray/pull/7563. You can download wheels of the latest master snapshot here: https://ray.readthedocs.io/en/latest/installation.html
Please reopen this if you run into any issues!