Ray: [tune] Tuning a non-ML system

Created on 29 Jul 2020 · 11 comments · Source: ray-project/ray

I am confused about how to use Ray Tune for a non-ML system. I want to optimize a software system for data cleansing, which has a number of tunable parameters.

I have an objective function that runs the system and computes a fitness value:

def fitness(params):
    run_my_system(params)
    return evaluate_output()

I want to use Ray Tune for its parallel/distributed execution and for its search algorithms, which explore the hyperparameter space more efficiently than grid search.

The Ray Tune documentation, however, is mostly geared toward ML tasks. Even the simplest examples are full of ML-specific parameters:

def train_mnist(config):
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(model.parameters(), lr=config["lr"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        tune.report(mean_accuracy=acc)

What I specifically don't understand is the role of the for loop. My system is deterministic: its output depends only on the input data and the parameters (there is no randomness inside the system).

Is there ANY example of Ray Tune that is not mixed up with ML concepts such as "Trial", "step" and "mean_loss"?

I just want to define the objective function, the search algorithm, and the number of parallel workers. Is Ray Tune a proper choice for this?

question

All 11 comments

Tune should work fine for your use case.

Maybe something like this can help you get started?

def trainable(config):
    # config (dict): a dict of hyperparameters
    score = objective(config["a"], config["b"])
    tune.report(score=score)
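
To launch this with parallel workers, a minimal sketch might look like the following (here objective is just a stand-in for running your system and scoring its output; nothing in it is ML-specific):

from ray import tune

def objective(a, b):
    # stand-in for: run your system with these parameters, evaluate the output
    return a * b

def trainable(config):
    tune.report(score=objective(config["a"], config["b"]))

analysis = tune.run(
    trainable,
    config={"a": tune.choice([0.001, 0.01, 0.1, 1.2]),
            "b": tune.choice([-2, -1, 1, 2])},
    num_samples=16,                  # total number of trials
    resources_per_trial={"cpu": 1},  # free CPUs fill up with concurrently running trials
)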

I just tried the sample code from Ray's GitHub page, with the for loop taken out. It no longer works:

def objective(alpha, beta):
    return alpha * beta

def trainable(config):
    score = objective(config["alpha"], config["beta"])
    tune.report(score=score)

analysis = tune.run(
    trainable,
    config={
        "alpha": tune.grid_search([0.001, 0.01, 0.1, 1.2]),
        "beta": tune.choice([-1, -2, 1, 2,])
    })

print("Best config: ", analysis.get_best_config(metric="score"))

Ray cannot optimize this simple function. The best config found is {'alpha': 0.001, 'beta': -2}, which is neither the maximum nor the minimum of the alpha * beta function.

What am I missing here?

== Status ==
Memory usage on this node: 4.0/94.1 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/16 CPUs, 0/0 GPUs, 0.0/55.13 GiB heap, 0.0/18.99 GiB objects
Result logdir: ~/ray_results/trainable
Number of trials: 4 (3 PENDING, 1 RUNNING)
+-----------------------+----------+-------+---------+--------+
| Trial name            | status   | loc   |   alpha |   beta |
|-----------------------+----------+-------+---------+--------|
| trainable_32065_00000 | RUNNING  |       |   0.001 |     -2 |
| trainable_32065_00001 | PENDING  |       |   0.01  |     -2 |
| trainable_32065_00002 | PENDING  |       |   0.1   |     -1 |
| trainable_32065_00003 | PENDING  |       |   1.2   |     -1 |
+-----------------------+----------+-------+---------+--------+


Result for trainable_32065_00001:
  date: 2020-07-29_21-36-35
  done: false
  experiment_id: 8111937c8a5448d9a17c9a7eeb480bdb
  experiment_tag: 1_alpha=0.01,beta=-2
  hostname: ws-1810
  iterations_since_restore: 1
  node_ip: 106.1.8.144
  pid: 2689
  score: -0.02
  time_since_restore: 0.00035071372985839844
  time_this_iter_s: 0.00035071372985839844
  time_total_s: 0.00035071372985839844
  timestamp: 1596054995
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: '32065_00001'

Result for trainable_32065_00002:
  date: 2020-07-29_21-36-35
  done: false
  experiment_id: 18ba67b782aa4a48828e38e5ae50e95a
  experiment_tag: 2_alpha=0.1,beta=-1
  hostname: ws-1810
  iterations_since_restore: 1
  node_ip: 106.1.8.144
  pid: 2694
  score: -0.1
  time_since_restore: 0.0003161430358886719
  time_this_iter_s: 0.0003161430358886719
  time_total_s: 0.0003161430358886719
  timestamp: 1596054995
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: '32065_00002'

2020-07-29 21:36:35,849 INFO logger.py:271 -- Removed the following hyperparameter values when logging to tensorboard: {'beta': -1}
Result for trainable_32065_00003:
  date: 2020-07-29_21-36-35
  done: false
  experiment_id: 31bc632e60944b649f6f5363bed6e86a
  experiment_tag: 3_alpha=1.2,beta=-1
  hostname: ws-1810
  iterations_since_restore: 1
  node_ip: 106.1.8.144
  pid: 2681
  score: -1.2
  time_since_restore: 0.00017547607421875
  time_this_iter_s: 0.00017547607421875
  time_total_s: 0.00017547607421875
  timestamp: 1596054995
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: '32065_00003'

2020-07-29 21:36:35,864 INFO logger.py:271 -- Removed the following hyperparameter values when logging to tensorboard: {'beta': -2}
2020-07-29 21:36:35,868 INFO logger.py:271 -- Removed the following hyperparameter values when logging to tensorboard: {'beta': -1}
Result for trainable_32065_00000:
  date: 2020-07-29_21-36-35
  done: false
  experiment_id: 87b189932d2547ce82122e8e118ac7c0
  experiment_tag: 0_alpha=0.001,beta=-2
  hostname: ws-1810
  iterations_since_restore: 1
  node_ip: 106.1.8.144
  pid: 2687
  score: -0.002
  time_since_restore: 0.00032830238342285156
  time_this_iter_s: 0.00032830238342285156
  time_total_s: 0.00032830238342285156
  timestamp: 1596054995
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: '32065_00000'

2020-07-29 21:36:35,879 INFO logger.py:271 -- Removed the following hyperparameter values when logging to tensorboard: {'beta': -2}
== Status ==
Memory usage on this node: 4.0/94.1 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/55.13 GiB heap, 0.0/18.99 GiB objects
Result logdir: ~/ray_results/trainable
Number of trials: 4 (4 TERMINATED)
+-----------------------+------------+-------+---------+--------+--------+------------------+
| Trial name            | status     | loc   |   alpha |   beta |   iter |   total time (s) |
|-----------------------+------------+-------+---------+--------+--------+------------------|
| trainable_32065_00000 | TERMINATED |       |   0.001 |     -2 |      1 |      0.000328302 |
| trainable_32065_00001 | TERMINATED |       |   0.01  |     -2 |      1 |      0.000350714 |
| trainable_32065_00002 | TERMINATED |       |   0.1   |     -1 |      1 |      0.000316143 |
| trainable_32065_00003 | TERMINATED |       |   1.2   |     -1 |      1 |      0.000175476 |
+-----------------------+------------+-------+---------+--------+--------+------------------+


Best config:  {'alpha': 0.001, 'beta': -2}

Well,

  1. The particular hyperparameter search configuration you gave samples beta randomly, not exhaustively. An exhaustive evaluation would require "beta": tune.grid_search([-1, -2, 1, 2]).
  2. You probably want analysis.get_best_config(metric="score", mode="min"); without mode, the default returns the config with the maximum score over everything evaluated. (A combined sketch follows.)
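
Putting both fixes together, a minimal sketch:

from ray import tune

def objective(alpha, beta):
    return alpha * beta

def trainable(config):
    tune.report(score=objective(config["alpha"], config["beta"]))

analysis = tune.run(
    trainable,
    config={
        "alpha": tune.grid_search([0.001, 0.01, 0.1, 1.2]),
        "beta": tune.grid_search([-1, -2, 1, 2]),  # exhaustive, instead of tune.choice
    })

# mode="min" asks for the config with the smallest reported score
print("Best config: ", analysis.get_best_config(metric="score", mode="min"))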


Thank you, this one works for the toy example.

But what if I don't want to do an exhaustive grid search (which is not feasible for a larger number of parameters)?

I need the optimizer to take a look at all the previous experiments, decide which new configs are likely to have a better score, and try the new experiments (in parallel). So the ideal thing is like:

while condition_met():
    new_configs = suggest_config(parameters, previous_configs)
    run_in_parallel(new_configs)  # N workers at a time = #CPUs

How can I configure Ray to optimize this way?

Here's a sketch. Typically I wouldn't use a "hard barrier" between evaluations, but here's something you can do:

def objective(alpha, beta):
    return alpha * beta

def trainable(config):
    score = objective(config["alpha"], config["beta"])
    tune.report(score=score)

from ray.tune.suggest.bayesopt import BayesOptSearch
search = BayesOptSearch(space={"alpha": (0.001, 1.2), "beta": (-2, 2)}, metric="score")
# you probably also want to use the ConcurrencyLimiter

from ray.tune.suggest import ConcurrencyLimiter
search = ConcurrencyLimiter(search, max_concurrent=4)
analysis = tune.run(
    trainable,
    num_samples=30,
    search_alg=search,
)

print("Best config: ", analysis.get_best_config(metric="score"))

^ there might be typos, but you get the idea.


Thanks. It raises an error and doesn't complete, though.
The error is: KeyError: 'Data point [1.2 2. ] is not unique'
Why?


== Status ==
Memory usage on this node: 3.4/94.1 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/16 CPUs, 0/0 GPUs, 0.0/55.18 GiB heap, 0.0/18.99 GiB objects
Result logdir: /home/SERILOCAL/a.hadian/ray_results/trainable
Number of trials: 29 (8 ERROR, 1 RUNNING, 20 TERMINATED)
+--------------------+------------+------------------+-----------+------------+--------+------------------+
| Trial name         | status     | loc              |     alpha |       beta |   iter |   total time (s) |
|--------------------+------------+------------------+-----------+------------+--------+------------------|
| trainable_c8510fce | ERROR      |                  | 1.2       |  2         |      1 |      0.000279427 |
| trainable_c8510fd2 | ERROR      |                  | 1.2       |  2         |      1 |      0.000332117 |
| trainable_c91e695b | ERROR      |                  | 1.2       |  2         |      1 |      0.000273466 |
| trainable_c91e695f | ERROR      |                  | 1.2       |  2         |      1 |      0.00028944  |
| trainable_c91e6963 | ERROR      |                  | 1.2       |  2         |      1 |      0.00026226  |
| trainable_c91e6967 | ERROR      |                  | 1.2       |  2         |      1 |      0.000272512 |
| trainable_ca8a4343 | ERROR      |                  | 1.2       |  2         |      1 |      0.000255346 |
| trainable_ca8a4347 | ERROR      |                  | 1.2       |  2         |      1 |      0.000327587 |
| trainable_cde22973 | RUNNING    | 106.1.8.144:6976 | 1.2       |  2         |      1 |      0.000271559 |
| trainable_c8510f9e | TERMINATED |                  | 0.450074  |  1.80286   |      1 |      0.000187874 |
| trainable_c8510f9f | TERMINATED |                  | 0.878661  |  0.394634  |      1 |      0.00020647  |
| trainable_c8510fa0 | TERMINATED |                  | 0.188066  | -1.37602   |      1 |      0.000328779 |
| trainable_c8510fa1 | TERMINATED |                  | 0.0706423 |  1.4647    |      1 |      0.000178337 |
| trainable_c8510fa2 | TERMINATED |                  | 0.721737  |  0.83229   |      1 |      0.000198364 |
| trainable_c8510fa3 | TERMINATED |                  | 0.0256808 |  1.87964   |      1 |      0.000220537 |
| trainable_c8510fa4 | TERMINATED |                  | 0.999099  | -1.15064   |      1 |      0.000213861 |
| trainable_c8510fa5 | TERMINATED |                  | 0.219008  | -1.26638   |      1 |      0.000289679 |
| trainable_c8510fa6 | TERMINATED |                  | 0.365786  |  0.0990257 |      1 |      0.000198603 |
| trainable_c8510fa7 | TERMINATED |                  | 0.518902  | -0.835083  |      1 |      0.000193357 |
| trainable_c8510fc6 | TERMINATED |                  | 1.2       |  1.78294   |      1 |      0.000273705 |
+--------------------+------------+------------------+-----------+------------+--------+------------------+
... 9 more trials not shown (9 TERMINATED)
Number of errored trials: 8
+--------------------+--------------+----------------------------------------------------------------------------------------------------------------------+
| Trial name         |   # failures | error file                                                                                                           |
|--------------------+--------------+----------------------------------------------------------------------------------------------------------------------|
| trainable_c8510fce |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_13_alpha=1.2,beta=2.0_2020-07-29_22-38-04vg2_zohw/error.txt |
| trainable_c8510fd2 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_14_alpha=1.2,beta=2.0_2020-07-29_22-38-052tama48n/error.txt |
| trainable_c91e695b |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_15_alpha=1.2,beta=2.0_2020-07-29_22-38-05x_5bun5l/error.txt |
| trainable_c91e695f |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_16_alpha=1.2,beta=2.0_2020-07-29_22-38-053o25dc2h/error.txt |
| trainable_c91e6963 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_17_alpha=1.2,beta=2.0_2020-07-29_22-38-05p2r9j_ee/error.txt |
| trainable_c91e6967 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_18_alpha=1.2,beta=2.0_2020-07-29_22-38-06hb1d2cw2/error.txt |
| trainable_ca8a4343 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_21_alpha=1.2,beta=2.0_2020-07-29_22-38-085mem3rdm/error.txt |
| trainable_ca8a4347 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_22_alpha=1.2,beta=2.0_2020-07-29_22-38-081o0161k7/error.txt |
+--------------------+--------------+----------------------------------------------------------------------------------------------------------------------+

2020-07-29 22:38:14,182 ERROR trial_runner.py:520 -- Trial trainable_cde22973: Error processing event.
Traceback (most recent call last):
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 487, in _process_trial
    trial.trial_id, result=flat_result)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/suggestion.py", line 280, in on_trial_complete
    trial_id=trial_id, result=result, error=error)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/suggestion.py", line 183, in on_trial_complete
    trial_id, result=result, error=error)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/bayesopt.py", line 210, in on_trial_complete
    self._register_result(params, result)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/bayesopt.py", line 224, in _register_result
    self.optimizer.register(params, self._metric_op * result[self.metric])
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/bayes_opt/bayesian_optimization.py", line 108, in register
    self._space.register(params, target)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/bayes_opt/target_space.py", line 161, in register
    raise KeyError('Data point {} is not unique'.format(x))
KeyError: 'Data point [1.2 2. ] is not unique'
Result for trainable_cde22977:
  date: 2020-07-29_22-38-14
  done: false
  experiment_id: ab2807291f7a41dc968858bbd4007b52
  experiment_tag: 30_alpha=1.2,beta=2.0
  hostname: ws-1810
  iterations_since_restore: 1
  node_ip: 106.1.8.144
  pid: 6996
  score: 2.4
  time_since_restore: 0.0002677440643310547
  time_this_iter_s: 0.0002677440643310547
  time_total_s: 0.0002677440643310547
  timestamp: 1596058694
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: cde22977

2020-07-29 22:38:14,829 ERROR trial_runner.py:520 -- Trial trainable_cde22977: Error processing event.
Traceback (most recent call last):
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 487, in _process_trial
    trial.trial_id, result=flat_result)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/suggestion.py", line 280, in on_trial_complete
    trial_id=trial_id, result=result, error=error)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/suggestion.py", line 183, in on_trial_complete
    trial_id, result=result, error=error)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/bayesopt.py", line 210, in on_trial_complete
    self._register_result(params, result)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/suggest/bayesopt.py", line 224, in _register_result
    self.optimizer.register(params, self._metric_op * result[self.metric])
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/bayes_opt/bayesian_optimization.py", line 108, in register
    self._space.register(params, target)
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/bayes_opt/target_space.py", line 161, in register
    raise KeyError('Data point {} is not unique'.format(x))
KeyError: 'Data point [1.2 2. ] is not unique'
== Status ==
Memory usage on this node: 3.4/94.1 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/55.18 GiB heap, 0.0/18.99 GiB objects
Result logdir: /home/SERILOCAL/a.hadian/ray_results/trainable
Number of trials: 30 (10 ERROR, 20 TERMINATED)
+--------------------+------------+-------+-----------+------------+--------+------------------+
| Trial name         | status     | loc   |     alpha |       beta |   iter |   total time (s) |
|--------------------+------------+-------+-----------+------------+--------+------------------|
| trainable_c8510f9e | TERMINATED |       | 0.450074  |  1.80286   |      1 |      0.000187874 |
| trainable_c8510f9f | TERMINATED |       | 0.878661  |  0.394634  |      1 |      0.00020647  |
| trainable_c8510fa0 | TERMINATED |       | 0.188066  | -1.37602   |      1 |      0.000328779 |
| trainable_c8510fa1 | TERMINATED |       | 0.0706423 |  1.4647    |      1 |      0.000178337 |
| trainable_c8510fa2 | TERMINATED |       | 0.721737  |  0.83229   |      1 |      0.000198364 |
| trainable_c8510fa3 | TERMINATED |       | 0.0256808 |  1.87964   |      1 |      0.000220537 |
| trainable_c8510fa4 | TERMINATED |       | 0.999099  | -1.15064   |      1 |      0.000213861 |
| trainable_c8510fa5 | TERMINATED |       | 0.219008  | -1.26638   |      1 |      0.000289679 |
| trainable_c8510fa6 | TERMINATED |       | 0.365786  |  0.0990257 |      1 |      0.000198603 |
| trainable_c8510fa7 | TERMINATED |       | 0.518902  | -0.835083  |      1 |      0.000193357 |
| trainable_c8510fc6 | TERMINATED |       | 1.2       |  1.78294   |      1 |      0.000273705 |
| trainable_c8510fca | TERMINATED |       | 1.2       |  2         |      1 |      0.000283718 |
| trainable_c8510fce | ERROR      |       | 1.2       |  2         |      1 |      0.000279427 |
| trainable_c8510fd2 | ERROR      |       | 1.2       |  2         |      1 |      0.000332117 |
| trainable_c91e695b | ERROR      |       | 1.2       |  2         |      1 |      0.000273466 |
| trainable_c91e695f | ERROR      |       | 1.2       |  2         |      1 |      0.00028944  |
| trainable_c91e6963 | ERROR      |       | 1.2       |  2         |      1 |      0.00026226  |
| trainable_c91e6967 | ERROR      |       | 1.2       |  2         |      1 |      0.000272512 |
| trainable_c91e696b | TERMINATED |       | 1.2       |  2         |      1 |      0.000302315 |
| trainable_ca8a433f | TERMINATED |       | 1.2       |  2         |      1 |      0.000293016 |
| trainable_ca8a4343 | ERROR      |       | 1.2       |  2         |      1 |      0.000255346 |
| trainable_ca8a4347 | ERROR      |       | 1.2       |  2         |      1 |      0.000327587 |
| trainable_cb9d483f | TERMINATED |       | 1.2       |  2         |      1 |      0.000287771 |
| trainable_cb9d4843 | TERMINATED |       | 1.19856   |  1.96286   |      1 |      0.000273228 |
| trainable_cb9d4847 | TERMINATED |       | 1.2       |  2         |      1 |      0.000294447 |
| trainable_ccdc413d | TERMINATED |       | 1.1871    |  1.97391   |      1 |      0.000268459 |
| trainable_ccdc4141 | TERMINATED |       | 1.19302   |  1.9985    |      1 |      0.000281334 |
| trainable_ccdc4145 | TERMINATED |       | 1.1791    |  1.99659   |      1 |      0.000284672 |
| trainable_cde22973 | ERROR      |       | 1.2       |  2         |      1 |      0.000271559 |
| trainable_cde22977 | ERROR      |       | 1.2       |  2         |      1 |      0.000267744 |
+--------------------+------------+-------+-----------+------------+--------+------------------+
Number of errored trials: 10
+--------------------+--------------+----------------------------------------------------------------------------------------------------------------------+
| Trial name         |   # failures | error file                                                                                                           |
|--------------------+--------------+----------------------------------------------------------------------------------------------------------------------|
| trainable_c8510fce |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_13_alpha=1.2,beta=2.0_2020-07-29_22-38-04vg2_zohw/error.txt |
| trainable_c8510fd2 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_14_alpha=1.2,beta=2.0_2020-07-29_22-38-052tama48n/error.txt |
| trainable_c91e695b |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_15_alpha=1.2,beta=2.0_2020-07-29_22-38-05x_5bun5l/error.txt |
| trainable_c91e695f |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_16_alpha=1.2,beta=2.0_2020-07-29_22-38-053o25dc2h/error.txt |
| trainable_c91e6963 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_17_alpha=1.2,beta=2.0_2020-07-29_22-38-05p2r9j_ee/error.txt |
| trainable_c91e6967 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_18_alpha=1.2,beta=2.0_2020-07-29_22-38-06hb1d2cw2/error.txt |
| trainable_ca8a4343 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_21_alpha=1.2,beta=2.0_2020-07-29_22-38-085mem3rdm/error.txt |
| trainable_ca8a4347 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_22_alpha=1.2,beta=2.0_2020-07-29_22-38-081o0161k7/error.txt |
| trainable_cde22973 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_29_alpha=1.2,beta=2.0_2020-07-29_22-38-13wv9_jr39/error.txt |
| trainable_cde22977 |            1 | /home/SERILOCAL/a.hadian/ray_results/trainable/trainable_30_alpha=1.2,beta=2.0_2020-07-29_22-38-14eaae22ph/error.txt |
+--------------------+--------------+----------------------------------------------------------------------------------------------------------------------+

Traceback (most recent call last):
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/src/test_hyperopt.py", line 27, in <module>
    search_alg=search,
  File "/home/SERILOCAL/a.hadian/projects/ontology-mapping/logmap-matcher/python/venv/lib/python3.7/site-packages/ray/tune/tune.py", line 349, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [trainable_c8510fce, trainable_c8510fd2, trainable_c91e695b, trainable_c91e695f, trainable_c91e6963, trainable_c91e6967, trainable_ca8a4343, trainable_ca8a4347, trainable_cde22973, trainable_cde22977])

The code that I tried is:

from ray.tune.suggest.bayesopt import BayesOptSearch
search = BayesOptSearch(space={"alpha": (0.001, 1.2), "beta": (-2, 2)}, metric='score') 

from ray.tune.suggest import ConcurrencyLimiter
search = ConcurrencyLimiter(search, max_concurrent=16)
analysis = tune.run(
    trainable,
    num_samples=30,
    search_alg=search,
)

print("Best config: ", analysis.get_best_config(metric="score", mode="min"))

On a separate note, I have two questions:

  1. How does the Bayesian optimizer know that it should pick configs that minimize the function? We only pass mode="min" after the optimization has finished, when we read the best config. How does the optimizer know, during the search, what "best" means?

  2. I prefer asynchronous scheduling. What does num_samples=30 mean? Is it the total number of experiments? If so, with 30 samples and 30 concurrent threads, BayesOpt would launch all experiments at once and never get a chance to revise its suggestions and try new experiments. So num_samples >> num_threads, right?

@alihadian it looks like it converged to 1.2, 2 = the "max". You'll need to pass BayesOptSearch(mode="min") to minimize (sorry, I forgot to include that in the previous example).

What does num_samples=30 mean? Is it the total number of experiments?

It means the total number of evaluations ("trials"). You'll want concurrency to be lower than 30, yep; that's what the ConcurrencyLimiter does for you.


Thanks for the super-useful comments.
Apparently BayesOpt does not support categorical values. Is there any "smart" (non-exhaustive) alternative that supports both range and categorical parameters and is parallelized asynchronously, for a non-ML system?

BTW, Ray Tune crashes on this toy example, even after specifying mode="min" for the Bayes optimizer.
Is it a BayesOpt issue or a Ray Tune one? A bug, maybe?
If the optimizer converges to a point, it shouldn't raise an error, no?
The program doesn't even reach the last line and never prints the best config.
Here is the exact code that I run:

from ray import tune

def objective(alpha, beta):
    return alpha * beta

def trainable(config):
    score = objective(config["alpha"], config["beta"])
    tune.report(score=score)

from ray.tune.suggest.bayesopt import BayesOptSearch
search = BayesOptSearch(space={"alpha": (0.001, 1.2), "beta": (-2, 2)}, metric='score', mode="min") 

from ray.tune.suggest import ConcurrencyLimiter
search = ConcurrencyLimiter(search, max_concurrent=4)
analysis = tune.run(
    trainable,
    num_samples=30,
    search_alg=search,
)

print("Best config: ", analysis.get_best_config(metric="score", mode="min"))
#df = analysis.dataframe()

Yeah, agree that this is a BayesOpt issue. I'll file an issue and try to fix it soon.

As an alternative, you can try HyperOpt or Scikit-Optimize; see the full list here: https://docs.ray.io/en/master/tune/api_docs/suggestion.html
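
For example, HyperOpt handles mixed continuous and categorical spaces. A minimal sketch, where the "method" parameter, its values, and the penalty logic are made up for illustration:

from hyperopt import hp
from ray import tune
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch

def trainable(config):
    # "method" stands in for a hypothetical categorical parameter of your system
    penalty = 0.5 if config["method"] == "fast" else 0.0
    tune.report(score=config["alpha"] * config["beta"] + penalty)

space = {
    "alpha": hp.uniform("alpha", 0.001, 1.2),
    "beta": hp.uniform("beta", -2, 2),
    "method": hp.choice("method", ["fast", "thorough"]),  # categorical
}
search = ConcurrencyLimiter(
    HyperOptSearch(space, metric="score", mode="min"), max_concurrent=4)
analysis = tune.run(trainable, num_samples=30, search_alg=search)
print("Best config: ", analysis.get_best_config(metric="score", mode="min"))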

Does that help? Please do let me know if you get it to work!

Thanks :)
