I tried to add a prior to the Ax service by feeding it data from previously completed trials, calling attach_trial and then complete_trial. But I noticed that even when I did that, the first few generated trials were still the same.
Even when I attached the exact same parameter configurations (or nearly exact ones) and results that were generated in a previous Ax optimization job, the new Ax job would still generate the same parameters as the old one. This wastes a lot of compute on my end; is there a way to avoid it?
@aagarwal1999, are you fixing the random seed?
Hi Ankit, can you please include a minimal example to repro?
(Replying by email to Lena Kashtelyan's question above, written Mon, Aug 19, 2019 at 11:40 AM.)
Yes I am fixing the random seed for reproducibility.
Below is the example I ran. You can see that even when I attach the values 4, 18, and 2 before calling get_next_trial in the rerun experiment, the first three trials that come out are again 4, 18, and 2. This happens with or without setting torch.manual_seed before every call to get_next_trial.
```python
from ax.service.ax_client import AxClient
from ax.service.utils.dispatch import choose_generation_strategy
import torch

name = "test_ax"
parameters = [{"name": "param_1", "type": "range", "bounds": [1, 100], "log_scale": True}]
objective_name = "objective_1"
minimize = True

# First run: generate three trials with a fixed seed.
ax_client = AxClient()
ax_client.create_experiment(name=name, parameters=parameters, objective_name=objective_name, minimize=minimize)
ax_client.generation_strategy = choose_generation_strategy(search_space=ax_client.experiment.search_space, random_seed=42)
params = []
for _ in range(3):
    torch.manual_seed(1000)
    param, trial_id = ax_client.get_next_trial()
    params.append(param["param_1"])
    print(param)

# Second run: attach and complete the first run's trials, then generate three more.
ax_client = AxClient()
ax_client.create_experiment(name=name, parameters=parameters, objective_name=objective_name, minimize=minimize)
ax_client.generation_strategy = choose_generation_strategy(search_space=ax_client.experiment.search_space, random_seed=42)
for param in params:
    print(f"Attaching trial with param value {param} and result {param * 2}")
    new_param, trial_id = ax_client.attach_trial({"param_1": param})
    ax_client.complete_trial(trial_id, raw_data=param * 2)
for _ in range(3):
    torch.manual_seed(1000)
    param, trial_id = ax_client.get_next_trial()
    print(param)
```
The reason you see this behavior is that the first 5 trials are generated quasi-randomly, using a Sobol sequence, so when you fix the random seed, the first few trials come out the same regardless of the attached data. Do you need both reproducibility and the ability to use trials from previous optimizations?
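To see why attaching data does not change the first few suggestions, here is a toy sketch in plain Python. It is a stand-in under stated assumptions, not Ax's implementation: Ax uses a scrambled Sobol sequence, while this sketch swaps in a randomly shifted van der Corput sequence (the 1-D analogue), and the function names are hypothetical. The property it illustrates is the same, though: with a fixed seed, the quasi-random points depend only on the seed and the position in the sequence, never on any attached trials.

```python
import random


def van_der_corput(index: int, base: int = 2) -> float:
    """Return the index-th point of the base-`base` van der Corput sequence in [0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def quasi_random_trials(n, seed, attached=(), low=1.0, high=100.0):
    """Hypothetical toy generator: n log-scaled quasi-random values in [low, high).

    `attached` is accepted only to make the point explicit: a purely
    quasi-random step never looks at previously observed data.
    """
    # Fixed seed -> fixed random shift of the whole sequence (Cranley-Patterson rotation).
    shift = random.Random(seed).random()
    values = []
    for i in range(n):
        u = (van_der_corput(i + 1) + shift) % 1.0
        # Map u in [0, 1) to [low, high) on a log scale, like "log_scale": True.
        values.append(low * (high / low) ** u)
    return values


first_run = quasi_random_trials(3, seed=42)
second_run = quasi_random_trials(3, seed=42, attached=first_run)
print(first_run == second_run)  # True: the attached data does not change the sequence
```

This is why, in the repro above, the second run replays the first run's points: until the model-based step takes over, nothing in the generator conditions on the attached results.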
Not really, I can work around it by having the user set a seed as a part of the configuration. It just seems to me that if I am trying to give a prior to ax in hopes of improving optimization time, the first five trials should be generated from a more optimized search space based on the information given to the client.
Anyway, if this is not a big priority right now, this is not a big deal. Thanks for your help!
@aagarwal1999, you could actually achieve that behavior by passing in a custom generation strategy to AxClient. There are two kinds of generation strategies you could use:
1) A generation strategy with only the Bayesian optimization step, without the quasi-random initialization. You would just need to make a generation strategy similar to the one returned by choose_generation_strategy (referenced in https://github.com/facebook/Ax/issues/151), but with only one step, the Models.GPEI step. Note that with this solution you would always need to attach and complete a few trials before generating new ones from AxClient via get_next_trial, since the model used here needs training data right away. It is also only advisable if the previous trials were generated through Ax (or in some other principled way), so that they are a reasonable sample of the search space.
2) A generation strategy in which the initial quasi-random trials start later in the Sobol sequence, so they are not repeated. It would still be similar to the one returned by choose_generation_strategy (referenced in https://github.com/facebook/Ax/issues/151), but with an additional kwarg for the Sobol step, init_position=N, where N is the number of trials you will be attaching. You could also reduce the number of Sobol trials in this case to, say, 2: since the attached trials were previously generated by Ax, there will be enough training data (generated in a principled way) for the BayesOpt model to learn from.
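The idea behind option 2 can be illustrated with a toy generator. This is a hedged sketch, not Ax code: the `init_position` parameter of the hypothetical `quasi_random_trials` function only mirrors the idea of the Sobol kwarg described above, and a shifted van der Corput sequence stands in for Ax's scrambled Sobol sequence. Starting the sequence after the N attached points means the new run continues the old sequence instead of replaying it. How the kwarg is actually passed to the Sobol step depends on your Ax version, so check the generation strategy documentation rather than relying on this sketch.

```python
import random


def van_der_corput(index: int, base: int = 2) -> float:
    """Return the index-th point of the base-`base` van der Corput sequence in [0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def quasi_random_trials(n, seed, init_position=0, low=1.0, high=100.0):
    """Hypothetical toy generator: n log-scaled values, skipping the first init_position points."""
    shift = random.Random(seed).random()  # same seed -> same shifted sequence
    values = []
    for i in range(init_position, init_position + n):
        u = (van_der_corput(i + 1) + shift) % 1.0
        values.append(low * (high / low) ** u)
    return values


num_attached = 3  # trials attached from the previous run
previous_run = quasi_random_trials(num_attached, seed=42)
new_run = quasi_random_trials(2, seed=42, init_position=num_attached)
print(set(previous_run).isdisjoint(new_run))  # True: no repeated suggestions
```

Because the seed is unchanged, the new points are exactly the ones the old run would have produced next, so reproducibility is preserved while duplicates are avoided.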
Do these solutions help?
We are also looking into making sure that trials overlapping with the ones attached by users are not suggested again quasi-randomly, and I will update this thread if we decide to implement a fix. It just never came up before, as folks weren't fixing the seed, so thank you for bringing it up : )
For now, the random seed workaround I suggested will suffice, since the attached trials could be anything (or nothing), and I do not know how to tell whether the given parameters are a "reasonable sample" of our search space without diving too deep into Ax's codebase. But please let me know if that fix is implemented.
Your response summarizes very well why we didn't implement special behavior out of the box: 1) attached trials may be anything or nothing, and 2) unless the trials were generated by Ax for an experiment with the same setup (search space + evaluation setup), it may be hard to tell whether they are a reasonable sample. However, the repeated-trials issue comes up only if those trials were actually generated by Ax, so we will still look into adding a fix, since in those cases the attached trials do constitute a reasonable sample.
Also, I just realized that this may not be obvious: _those attached trials are actually used as training data for BayesOpt, so the information in them is used to improve optimization._ The initial 5 trials are the same because they are generated quasi-randomly, but starting with the 6th generated trial, Bayesian optimization kicks in and utilizes the data from the attached trials.
Hi @aagarwal1999, if you are adding any additional data points to Ax, they should come from the exact same task; otherwise they will really throw off the GP. You can always use custom models with Ax, so if you wanted to use a different mean function or hyperparameter (HP) prior, you could do that. We also have an example of meta-learning in BoTorch that could be adapted to work with Ax. With this algorithm, you can use any number of possibly related runs.
Another thing to note: if you do want to use prior experiments to improve the efficiency of your optimization, you'll probably need many more than 5 previous points, since one generally needs at least 2-3 points per dimension to start getting a sensible GP model fit.
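As a quick worked check of that rule of thumb (the 2-3 points per dimension figure is a heuristic, not a hard threshold, and the helper names below are hypothetical): the one-parameter example in this thread, with 3 attached points, clears the lower bound of 2, while 5 points for a 10-parameter search space falls well short of the suggested 20-30.

```python
def prior_points_needed(num_dims: int, points_per_dim: int = 2) -> int:
    """Heuristic minimum number of prior points: roughly 2-3 per search-space dimension."""
    return points_per_dim * num_dims


def enough_prior_points(num_points: int, num_dims: int, points_per_dim: int = 2) -> bool:
    """True if num_points meets the per-dimension heuristic for a num_dims search space."""
    return num_points >= prior_points_needed(num_dims, points_per_dim)


print(enough_prior_points(3, num_dims=1))   # True: 3 >= 2
print(enough_prior_points(5, num_dims=10))  # False: 5 < 20
```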