Hi,
I'm using Ax.client with the default Bayesian optimization model (GPEI) to search for hyperparameters. The search space contains 5 range parameters on [0,1]. As the search progresses, memory consumption gets out of hand, and the process eventually runs out of memory (16 GB RAM with 8 GB zram enabled) after about 50 arms have been tested.
Here are my questions.
Thanks very much!
@showgood163, thank you for raising the issue! May I ask for the version numbers of Ax and BoTorch in your setup and a gist / notebook of the code you ran?
cc @eytan, @Balandat
Also, how many metrics are you modeling? And what's the optimization config?
Thanks for your attention.
I'm using ax-platform 0.1.6 and botorch 0.1.4 w/ pytorch 1.3.1.
I'm optimizing 1 objective, which is the average of 4 accuracies and 4 F1 scores.
Also, I'm feeding 16 metrics in the raw_data of ax.complete_trial: 4 accuracies, 4 F1s, 4 unweighted F1s, 3 averages (of accuracy, F1 and unweighted F1) and the loss.
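For reference, the raw_data described above can be sketched as a dict mapping each metric name to a (mean, SEM) tuple, the format the Service API's complete_trial accepts. This is a minimal sketch under stated assumptions: the task and metric names are hypothetical, the SEM is fixed at 0.0 for illustration, and since it is not stated whether the objective "f1+acc" counts among the 16 metrics, it is added separately here. The lean=True path keeps only the objective, so far fewer outcomes would be modeled.

```python
from statistics import mean

# Hypothetical task names; the thread only says there are 4 of each metric.
TASKS = ["t1", "t2", "t3", "t4"]
OBJECTIVE = "f1+acc"


def build_raw_data(acc, f1, uf1, loss, sem=0.0, lean=False):
    """Return {metric_name: (mean, sem)} for one trial.

    acc, f1 and uf1 map task name -> score. With lean=True only the
    objective is reported, so fewer outcomes end up being modeled.
    """
    # Objective: plain average of the 4 accuracies and the 4 F1 scores.
    objective = mean([acc[t] for t in TASKS] + [f1[t] for t in TASKS])
    raw = {OBJECTIVE: (objective, sem)}
    if lean:
        return raw
    for t in TASKS:
        raw[f"acc_{t}"] = (acc[t], sem)
        raw[f"f1_{t}"] = (f1[t], sem)
        raw[f"uf1_{t}"] = (uf1[t], sem)
    raw["acc_avg"] = (mean(acc[t] for t in TASKS), sem)
    raw["f1_avg"] = (mean(f1[t] for t in TASKS), sem)
    raw["uf1_avg"] = (mean(uf1[t] for t in TASKS), sem)
    raw["loss"] = (loss, sem)
    return raw


scores_acc = {t: 0.8 for t in TASKS}
scores_f1 = {t: 0.6 for t in TASKS}
scores_uf1 = {t: 0.5 for t in TASKS}
full = build_raw_data(scores_acc, scores_f1, scores_uf1, loss=0.3)
lean = build_raw_data(scores_acc, scores_f1, scores_uf1, loss=0.3, lean=True)
```

In the setup from this thread, either dict would be passed as the raw_data argument of ax.complete_trial.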
The optimization config is like:
    from ax.service.ax_client import AxClient

    ax = AxClient()
    ax.create_experiment(
        name=args.name,
        parameters=hyper_param_search_lists,
        objective_name="f1+acc",
        minimize=False,
        parameter_constraints=None,
        outcome_constraints=None,
    )
where hyper_param_search_lists is a list of 5 range parameters on [0,1].
The example code snippet is here.
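The thread does not show hyper_param_search_lists itself. Here is a sketch of what 5 range parameters on [0,1] could look like in the Service API's list-of-dicts parameter format; the parameter names x1 through x5 are hypothetical.

```python
def make_search_space(num_params=5, low=0.0, high=1.0):
    """Build Service API style range-parameter dicts (names are hypothetical)."""
    return [
        {"name": f"x{i}", "type": "range", "bounds": [low, high]}
        for i in range(1, num_params + 1)
    ]


hyper_param_search_lists = make_search_space()
```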
I'm feeding 16 metrics in the raw_data of ax.complete_trial
Ah ok, that may be the problem - we currently have an inefficiency in our setup where during candidate generation optimization we predict all modeled outcomes (rather than just the ones required for optimization). This will significantly increase the memory footprint (and computation time), and with 16 modeled outcomes this will start to hurt pretty quickly.
We're currently working on fixing this; in the meantime I would suggest that you use a model with fewer outcomes for the optimization. However, I see that you're using the service API, so that may not be as straightforward. Maybe @lena-kashtelyan has thoughts on this?
An alternative would be to use the developer API if that's a reasonable thing to do in your case.
Thanks for the fast response! I'll try to reduce the amount of data I feed in via raw_data and see what happens.
Here are a couple of questions that may be a bit off-topic.
I feed that much raw_data in order to use the analysis utilities provided by Ax. Is it possible to use these analysis utilities on data that isn't included in raw_data?
I'm using the service API because I want to save and load the search process, and when I searched for "save", this method seemed reasonable to use. Are there any docs or tutorials to follow if I switch to the developer API?
I feed that much raw_data in order to use the analysis utilities provided by Ax. Is it possible to use these analysis utilities on data that isn't included in raw_data?
So in principle there is no issue with having a full model to be used with the analysis utilities, and a leaner model used only for the optimization (any metric that does not appear in the optimization config will be ignored during candidate generation). This is easy to do in the developer API (see API Docs), but it may not be in the service API (deferring to @lena-kashtelyan for this).
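The full-versus-lean split can be illustrated in plain Python on long-format rows whose columns (arm_name, metric_name, mean, sem, trial_index) mirror the dataframe behind Ax's Data object. This is only a sketch of the idea, not Ax's internal mechanism: the optimization side sees just the rows for metrics in the optimization config, while the analysis side keeps everything. The arm name, metric names and values are hypothetical.

```python
def split_rows(rows, optimization_metrics):
    """Return (rows for the lean optimization model, all rows for analysis)."""
    wanted = set(optimization_metrics)
    lean = [r for r in rows if r["metric_name"] in wanted]
    return lean, list(rows)


rows = [
    {"arm_name": "0_0", "metric_name": "f1+acc", "mean": 0.70, "sem": 0.0, "trial_index": 0},
    {"arm_name": "0_0", "metric_name": "loss", "mean": 0.30, "sem": 0.0, "trial_index": 0},
    {"arm_name": "0_0", "metric_name": "acc_avg", "mean": 0.80, "sem": 0.0, "trial_index": 0},
]
lean_rows, analysis_rows = split_rows(rows, ["f1+acc"])
```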
I'm using the service API because I want to save and load the search process, and when I searched for "save", this method seemed reasonable to use. Are there any docs or tutorials to follow if I switch to the developer API?
You can save the experiment object you define as part of the developer API using the storage module. This will also save all the trials generated during the experiment. The downside is that you'll have to do a little more legwork to generate new candidates.
@showgood163, @Balandat, if I correctly understand the issue, it certainly can be addressed by using the Dev API. The use case would be very similar to the one shown in the Dev API tutorial, and to attach the data for metrics that are not involved in the optimization, one should be able to use simple_experiment.attach_data [1], then use plot_contour as shown in the Visualization tutorial.
If the Service API is more convenient, then one thing we can do is either extract the experiment (ax_client.experiment) and use it to create visualizations as per the Visualization tutorial [2], or generate one more trial after adding all the additional data (this will set the Ax client model to one that models all the data) and use the usual Service API visualization. I think the latter may be somewhat easier and require fewer changes to the existing code.
[1] You would need to construct a Data object to pass to attach_data; this essentially just entails constructing the underlying dataframe, and the data_from_evaluations utility might come in handy.
[2] Note that in that tutorial a SimpleExperiment would be used; in the case of the Service API, you will get just an Experiment, so instead of simple_experiment.eval(), you would do experiment.fetch_data().
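Footnote [1] can be sketched in plain Python by flattening per-arm evaluations into long-format rows with the columns of Ax's Data dataframe (arm_name, metric_name, mean, sem, trial_index). The helper name evaluations_to_rows is hypothetical and only approximates what data_from_evaluations does; with pandas installed, the rows could then become a dataframe via pandas.DataFrame(rows) before the Data object is built.

```python
def evaluations_to_rows(evaluations, trial_index):
    """Flatten {arm_name: {metric_name: (mean, sem)}} into long-format rows."""
    rows = []
    for arm_name, metrics in evaluations.items():
        for metric_name, (mean, sem) in metrics.items():
            rows.append(
                {
                    "arm_name": arm_name,
                    "metric_name": metric_name,
                    "mean": mean,
                    "sem": sem,
                    "trial_index": trial_index,
                }
            )
    return rows


# Hypothetical extra metrics for one arm that were not part of the optimization.
extra = {"0_0": {"loss": (0.30, 0.0), "uf1_avg": (0.50, 0.0)}}
rows = evaluations_to_rows(extra, trial_index=0)
```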
Thank you @lena-kashtelyan @Balandat !
We're currently working on fixing this, in the mean time I would suggest that you use a model with fewer outcomes for the optimization.
I reran an experiment with 8 hparams and 3 metrics; ax.Client.get_next_trial() peaked at ~3.5 GB of memory on the 134th trial, which seems fine. Since the problem is solved, I think I should close the issue.
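The thread doesn't say how the ~3.5 GB peak was measured. One stdlib way to get a comparable number on Linux or macOS is the process's peak resident set size from resource.getrusage; this is a sketch, and wrapping ax.get_next_trial with it is an assumption about usage. Note that ru_maxrss is a process-wide high-water mark, so it never decreases and only shows growth when a call exceeds the previous peak.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(fn, *args, **kwargs):
    """Call fn and print the process-wide peak RSS before and after."""
    before = peak_rss_mb()
    result = fn(*args, **kwargs)
    after = peak_rss_mb()
    print(f"peak RSS: {before:.1f} MB -> {after:.1f} MB")
    return result, after


# Usage with the Service API client from the thread (not run here):
# (params, trial_index), peak = run_and_report(ax.get_next_trial)
result, peak = run_and_report(sum, range(10))
```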
You can save the experiment object you define as part of the developer API using the storage module. This will also save all the trials generated during the experiment. The downside is that you'll have to do a little more legwork to generate new candidates.
Yes, I think I just didn't spend enough time looking into it.
So in principle there is no issue with having a full model to be used with the analysis utilities, and a leaner model used only for the optimization.
either extract the experiment (ax_client.experiment) and use it to create visualizations as per the Visualization tutorial [2] or generate one more trial after adding all the additional data (this will set the Ax client model to one that models all the data) and use the usual Service API visualization.
In the end, I chose to use ax.client.attach_trial to attach the existing experiment data and to run ax.client.get_next_trial once before visualizing, which means the memory bottleneck is now in the visualization/analysis part rather than in the experiment itself.