Hello everyone,
I'm working on an Ax model with 4 range parameters and 1 choice parameter (8 values).
When I use only the range parameters, the runtime to generate an arm is good (using GP-EI). If I add the choice parameter, the runtime increases significantly (by 100% to 150%).
Later in our project we want to add more choice parameters, so the runtime could become a big performance problem.
Is this typical for BO because the GP works on continuous parameters, or is there a way to solve this, maybe with a workaround?
Thank you in advance!
Philipp
Hi @Pzmijewski!
Thanks for bringing this to our attention. Would you mind sending a repro or code sample of what you're experiencing? It would help us a lot with the debugging process if we are able to see what you're seeing.
Thanks!
Hi @Jakepodell
We are using an API web service to do the evaluation. In addition, our original code contained sensitive data, so I had to make a dummy version that can be published. It is very close to the original. I hope it helps. :)
```python
from sty import fg, bg, ef, rs
import json
import numpy as np
import random
from ax.service.ax_client import AxClient
import time
import ax
import statistics
from ax import (
    ComparisonOp,
    ParameterType,
    RangeParameter,
    SearchSpace,
    SimpleExperiment,
    OutcomeConstraint,
)
from ax.modelbridge.generation_strategy import GenerationStrategy, GenerationStep
from ax.modelbridge.registry import Models

ax_client = AxClient()
ax_client.create_experiment(
    name="DummyExperiment",
    parameters=[
        {
            "name": "range1",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int",
        },
        {
            "name": "range2",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int",
        },
        {
            "name": "range3",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int",
        },
        {
            "name": "range4",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int",
        },
        {
            "name": "choice1",
            "type": "choice",
            "values": ["A", "B", "C", "D", "E", "F", "G", "H"],
            "value_type": "str",
        },
    ],
    objective_name="cost",
    minimize=True,
    outcome_constraints=["serviceLevel >= 0.99"],
    overwrite_existing_experiment=True,
)


def evaluation(parameters):
    """Dummy stand-in for the web service: returns (mean, std) per metric."""
    r1 = parameters["range1"]
    r2 = parameters["range2"]
    r3 = parameters["range3"]
    r4 = parameters["range4"]
    c1 = parameters["choice1"]
    costList = []
    serviceLevelList = []

    # Map the categorical choice to a cost multiplier.
    if c1 == "A":
        c1 = 8
    elif c1 == "B":
        c1 = 6
    elif c1 == "C":
        c1 = 4
    elif c1 == "D":
        c1 = 2
    elif c1 == "E":
        c1 = 2
    elif c1 == "F":
        c1 = 4
    elif c1 == "G":
        c1 = 6
    elif c1 == "H":
        c1 = 8
    else:
        c1 = 1

    # Simulate 15 noisy replications of the evaluation.
    for i in range(15):
        costDet = ((50 * r1) + (20 * r2) + (30 * r3) + (5 * r4)) * c1
        costStoch = random.uniform((costDet - 5), (costDet + 5))
        costList.append(costStoch)
        serviceLevelDet = (r1 / 20000) + (r2 / 20000)
        serviceLevelStoch = random.uniform((serviceLevelDet - 0.01), (serviceLevelDet + 0.01))
        # The service level cannot exceed 1.
        if serviceLevelStoch >= 1:
            serviceLevelStoch = 1
        serviceLevelList.append(serviceLevelStoch)

    costMean = sum(costList) / len(costList)
    serviceLevelMean = sum(serviceLevelList) / len(serviceLevelList)
    costStd = statistics.stdev(costList)
    serviceLevelStd = statistics.stdev(serviceLevelList)
    return {"cost": (costMean, costStd), "serviceLevel": (serviceLevelMean, serviceLevelStd)}


optimizationStart = time.time()
num_trials_total = 100
num_sobol_trials = 50
sobol = Models.SOBOL(ax_client.experiment.search_space)

for current_trial_index in range(num_trials_total):
    if current_trial_index < num_sobol_trials:
        strategytype = "sobol"
        timeStart = time.time()
        trial = ax_client.experiment.new_trial(generator_run=sobol.gen(1))
        print("Adding Sobol trial >" + str(current_trial_index) + "<" + " at " + str(time.strftime("%Y_%m_%d_%H_%M_%S")))
        print("with parameter values: " + str(trial.arm.parameters))
    else:
        strategytype = "gpei"
        timeStart = time.time()
        # The GP is refit on all data collected so far before each new arm.
        data = ax_client.experiment.fetch_data()
        gpei = Models.GPEI(experiment=ax_client.experiment, data=data)
        trial = ax_client.experiment.new_trial(generator_run=gpei.gen(1))
        print("Adding GPEI trial >" + str(current_trial_index) + "<" + " at " + str(time.strftime("%Y_%m_%d_%H_%M_%S")))
        print("with parameter values: " + str(trial.arm.parameters))
    trial.mark_running(no_runner_required=True)
    ax_client.complete_trial(trial_index=current_trial_index, raw_data=evaluation(trial.arm.parameters))
    timeEnd = time.time()
    TimeTrial = timeEnd - timeStart
    print("Trial completed in: " + str(round(TimeTrial, 4)) + " Seconds")

optimizationEnd = time.time()
optimizationTime = optimizationEnd - optimizationStart
print("Runtime of Optimization " + str(round(optimizationTime, 4)) + " Seconds")

best_parameters, values = ax_client.get_best_parameters()
print("Best Parameter from Ax:")
print(best_parameters)
print("corresponding Responses")
print(values)
```
Hi!
Thanks for the code sample. Off the top of our heads, we don't see any obvious reason for this slowdown, so we are going to have to investigate. We've added this to our queue and will be looking into it soon!
Thanks for the support!
@Pzmijewski
When a categorical choice parameter with K choices like yours is included in optimization, it is transformed using OneHotEncoder into K different parameters (in which the active choice parameter is set to 1 and all others are set to 0). This adds K parameters to the search space, substantially increasing the time to fit the model and generate new candidates.
Performance and speed will degrade substantially with more than 20 total parameters.
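To make the dimension growth concrete, here is a minimal plain-Python sketch of one-hot encoding applied to the search space from the repro above. It does not call Ax's actual transform; the helpers `one_hot` and `encode_arm` and the example arm values are hypothetical and only illustrate how one 8-value choice parameter turns a 5-parameter arm into a 12-dimensional model input.

```python
from typing import Dict, List, Sequence, Union

Number = Union[int, float]


def one_hot(value: str, choices: Sequence[str]) -> List[int]:
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    if value not in choices:
        raise ValueError(f"{value!r} is not one of {list(choices)}")
    return [1 if choice == value else 0 for choice in choices]


def encode_arm(
    arm: Dict[str, Union[Number, str]],
    range_names: Sequence[str],
    choice_values: Dict[str, Sequence[str]],
) -> List[Number]:
    """Hypothetical helper: flatten an arm into the vector a model would see."""
    # Range parameters contribute one dimension each.
    encoded: List[Number] = [arm[name] for name in range_names]
    # Each choice parameter contributes one dimension per possible value.
    for name, choices in choice_values.items():
        encoded.extend(one_hot(arm[name], choices))
    return encoded


# Search space from the repro: 4 range parameters and 1 choice with 8 values.
range_names = ["range1", "range2", "range3", "range4"]
choice_values = {"choice1": ["A", "B", "C", "D", "E", "F", "G", "H"]}
arm = {"range1": 100, "range2": 200, "range3": 300, "range4": 400, "choice1": "C"}

encoded = encode_arm(arm, range_names, choice_values)
print(encoded)
print(len(encoded))  # 4 range dimensions + 8 one-hot dimensions = 12
```

Under this encoding, a second choice parameter with 8 values would bring the total to 20 dimensions, which is the threshold mentioned above.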
We have three recommendations:
We are working on methods to automatically infer latent spaces of categories to get the advantages of approach #3, but this is fairly far off right now.
Closing, since the investigation above showed that this is expected behavior and addressing it is not on our roadmap in the short term. Thanks @2timesjay!