Ax: Runtime increase with Choice Parameter

Created on 13 May 2020 · 6 Comments · Source: facebook/Ax

Hello everyone,

I'm working on an Ax model with 4 Range Parameters and 1 Choice Parameter (8 values).
When I use the Range Parameters only, the runtime to generate an arm is good (using GP-EI). If I add the Choice Parameter, the runtime increases significantly (by up to 100-150%).
Later in our project we want to add more Choice Parameters, so the runtime could become a big performance problem.

Is this "typical" for BO because the GP works with continuous parameters, or is there a way to solve this? Maybe with a workaround?

Thank you in advance!

Philipp

question

All 6 comments

Hi @Pzmijewski!

Thanks for bringing this to our attention. Would you mind sending a repro or code sample of what you're experiencing? It would help us a lot with the debugging process if we are able to see what you're seeing.

Thanks!

Hi @Jakepodell

We are using an API web service to do the evaluation. Also, our original code contained sensitive data, so I had to make a dummy version that can be published. It is very close to the original. I hope it helps. :)

```python
from sty import fg, bg, ef, rs
import json
import numpy as np
import random
from ax.service.ax_client import AxClient
import time
import ax
import statistics
from ax import (
    ComparisonOp,
    ParameterType,
    RangeParameter,
    SearchSpace,
    SimpleExperiment,
    OutcomeConstraint,
)
from ax.modelbridge.generation_strategy import GenerationStrategy, GenerationStep
from ax.modelbridge.registry import Models

ax_client = AxClient()

# create experiment

ax_client.create_experiment(
    name="DummyExperiment",
    parameters=[
        {
            "name": "range1",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int"
        },
        {
            "name": "range2",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int"
        },
        {
            "name": "range3",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int"
        },
        {
            "name": "range4",
            "type": "range",
            "bounds": [0, 20000],
            "value_type": "int"
        },
        {
            "name": "choice1",
            "type": "choice",
            "values": ["A", "B", "C", "D", "E", "F", "G", "H"],
            "value_type": "str"
        }
    ],
    objective_name="cost",
    minimize=True,
    outcome_constraints=["serviceLevel >= 0.99"],
    overwrite_existing_experiment=True
)

# Dummy Evaluation Function

def evaluation(parameters):

    r1 = parameters["range1"]
    r2 = parameters["range2"]
    r3 = parameters["range3"]
    r4 = parameters["range4"]
    c1 = parameters["choice1"]
    costList = []
    serviceLevelList = []

    # map the categorical choice to a cost multiplier
    if c1 == "A":
        c1 = 8
    elif c1 == "B":
        c1 = 6
    elif c1 == "C":
        c1 = 4
    elif c1 == "D":
        c1 = 2
    elif c1 == "E":
        c1 = 2
    elif c1 == "F":
        c1 = 4
    elif c1 == "G":
        c1 = 6
    elif c1 == "H":
        c1 = 8
    else:
        c1 = 1

    # 15 noisy replications per arm
    for i in range(15):

        costDet = ((50 * r1) + (20 * r2) + (30 * r3) + (5 * r4)) * c1
        costStoch = random.uniform((costDet-5), (costDet+5))
        costList.append(costStoch)

        serviceLevelDet = (r1 / 20000) + (r2 / 20000)
        serviceLevelStoch = random.uniform((serviceLevelDet-0.01), (serviceLevelDet+0.01))

        # cap the service level at 1
        if serviceLevelStoch >= 1:
            serviceLevelStoch = 1
        serviceLevelList.append(serviceLevelStoch)

    costMean = sum(costList) / len(costList)
    serviceLevelMean = sum(serviceLevelList) / len(serviceLevelList)
    costStd = statistics.stdev(costList)
    serviceLevelStd = statistics.stdev(serviceLevelList)

    return {"cost": (costMean, costStd), "serviceLevel": (serviceLevelMean, serviceLevelStd)}

# Experiment and Optimization Part

optimizationStart = time.time()

num_trials_total = 100
num_sobol_trials = 50

global sobol
sobol = Models.SOBOL(ax_client.experiment.search_space)

for current_trial_index in range(num_trials_total):

    if current_trial_index < num_sobol_trials:
        strategytype = "sobol"
        timeStart = time.time()
        trial = ax_client.experiment.new_trial(generator_run=sobol.gen(1))

        print("Adding Sobol trial >" + str(current_trial_index) + "<" + " at " + str(time.strftime("%Y_%m_%d_%H_%M_%S")))
        print("with parameter values: " + str(trial.arm.parameters))

    else:
        strategytype = "gpei"
        timeStart = time.time()
        data = ax_client.experiment.fetch_data()
        gpei = Models.GPEI(experiment=ax_client.experiment, data=data)
        trial = ax_client.experiment.new_trial(generator_run=gpei.gen(1))
        print("Adding GPEI trial >" + str(current_trial_index) + "<" + " at " + str(time.strftime("%Y_%m_%d_%H_%M_%S")))
        print("with parameter values: " + str(trial.arm.parameters))

    trial.mark_running(no_runner_required=True)
    ax_client.complete_trial(trial_index=current_trial_index, raw_data=evaluation(trial.arm.parameters))
    timeEnd = time.time()
    TimeTrial = timeEnd - timeStart

    print("Trial completed in: " + str(round(TimeTrial, 4)) + " Seconds")

optimizationEnd = time.time()
optimizationTime = optimizationEnd - optimizationStart
print("Runtime of Optimization " + str(round(optimizationTime, 4)) + " Seconds")

# show Results

best_parameters, values = ax_client.get_best_parameters()
print("Best Parameter from Ax:")
print(best_parameters)
print("corresponding Responses")
print(values)
```

Hi!

Thanks for the code sample. Off the top of our heads, we don't see any obvious reason for this slowdown, so we are going to have to investigate. We've added this to our queue and will be looking into it soon!

Thanks for the support!

@Pzmijewski

When a categorical choice parameter with K choices like yours is included in optimization, it is transformed using OneHotEncoder into K different parameters (in which the active choice parameter is set to 1 and all others are set to 0). This adds K parameters to the search space, substantially increasing the time to fit the model and generate new candidates.
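To make the dimension count concrete, here is a minimal plain-Python sketch of the idea. It does not call Ax and is not Ax's internal code. The helper names `one_hot` and `encode` are made up for illustration; the parameter names come from the dummy experiment above.

```python
# Illustrative only: hypothetical helpers that mimic what one-hot encoding
# does to the search space of the dummy experiment (4 range + 1 choice).
CHOICES = ["A", "B", "C", "D", "E", "F", "G", "H"]
RANGE_NAMES = ["range1", "range2", "range3", "range4"]


def one_hot(value, choices):
    """Return a 0/1 list with a single 1 at the index of `value`."""
    if value not in choices:
        raise ValueError(f"unknown choice: {value!r}")
    return [1 if choice == value else 0 for choice in choices]


def encode(parameters, range_names=RANGE_NAMES, choice_name="choice1", choices=CHOICES):
    """Flatten one arm into the numeric vector a model would be fitted on."""
    numeric = [parameters[name] for name in range_names]
    return numeric + one_hot(parameters[choice_name], choices)


arm = {"range1": 100, "range2": 200, "range3": 300, "range4": 400, "choice1": "C"}
encoded = encode(arm)
print(encoded)       # [100, 200, 300, 400, 0, 0, 1, 0, 0, 0, 0, 0]
print(len(encoded))  # 12: 4 range dimensions + 8 one-hot dimensions
```

With a single 8-valued choice parameter, the model input grows from 4 to 12 dimensions, which is in line with the slowdown reported above.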

Performance and speed will degrade substantially with more than 20 total parameters.
We have three recommendations:

  1. Use Sobol search instead. As a heuristic, we often suggest Sobol whenever the total number of categories across all categorical parameters exceeds the number of continuous range parameters.
  2. If your categories have some kind of order relationship, specify that by adding `"is_ordered": True` to the parameter dict. This will do more to improve model quality than to improve generation speed.
  3. If your categories can be represented by an embedding in continuous space, then using that latent space directly as part of your search space should improve both performance and speed.
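The first two recommendations can be sketched against the parameter dicts from the dummy experiment. This is only an illustration: `prefer_sobol` is a hypothetical helper, not an Ax function, and it counts every `"range"` parameter as continuous, which is an assumption here because the dummy's range parameters are integer-valued.

```python
# Parameter dicts in the same format passed to ax_client.create_experiment above.
parameters = [
    {"name": f"range{i}", "type": "range", "bounds": [0, 20000], "value_type": "int"}
    for i in range(1, 5)
] + [
    {
        "name": "choice1",
        "type": "choice",
        "values": ["A", "B", "C", "D", "E", "F", "G", "H"],
        "value_type": "str",
    }
]


def prefer_sobol(parameters):
    """Hypothetical helper for the heuristic in recommendation 1.

    Returns True when the total number of categories across all choice
    parameters exceeds the number of range parameters.
    """
    n_range = sum(1 for p in parameters if p["type"] == "range")
    n_categories = sum(len(p["values"]) for p in parameters if p["type"] == "choice")
    return n_categories > n_range


print(prefer_sobol(parameters))  # True: 8 categories > 4 range parameters

# Recommendation 2: declare an order, which is only meaningful if the
# categories really are ordered (A < B < ... < H).
ordered_choice = dict(parameters[-1], is_ordered=True)
print(ordered_choice["is_ordered"])  # True
```

By this heuristic the dummy experiment already falls on the Sobol side, since its 8 categories outnumber its 4 range parameters.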

We are working on methods to automatically infer latent spaces of categories to get the advantages of approach #3, but this is fairly far off right now.
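Until then, approach #3 has to be done by hand. Below is a minimal sketch that assumes you already have a meaningful continuous coordinate for each category. The `EMBEDDING` values are invented purely for illustration, and `choice1_latent` and `decode` are hypothetical names. The optimizer would search over one range parameter, and the evaluation function would map the suggested value back to the nearest category.

```python
# Hypothetical 1-D embedding: each category gets a made-up latent coordinate.
# In practice these values must come from domain knowledge about the categories.
EMBEDDING = {
    "A": 0.0, "B": 0.1, "C": 0.25, "D": 0.4,
    "E": 0.6, "F": 0.75, "G": 0.9, "H": 1.0,
}

# Replacement for the "choice1" parameter dict: a single continuous dimension
# instead of 8 one-hot dimensions.
latent_parameter = {
    "name": "choice1_latent",
    "type": "range",
    "bounds": [min(EMBEDDING.values()), max(EMBEDDING.values())],
    "value_type": "float",
}


def decode(latent_value, embedding=EMBEDDING):
    """Map a latent coordinate back to the category with the nearest coordinate."""
    return min(embedding, key=lambda category: abs(embedding[category] - latent_value))


# Inside the evaluation function, the category would be recovered like this:
print(decode(0.3))  # C
```

Whether this helps depends entirely on the embedding: categories with nearby coordinates should behave similarly, otherwise the model is misled by the imposed geometry.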

Closing, since as per the above investigation we learned that this was expected behavior, and is not on our roadmap to address in the short term. Thanks @2timesjay !
