Ax: Failed and abandoned trials: indicating trial failures and excluding certain parameter values from future trials

Created on 26 Aug 2020 · 22 Comments · Source: facebook/Ax

I'm trying to optimize an evaluation function which consists of an optimization routine.
For certain combinations of the parameters, the routine does not converge and results in an invalid result.
I am currently using the GPEI optimizer and developer API with the SimpleExperiment class.
I'd like to be able to indicate this when this happens and also somehow exclude those values from future trials, but wasn't sure how to do this.
I think #329 and #176 are related, but they don't quite answer the question that I am looking for.
I couldn't really find any documentation about how to use mark_abandoned, which seems like what I need based on the explanation.
Could I receive some suggestions for what to look at and how to implement my desired functionality?
Thanks.

question

All 22 comments

Hi @jangkj09! You might have better luck using our Service API. I'd recommend checking out the tutorial here. And then, the part you'd modify is this:

    for i in range(num_trials):
        trial_params, trial_index = ax_client.get_next_trial()

        # Run the trial, but abandon it if the evaluation fails.
        try:
            data = evaluate_params(trial_params)
        except Exception:
            ax_client.abandon_trial(trial_index=trial_index)
            continue

        ax_client.complete_trial(trial_index=trial_index, raw_data=data)

Let me know if this helps!

Thanks, I will try this out. I don't need it quite yet, but how are you supposed to define a custom generator model using the Service API?
I see the option for specifying generation_strategy but couldn't find a good example as to how to set this.
Also, just out of curiosity, is it possible to have something like this using the Developer API. I'd like to use a custom BoTorch model, but I could only find examples of using the Developer API.

Thanks.

I'll defer to @lena-kashtelyan on parts of this question, but as for how to do this using the Developer API, you can do something quite similar:

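from ax.modelbridge.registry import Models

for i in range(num_trials):
    # Refit the model on all data collected so far, then generate one candidate.
    gpei = Models.BOTORCH(experiment=exp, data=exp.eval())
    trial = exp.new_trial(generator_run=gpei.gen(1))
    try:
        data = evaluate_trial(trial)  # placeholder for your own evaluation
        trial.mark_completed()
        exp.attach_data(data)
    except Exception:
        # Evaluation failed: exclude this trial from the experiment.
        trial.mark_abandoned()
        continue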
Something like that should work! I haven't tested this though, so let me know if you run into any issues :)

To set up a custom model in the Service API, you will need to make your own generation strategy and pass it to AxClient as the generation_strategy kwarg on instantiation. You can use this comment or the choose_generation_strategy function as an example of how a generation strategy is constructed. A tutorial on generation strategy is currently in the works, so stay tuned!

How custom is your model? I can provide further help with questions / customization and https://github.com/facebook/Ax/issues/293 might also be of some help.

Thanks. @lena-kashtelyan I'll go through some of these comments and see if I have any further questions. The model isn't that custom and it's just a GP model I fit with previous experimental data.

@ldworkin This is from my lack of understanding of the Developer API and the concepts in Ax, but in the sample code you have provided, if you don't explicitly call evaluate_trial() will the new trial be evaluated when there is a call to exp.eval()? Or is it being evaluated when you make the call exp.new_trial(generator_run=gpei.gen(1))?
I was trying out some explicit parameter values with exp.new_trial().add_arm() and it seemed to be evaluating immediately.

@jangkj09 you're right actually! I was getting confused about the difference between SimpleExperiment and regular Experiment. My example is really meant more for Experiment. SimpleExperiment does the trial evaluation automatically behind-the-scenes when you call eval, which makes it a bit harder to accomplish what you want. You probably want to use either regular Experiment or the Service API.

+1 to @ldworkin above, switching to regular Experiment or Service API should address your issue, @jangkj09! SimpleExperiment really is only meant for quick experimentation in notebooks where nothing unusual (like invalid configurations) can happen.


The model isn't that custom and it's just a GP model I fit with previous experimental data.

Just FYI, this is something that you can achieve via ax_client.attach_trial in Service API (this is documented in "adding custom trials part" of this section of the Service API tutorial). Just make sure to first attach all pre-existing data, then start generating new trials on the new experiment, so you would do something like:

ax_client.create_experiment(...)
for _ in range(len_preexisting_data):
    ax_client.attach_trial(...)  # Inform experiment of parameter values, for which you have data.
    ax_client.complete_trial(...)  # Log the data for those parameter configurations.

for _ in range(num_new_trials):
    ax_client.get_next_trial(...)
    ax_client.complete_trial(...)

If you choose this path, you won't need to set a custom generation strategy –– it will be chosen for you and trained with pre-existing data before it starts generating new trials.
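The seeding step above could be wrapped in a small helper along these lines. This is only a sketch: `seed_with_existing_data`, `records`, the `objective` metric name and the stub client are made up for illustration, and the helper only assumes that the client exposes `attach_trial` (returning the parameterization and the trial index) and `complete_trial`, as AxClient does. Passing an SEM of 0.0 assumes the pre-existing measurements are noise-free; adjust that to match your data.

```python
def seed_with_existing_data(ax_client, records, metric_name="objective"):
    """Attach each known (parameters, value) pair as a completed trial.

    `ax_client` is any object with AxClient-style `attach_trial` and
    `complete_trial` methods; `records` is an iterable of
    (parameters_dict, observed_value) pairs (an illustrative format).
    """
    trial_indices = []
    for parameters, value in records:
        # attach_trial returns the parameterization and the new trial's index.
        _, trial_index = ax_client.attach_trial(parameters=parameters)
        # raw_data maps the metric name to (mean, SEM); SEM=0.0 is an assumption.
        ax_client.complete_trial(
            trial_index=trial_index,
            raw_data={metric_name: (value, 0.0)},
        )
        trial_indices.append(trial_index)
    return trial_indices


class StubClient:
    """Stand-in so the sketch runs without Ax installed; not part of Ax."""

    def __init__(self):
        self.parameters = []
        self.completed = {}

    def attach_trial(self, parameters):
        self.parameters.append(parameters)
        return parameters, len(self.parameters) - 1

    def complete_trial(self, trial_index, raw_data):
        self.completed[trial_index] = raw_data


stub = StubClient()
seeded = seed_with_existing_data(stub, [({"x": 0.1}, 1.5), ({"x": 0.4}, 0.7)])
print(seeded, stub.completed)
```

With a real AxClient, you would call the helper once after `create_experiment(...)` and before the first `get_next_trial()`.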

You can also achieve this via the Developer API, by similarly adding custom trials to the experiment (experiment.new_trial().add_arm(Arm(custom_parameter_configuration))) and then attaching data for them (experiment.attach_data(...)).

@lena-kashtelyan @ldworkin Thanks for all your help so far, it has been very informative. I had some more questions:

  1. If I'm using the Service API, I understand that I need to pass in a generation strategy and the examples, but is there a way to simply specify one of the default generation strategies (i.e. one that would've automatically been chosen, but just explicitly selected by me)? I'm not fully understanding how to define a generation strategy and the components that are necessary.

  2. What exactly is the difference between the Developer API and Service API? I understand that you have to define more components (illustrated in the "Building Blocks of Ax"), but it's still not clear to me when someone should use the Service API and when one should use the Developer API. The Service API seems to have a lot of flexibility, but some steps (such as building the model) seem to be much more straightforward in the Developer API.

  3. When using Experiment, what is the difference between running a trial and evaluating a trial? Similarly, if I don't have any complicated deployment logic, what is required of the Runner (step 5 of the "Building Blocks of Ax" tutorial)?

  4. In the simple code example above in https://github.com/facebook/Ax/issues/372#issuecomment-681086372, isn't exp.eval() only possible for SimpleExperiment? It doesn't seem like the base class Experiment has an eval() function according to the API documentation.

Thank you so much!

  1. I think what @lena-kashtelyan is saying is that, for your use case, you actually _won't_ need to pass in a generation strategy. We'll automatically choose it for you and train it with the pre-existing data (using her code example above). However, if you _want_ to pass in a custom generation strategy, the best documentation we have for that at the moment is this comment here.
  2. The Developer API is meant to give you more custom control over the whole flow. In most cases, the Service API is sufficient and is a better choice, because it's simpler and easier to use. You actually can pass custom models to the Service API, by including them in your custom generation strategy -- see https://github.com/facebook/Ax/issues/293 for an example. I think the main problem here is that we don't have a ton of great documentation on how to do "custom" things with the Service API, but it's all still possible.
  3. Because you don't have complicated deployment logic, I think I'd shy away from using Experiment. It sounds to me like the Service API is going to be a better fit. In some cases, there is a difference between running and evaluating the trial -- say when you're doing online experimentation, and you need to deploy your experiment, and then collect the data for it. But in your case, since there's no difference, using the Developer API is probably overkill. If you wanted to, you could just use our dummy SyntheticRunner, that's basically a no-op.
  4. You're right, exp.eval() is only for SimpleExperiment. For regular Experiment, you need to define Metrics, along with their fetch_data functions. Again, my growing sense is that you'd be best off using the Service API and avoiding all of this!

Building off of @ldworkin's answer, just a couple additions:

  1. There are two ways to control what generation strategy is chosen:
    a) To actually pass the generation strategy to AxClient,
    b) To pass choose_generation_strategy_kwargs to create_experiment (as shown in the bottom section of the tutorial: https://ax.dev/tutorials/gpei_hartmann_service.html#Service-API-Exceptions-Meaning-and-Handling), and these can be any of these kwargs: https://github.com/facebook/Ax/blob/master/ax/modelbridge/dispatch_utils.py#L115-L127. That doesn't give full control, but it does let you set a number of important settings.

  2. I would define the use case for Service API as follows: it is basically for cases where someone wants to control deployment and / or evaluation of trials entirely on their own, without implementing Ax runners or metrics, such that they just obtain a parameterization, evaluate it however they like in a custom script / notebook, then log the data back for the trial.

  3. SimpleExperiment is a somewhat legacy abstraction, and Service API is a fuller replacement: it's similar to SimpleExperiment in that it doesn't require specifying deployment and data-fetching logic in any formal way, but it allows much more custom functionality (like the trial abandonment you need for your use case) and is better documented.

Thanks so much! This clarifies a lot to me :)

Just a few more things:
@ldworkin I was looking up the evaluate_params() function, but it doesn't seem to be in the documentation. Is this the evaluate() function that I'm supposed to define in the tutorial?

@lena-kashtelyan I can't actually follow the link: https://fburl.com/znk3qerb It says it's missing the page.

@jangkj09, fixed the link, sorry about that!

Re: evaluate_params function, could you specify where you saw that?

Oh! I think @ldworkin just meant to suggest that as a placeholder for any arbitrary function that evaluates a given parameter configuration. See this section of the Service API tutorial for an example.
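As a rough illustration of what such a placeholder might look like for the original question (an evaluation that wraps an iterative routine which sometimes fails to converge), here is a toy sketch. Everything in it is made up for illustration: the `ConvergenceError` name, the `rate` parameter, the logistic-map iteration standing in for the real routine, and the `objective` metric name. The return value uses the `{metric_name: (mean, SEM)}` shape that can be passed as `raw_data` to `complete_trial`.

```python
class ConvergenceError(RuntimeError):
    """Hypothetical exception for a routine that does not converge."""


def evaluate_params(params, max_iter=200, tol=1e-8):
    """Toy stand-in for an evaluation that wraps an iterative routine.

    Iterates x <- rate * x * (1 - x) until successive values agree to
    within `tol`. For some rates (e.g. 3.5) the iteration oscillates
    forever, which plays the role of the "does not converge" case.
    """
    rate = params["rate"]
    x = 0.5
    for _ in range(max_iter):
        x_next = rate * x * (1.0 - x)
        if abs(x_next - x) < tol:
            # Metric name -> (mean, SEM); SEM=0.0 assumes a noise-free result.
            return {"objective": (x_next, 0.0)}
        x = x_next
    raise ConvergenceError(f"no convergence for rate={rate}")


print(evaluate_params({"rate": 2.5}))
```

Raising a dedicated exception type, rather than returning a sentinel value, lets the trial loop catch exactly this case and abandon the trial while still surfacing unrelated bugs.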

Thanks, I'm getting an error saying that 'AxClient' object has no attribute 'abandon_trial'. I see it in the API, but not sure why this error is happening.

Please post a code snippet that we can review in that case, since that's quite unexpected.

Sure, I'll post some code soon, but could it be a version issue? I see that the latest is 0.1.14, but the version I have installed is 0.1.9. I tried updating pip and also tried to reinstall specifying 0.1.14, but it says that it is unavailable.

FYI
Ubuntu 18.04
Python 3.6
Pip 20.2.2

Sorry, stupid mistake on my part. I guess it requires Python 3.7...

Oh, it's certainly a version issue! And yes, our recent versions require Python 3.7.

Thanks, I was able to implement a simple setup using the service API, but I had some more questions. Please let me know if any of these should be a separate issue:

  1. What exactly happens when I mark a trial as abandoned, such that the model knows not to suggest those parameter values again? And how does that differ from using log_trial_failure as described in the tutorial?

  2. Does the model (from which suggestions are made) account for parameter values which fail? Or rather, is there a way to make the model avoid regions around failures, but not completely? I'm running into an issue where I will run 1000 iterations and after the initial Sobol sequence generation, all the values from GPEI Model result in abandoned trials.

  3. If I define a search space, mixing integer range parameters and float range parameters, what would be determined as the generation strategy in this case? I was looking at the definition of choose_generation_strategy and I couldn't figure out what would be chosen in this case.

Re: difference between abandoning trials and marking them as failed –– when a trial is 'running', it's included in the 'pending points' that are passed to the model to indicate that those points should not be re-suggested, as they are currently being evaluated. When a trial is 'abandoned', it remains in 'pending points' forever; when it is marked 'failed', it is removed from pending points. That is because we treat 'failure' as an infrastructural failure during evaluation, which will not necessarily happen again if the same point is re-run. 'Abandonment', on the other hand, we treat as a final decision that a given point should not be part of the experiment.
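One way to act on this distinction in a trial loop is to abandon trials whose failure is a property of the parameters, and to mark as failed those that hit a transient problem. The sketch below is an assumption-laden illustration, not a recommended recipe: `ConvergenceError`, `evaluate_and_report`, `toy_evaluate` and the stub client are hypothetical, and the helper only relies on the client exposing `complete_trial`, `log_trial_failure` and `abandon_trial`, as AxClient does.

```python
class ConvergenceError(RuntimeError):
    """Hypothetical error: the routine cannot converge for these parameters."""


class StubClient:
    """Stand-in that records trial statuses; it is not part of Ax."""

    def __init__(self):
        self.status = {}

    def complete_trial(self, trial_index, raw_data):
        self.status[trial_index] = "completed"

    def log_trial_failure(self, trial_index):
        self.status[trial_index] = "failed"

    def abandon_trial(self, trial_index):
        self.status[trial_index] = "abandoned"


def evaluate_and_report(client, trial_index, params, evaluate):
    """Evaluate one trial and report the outcome to the client."""
    try:
        raw_data = evaluate(params)
    except ConvergenceError:
        # Tied to the parameters themselves: keep the point out for good.
        client.abandon_trial(trial_index=trial_index)
    except Exception:
        # Possibly transient (e.g. I/O trouble): the point may be retried.
        client.log_trial_failure(trial_index=trial_index)
    else:
        client.complete_trial(trial_index=trial_index, raw_data=raw_data)


def toy_evaluate(params):
    """Made-up evaluation with one 'infeasible' and one 'flaky' region."""
    if params["x"] > 0.8:
        raise ConvergenceError("no convergence for x > 0.8")
    if params["x"] < 0.0:
        raise OSError("simulated infrastructure problem")
    return {"objective": (params["x"] ** 2, 0.0)}


client = StubClient()
for index, x in enumerate([0.5, 0.9, -1.0]):
    evaluate_and_report(client, index, {"x": x}, toy_evaluate)
print(client.status)
```

In a real loop, `trial_index` and `params` would come from `ax_client.get_next_trial()` rather than from a hard-coded list.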

Re: choose_generation_strategy for a search space of int- and float-valued range parameters –– this helper determines whether Sobol + GPEI or just Sobol will be used. As you can see in that function, Sobol + GPEI will not be used if any of the following are true:

  1. Number of continuous parameters (int- or float-valued range parameters) is less than the sum of choices in discrete parameters (choice parameters). For instance, if your search space has 2 range parameters and 1 choice parameter with values ["a", "b"] (so it can take only two values), Sobol + GPEI will be used, but if the choice parameter has values ["a", "b", "c"], Sobol will be used.
  2. Number of total trials to run is known in advance (this is not the case for you unless you explicitly pass num_trials as part of choose_generation_strategy_kwargs) and it's enough to try out all possible points in the search space.
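The two conditions above can be paraphrased as a small predicate. This is a simplified restatement of the rule as described in this comment, not Ax's actual implementation; the function name and its arguments (including `search_space_size`, the number of distinct points when the space is finite) are hypothetical.

```python
from typing import Optional, Sequence


def chooses_sobol_plus_gpei(
    num_range_params: int,
    choice_param_sizes: Sequence[int],
    num_trials: Optional[int] = None,
    search_space_size: Optional[int] = None,
) -> bool:
    """Return True if the rule described above would pick Sobol + GPEI.

    `choice_param_sizes` lists the number of values of each choice
    parameter. `search_space_size` is None when the space is infinite
    or its size is unknown.
    """
    # Condition 1: fewer range parameters than total choice values.
    if num_range_params < sum(choice_param_sizes):
        return False
    # Condition 2: trial budget is known and covers the whole search space.
    if (
        num_trials is not None
        and search_space_size is not None
        and num_trials >= search_space_size
    ):
        return False
    return True


# The example from the comment: 2 range parameters, 1 choice parameter.
print(chooses_sobol_plus_gpei(2, [2]))  # values ["a", "b"]
print(chooses_sobol_plus_gpei(2, [3]))  # values ["a", "b", "c"]
```

For a search space with only int- and float-valued range parameters and no `num_trials` passed, neither condition applies under this reading, so Sobol + GPEI would be chosen.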

With this, I'm closing this issue, since we have answered all the questions in it re: how to abandon trials, what abandoning trials means, etc.


Re: whether there is a way to make the model avoid certain regions and your experiment where GPEI continuously suggests infeasible trials –– this should definitely be a separate issue. If you would like to get help with it, please start one and include 1) code snippet for how you set up your experiment and run trials, 2) what the data for your experiment looks like and what the GPEI suggestions were, so the folks who work on our methodology can help you.

Thanks for all your help!
