Optuna: Missing trial parameters with AllenNLP training

Created on 20 Jul 2020 · 3 Comments · Source: optuna/optuna

I encountered a strange result while optimizing my AllenNLP model, and it may be hard to reproduce: some rows are missing from trial_params. Surprisingly, the trials table contains results for the trials whose parameters are missing, which suggests that those runs completed successfully.

Table `trials`

[image: Values]

Table `trial_params`

[image: Params]
The rows for trial_id == 2, trial_id == 3, and trial_id == 4 are missing. All other rows up to trial_id == 40 are fine.
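
The gap described above can be located with a single query. The following is a minimal sketch that uses only the two table names and the trial_id column mentioned in this report; the in-memory demo tables are a toy stand-in, not the real Optuna schema, and the helper name find_trials_without_params is hypothetical.

```python
import sqlite3


def find_trials_without_params(conn: sqlite3.Connection) -> list:
    """Return trial_id values present in `trials` but absent from `trial_params`."""
    rows = conn.execute(
        """
        SELECT t.trial_id
        FROM trials AS t
        LEFT JOIN trial_params AS p ON p.trial_id = t.trial_id
        WHERE p.trial_id IS NULL
        ORDER BY t.trial_id
        """
    ).fetchall()
    return [row[0] for row in rows]


# Toy in-memory database (NOT the real Optuna schema) reproducing the symptom:
# trials 1..6 exist, but only trials 1, 5 and 6 have parameter rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE trials (trial_id INTEGER PRIMARY KEY)")
demo.execute("CREATE TABLE trial_params (trial_id INTEGER, param_name TEXT)")
demo.executemany("INSERT INTO trials VALUES (?)", [(i,) for i in range(1, 7)])
demo.executemany(
    "INSERT INTO trial_params VALUES (?, ?)",
    [(i, "DROPOUT") for i in (1, 5, 6)],
)

missing = find_trials_without_params(demo)
print(missing)
```

To check the actual study, the same function could be called on sqlite3.connect("results.db"). A trial that has just started and has not yet suggested any parameter would also show up in the result, so the check is most meaningful once all workers have finished.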

Expected behavior

The trial_params table is filled correctly.

Environment

Optuna version: 1.5.0 (https://github.com/mateuszpieniak/optuna)
I only removed the call to allennlp.common.params.infer_and_cast in AllenNLPExecutor, because it doesn't work with null values in jsonnet. I also believe that such casting is unnecessary, since jsonnet was designed so that the user is responsible for type casting, e.g. with parseInt or parseJson (https://jsonnet.org/ref/stdlib.html).
Python version: 3.6.9
OS: Ubuntu 18.04.4 LTS
AllenNLP 1.0.0
I know this wasn't officially supported in Optuna 1.5.0 and is experimental, but it should work anyway, since there are no modifications in subsequent versions of Optuna.

Steps to reproduce

I have 4 GPUs, so I start 4 runs and want them all to share one SQLite database.

Open 4 terminal tabs.
In each tab, set a different device: export CUDA_DEVICE=0 in the first, export CUDA_DEVICE=1 in the second, and so on (https://optuna.readthedocs.io/en/stable/faq.html#how-can-i-use-two-gpus-for-evaluating-two-trials-simultaneously).
In each tab, run python optuna_code.py.
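
The manual steps above could also be scripted. This is only a sketch of one way to do it, not something the report used: the helper names build_worker_envs and launch_workers are hypothetical, and it assumes that optuna_code.py sits in the current directory and that devices are numbered from 0.

```python
import os
import subprocess
import sys


def build_worker_envs(n_gpus: int) -> list:
    """Return one environment per worker, each with its own CUDA_DEVICE."""
    envs = []
    for gpu in range(n_gpus):
        env = dict(os.environ)  # copy, so the parent environment is untouched
        env["CUDA_DEVICE"] = str(gpu)
        envs.append(env)
    return envs


def launch_workers(script: str = "optuna_code.py", n_gpus: int = 4) -> list:
    """Start one process per GPU, wait for all of them, return their exit codes."""
    procs = [
        subprocess.Popen([sys.executable, script], env=env)
        for env in build_worker_envs(n_gpus)
    ]
    return [proc.wait() for proc in procs]


# Only the environments are built here; launch_workers() would start the runs.
worker_envs = build_worker_envs(4)
print([env["CUDA_DEVICE"] for env in worker_envs])
```

Starting all workers at nearly the same moment, as launch_workers() would, matches the four-tab setup and is the situation in which the missing rows were observed.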

Reproducible examples (optional)

optuna_code.py

from optuna import Trial, create_study
from optuna.integration.allennlp import AllenNLPExecutor


def objective(trial: Trial) -> float:
    # CUDA_DEVICE and DEBUG must be defined as environment variables externally to support multiple GPUs.
    trial.suggest_categorical("POOLING", ["mean", "cls"])
    trial.suggest_float("DROPOUT", 0.0, 0.8)
    trial.suggest_float("ALPHA", 0.0, 1.0)
    trial.suggest_float("GAMMA", 0.0, 5.0)
    trial.suggest_float("LEARNING_RATE", 2e-7, 2e-5, log=True)
    trial.suggest_float("WEIGHT_DECAY", 1e-5, 1e5, log=True)

    executor = AllenNLPExecutor(
        trial=trial,
        config_file="./configs/config_name.jsonnet",
        serialization_dir=f"/experiments/optuna/{trial.number}",
        metrics="best_validation_roc_auc",
        include_package=["my_package"],
    )

    return executor.run()


if __name__ == "__main__":
    study = create_study(
        study_name="study_name",
        storage="sqlite:///results.db",
        direction="maximize",
        load_if_exists=True,
    )

    study.optimize(func=objective, n_jobs=1, n_trials=5, show_progress_bar=True)

Additional context (optional)

1) I believe this bug is hard to debug, since the next run of the same code can be fine. It looks like a race condition to me.
2) The missing rows cause the hyperparameter importance computation to fail:

[image: Importance]

bug

Source

mateuszpieniak

All 3 comments

Thanks for the report. It looks like a severe bug, and is likely a duplicate of https://github.com/optuna/optuna/pull/1498, considering your number of parallel workers and the missing rows. If possible, could you try the latest master branch and see if the problem persists?

hvy on 21 Jul 2020

@hvy Thanks for your response. I updated my fork to Optuna 2.0.0 & it seems to work for now (trial_params is correct for the first 4 runs). I will let you know when the whole optimization ends.

mateuszpieniak on 21 Jul 2020

I removed only allennlp.common.params.infer_and_cast in AllenNLPExecutor. It doesn't work with null values in jsonnet. In addition, I believe that such casting is not needed anyway since jsonnet was designed in such way that a user is responsible for types casting e.g. parseInt, parseJson (https://jsonnet.org/ref/stdlib.html)

Hey @mateuszpieniak, thank you very much for giving AllenNLPExecutor a try.
I used infer_and_cast in AllenNLPExecutor to parse floating-point values.
(Without it, e.g. script and config fail with the error TypeError: Expected embedding_dropout to be numeric.)

As you pointed out, this trick could be removed by guiding users to use parseJson.
I'll update the executor implementation.

Thank you!
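
The difference between a raw string value and a parsed one can be illustrated in Python. This is only an analogy, under two assumptions: that the suggested values reach the jsonnet config as strings, and that jsonnet's std.parseJson treats such a string roughly the way json.loads does. The MAX_LENGTH entry is a hypothetical parameter added to show a null value.

```python
import json

# Assumed shape of the values as the config would see them: all strings.
ext_vars = {"DROPOUT": "0.25", "MAX_LENGTH": "null", "POOLING": "mean"}

# Numeric and null values need explicit parsing, which is the user's job
# when no automatic casting is applied.
dropout = json.loads(ext_vars["DROPOUT"])        # becomes a float
max_length = json.loads(ext_vars["MAX_LENGTH"])  # becomes None

# A categorical value is already the string the config wants; parsing it
# as JSON would fail because "mean" is not a quoted JSON string.
pooling = ext_vars["POOLING"]

print(type(dropout).__name__, max_length, pooling)
```

Left unparsed, DROPOUT would stay the string "0.25", which is the kind of value that leads to an error such as "Expected embedding_dropout to be numeric".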