Optuna: Race Condition For First Run

Created on 6 Apr 2020 · 4 Comments · Source: optuna/optuna

When multiple processes create studies concurrently against the same PostgreSQL storage,
the script crashes because each process tries to create the database tables.

Expected behavior

The database should be locked during initialization so that no process crashes.
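The requested behavior (lock, then initialize, then no crash) can be illustrated outside Optuna. The sketch below assumes Python's built-in sqlite3 as a stand-in for PostgreSQL: each worker takes an exclusive database lock, checks whether the schema table exists, and creates it only if it is missing. The table name alembic_version is borrowed from the error log, and the helper names init_schema_locked and run_concurrent_first_runs are hypothetical. This is not how Optuna initializes its storage; on PostgreSQL, a rough analogue would be an advisory lock such as pg_advisory_xact_lock, which this sketch does not use.

```python
import os
import sqlite3
import tempfile
import threading

# Table name borrowed from the error log above; illustrative only.
SCHEMA_TABLE = "alembic_version"


def init_schema_locked(path):
    """Create SCHEMA_TABLE if missing, holding an exclusive lock throughout.

    Returns True if this caller created the table, False if it already existed.
    """
    # isolation_level=None (autocommit) leaves transaction control to the
    # explicit BEGIN/COMMIT below; timeout is how long to wait for the lock.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        # BEGIN EXCLUSIVE makes other connections wait until COMMIT, so the
        # existence check and the CREATE TABLE act as one atomic step.
        conn.execute("BEGIN EXCLUSIVE")
        exists = bool(
            conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
                (SCHEMA_TABLE,),
            ).fetchall()
        )
        if not exists:
            conn.execute(f"CREATE TABLE {SCHEMA_TABLE} (version_num TEXT PRIMARY KEY)")
        conn.execute("COMMIT")
        return not exists
    finally:
        # Closing without COMMIT (e.g. after an exception) rolls the transaction back.
        conn.close()


def run_concurrent_first_runs(path, n_workers=8):
    """Start n_workers threads that all try to initialize the schema at once."""
    # list.append is atomic in CPython, so these result lists need no extra lock.
    created, errors = [], []
    barrier = threading.Barrier(n_workers)  # release all workers together

    def worker():
        barrier.wait()
        try:
            created.append(init_schema_locked(path))
        except sqlite3.Error as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return created, errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created, errors = run_concurrent_first_runs(os.path.join(tmp, "demo.db"))
        print(f"tables created: {sum(created)}, errors: {len(errors)}")
```

Because the check and the create happen under one lock, exactly one worker should report creating the table and the rest should see it already present.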

Environment

  • Optuna version: 1.2.0
  • Python version: 3.8

Error messages, stack traces, or logs

..
sqlalchemy.exc.ProgrammingError: ... relation "alembic_version" already exists
..
sqlalchemy.exc.IntegrityError: ... duplicate key value violates unique constraint "pg_type_typename_nsp_index"
..

Steps to reproduce

[delete optuna db tables]
python exp.py exp1 &
python exp.py exp2 &
python exp.py exp3 &
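The tracebacks are consistent with a check-then-create race: every process sees an uninitialized schema, each then tries to create it, and all but the first fail. The sketch below replays that losing interleaving deterministically with two connections, again assuming Python's built-in sqlite3 as a stand-in for PostgreSQL. It does not call Optuna, and the helper names table_exists and replay_unlocked_first_run are hypothetical.

```python
import os
import sqlite3
import tempfile

# Table name borrowed from the error log above; illustrative only.
SCHEMA_TABLE = "alembic_version"


def table_exists(conn):
    """Return True if SCHEMA_TABLE is present in the database behind conn."""
    rows = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (SCHEMA_TABLE,),
    ).fetchall()
    return bool(rows)


def replay_unlocked_first_run(path):
    """Replay two workers' check-then-create steps in the losing order.

    Returns the error messages raised by workers whose CREATE TABLE failed.
    """
    # Autocommit connections: each statement commits on its own, and no lock
    # is held between the existence check and the CREATE TABLE.
    worker_a = sqlite3.connect(path, isolation_level=None)
    worker_b = sqlite3.connect(path, isolation_level=None)
    try:
        # Step 1: both workers check before either creates, so both see no table.
        needs_init = [not table_exists(worker_a), not table_exists(worker_b)]
        errors = []
        # Step 2: both act on their now-stale answer; the second one fails.
        for conn, init in zip((worker_a, worker_b), needs_init):
            if init:
                try:
                    conn.execute(
                        f"CREATE TABLE {SCHEMA_TABLE} (version_num TEXT PRIMARY KEY)"
                    )
                except sqlite3.OperationalError as exc:
                    errors.append(str(exc))
        return errors
    finally:
        worker_a.close()
        worker_b.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for message in replay_unlocked_first_run(os.path.join(tmp, "demo.db")):
            print("second worker failed:", message)
```

The failure here is the SQLite counterpart of the "relation ... already exists" error in the log: the table was created between the moment the second worker checked and the moment it acted.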

Additional context (optional)

I have only tried PostgreSQL.

bug

All 4 comments

Hi @elbaro. To avoid this problem, you need to run database migrations before running exp.py.

  1. Initialize the database by creating the study from the command line: $ optuna create-study --storage ... --study-name ....
  2. Load the study with optuna.load_study() in your Python program.

There is no simple way to run database migrations safely from multiple workers, so I think it's better to document this concurrency problem. What do you think?

By "exp" above I meant a study, not a trial. Each command calls its own optuna.study.create_study.
Currently I run a small dummy study in advance.

python exp.py study=dummy # db init
python exp.py study=study1 &
python exp.py study=study2 &
..

Should I run the migration command for each study? Apart from the race condition, optuna.create_study initializes the database just fine.

I see. I misunderstood.

> Should I run the migrate cmd for each study?

No. Migrating once is enough. You can migrate your database with optuna storage upgrade --storage YOUR_DATABASE without creating a study, so the following steps should work as you expect.

$ optuna storage upgrade --storage "postgresql://scott:tiger@localhost:5432/mydatabase"
$ python exp.py exp1 &
$ python exp.py exp2 &
$ python exp.py exp3 &
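The same order (migrate once, then start the workers) can also be driven from Python instead of a shell. This is a minimal sketch, assuming the optuna CLI is on PATH and that exp.py takes the experiment name as its only argument, as in the commands above. The storage URL, the experiment names, and the helpers upgrade_command, worker_commands, and launch are placeholders, not part of Optuna.

```python
import subprocess
import sys

# Placeholder values: substitute your own storage URL and experiment names.
STORAGE_URL = "postgresql://scott:tiger@localhost:5432/mydatabase"
EXPERIMENTS = ["exp1", "exp2", "exp3"]


def upgrade_command(storage_url):
    """Command that migrates the storage schema once, before any worker starts."""
    return ["optuna", "storage", "upgrade", "--storage", storage_url]


def worker_commands(experiments, script="exp.py"):
    """One command per experiment, mirroring `python exp.py expN &`."""
    return [[sys.executable, script, name] for name in experiments]


def launch(storage_url, experiments):
    """Run the migration to completion, then start all workers concurrently."""
    # check=True raises CalledProcessError if the migration fails, so no
    # worker is started against a half-initialized database.
    subprocess.run(upgrade_command(storage_url), check=True)
    # By now the schema exists, so the workers should have nothing left to
    # race on when they start together.
    procs = [subprocess.Popen(cmd) for cmd in worker_commands(experiments)]
    # Wait for every worker and collect its exit code.
    return [p.wait() for p in procs]
```

Calling launch(STORAGE_URL, EXPERIMENTS) from a small driver script plays the role of the four shell commands. Keeping the command builders separate from launch makes it easy to inspect exactly what will be run before touching the database.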

Thanks, it solved my problem.
