Pytorch-lightning: Setting of PYTHONHASHSEED has no effect

Created on 12 Jun 2020 · 4 Comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

(Previously submitted here: https://github.com/PyTorchLightning/pytorch-lightning/issues/1939, but I didn't use the correct template, so now I'm resubmitting)

In https://github.com/PyTorchLightning/pytorch-lightning/blob/9045b6c599df3871da6aaaa310f62d3f1364c632/pytorch_lightning/trainer/seed.py#L32, PYTHONHASHSEED is assigned a value in order to ensure reproducibility. However, this assignment has no effect on the running process, because the interpreter reads PYTHONHASHSEED only once, at startup. Worse, the assignment might mislead the user or any logging software into believing that PYTHONHASHSEED has a specific value, when in fact the process is using another.

To see that setting PYTHONHASHSEED inside the current program has no effect, run the following two commands:

PYTHONHASHSEED=1 python -c "import os; print(hash('a'))"
PYTHONHASHSEED=1 python -c "import os; os.environ['PYTHONHASHSEED']='2'; print(hash('a'))"

The commands should output the same value, meaning that setting PYTHONHASHSEED after the process has started has no effect.

The following commands will likely output different values, also indicating that setting PYTHONHASHSEED after the process has started has no effect:

unset PYTHONHASHSEED # make sure it is not already set
python -c "import os; os.environ['PYTHONHASHSEED']='2'; print(hash('a'))"
python -c "import os; os.environ['PYTHONHASHSEED']='2'; print(hash('a'))"
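The same comparison can be scripted from Python. The following is a minimal sketch, not part of pytorch-lightning: the helper name hash_in_child and its parameters are hypothetical, and it assumes Python 3.7+ for capture_output. It starts a fresh interpreter for each measurement, so the seed the child sees at startup can be controlled separately from what the child assigns to os.environ afterwards.

```python
import os
import subprocess
import sys


def hash_in_child(env_seed, in_process_seed=None):
    """Return hash('a') as computed by a freshly started interpreter.

    env_seed: PYTHONHASHSEED in the child's environment at startup,
        or None to leave the variable unset.
    in_process_seed: if given, the child assigns this value to
        os.environ['PYTHONHASHSEED'] before calling hash().
    """
    # Copy the parent's environment and control only PYTHONHASHSEED.
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if env_seed is not None:
        env["PYTHONHASHSEED"] = str(env_seed)

    # Build the small program the child interpreter will run.
    code = "import os\n"
    if in_process_seed is not None:
        code += f"os.environ['PYTHONHASHSEED'] = {str(in_process_seed)!r}\n"
    code += "print(hash('a'))\n"

    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Same startup seed: the in-process assignment changes nothing.
    print(hash_in_child("1"), hash_in_child("1", in_process_seed="2"))
    # No startup seed: the in-process assignment does not make runs agree.
    print(hash_in_child(None, in_process_seed="2"),
          hash_in_child(None, in_process_seed="2"))
```

The first pair printed should be identical, while the second pair will most likely differ, matching the shell commands above.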

To Reproduce

Steps to reproduce the behavior:

  1. Start python terminal with PYTHONHASHSEED=1 python
  2. Run
import pytorch_lightning as pl
pl.seed_everything(100)
print(hash('a'))
# >>> 8432517439229126278
  3. Start new python terminal with PYTHONHASHSEED=2 python
  4. Run
import pytorch_lightning as pl
pl.seed_everything(100)
print(hash('a'))
# >>> -8333094867672744108

Expected behavior

Expected the output of the hash function to be the same in both cases, since seed_everything was called with the same seed. The examples demonstrate that seed_everything cannot guarantee this.

Environment

* CUDA:
    - GPU:
    - available:         False
    - version:           10.2
* Packages:
    - numpy:             1.18.5
    - pyTorch_debug:     False
    - pyTorch_version:   1.5.0
    - pytorch-lightning: 0.7.6
    - tensorboard:       2.2.2
    - tqdm:              4.46.1
* System:
    - OS:                Linux
    - architecture:
        - 64bit
        - ELF
    - processor:         
    - python:            3.8.3
    - version:           #1 SMP PREEMPT Wed May 27 20:25:12 UTC 2020

help wanted question

All 4 comments

Here are some ways I can think of to solve this:

  1. Emit a warning if PYTHONHASHSEED is not 0 (0 means hash randomization is disabled)
  2. Restart the current process with PYTHONHASHSEED defined, see my snippet below. (This should be done as early as possible, to avoid non-idempotent code being executed twice.)

Personally I set PYTHONHASHSEED=0 in all of my .zshrc, but I also use the ensure_pythonhashseed function below. For pytorch-lightning, I prefer solution 1, because it is less complicated, both to implement correctly and for users to reason about.

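import os
import sys


def ensure_pythonhashseed(seed=0):
    """Re-execute the current script if PYTHONHASHSEED differs from `seed`."""
    current_seed = os.environ.get("PYTHONHASHSEED")

    seed = str(seed)
    # An unset variable yields None, which also differs from `seed`.
    if current_seed != seed:
        print(f'Setting PYTHONHASHSEED="{seed}"')
        os.environ["PYTHONHASHSEED"] = seed
        # Restart the current process so the new interpreter reads the seed at startup.
        os.execl(sys.executable, sys.executable, *sys.argv)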

@Borda any thoughts about what to do about this? I can submit a PR that emits a warning when PYTHONHASHSEED is not reproducible, if that is an acceptable solution.

Instead of comparing PYTHONHASHSEED against 0, I'd check whether it is set at all when seed_everything() is called. If it is, that already means that the user set it intentionally (hopefully outside of the process).
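A check along these lines could look roughly like the sketch below. It is not the pytorch-lightning implementation: the function name warn_if_hash_seed_unset, the environ parameter, and the warning text are all hypothetical. It also treats the value "random" as unset, since Python documents that value as requesting a random seed. Note that a check like this cannot tell whether the variable was set before the interpreter started or assigned later inside the process.

```python
import os
import warnings


def warn_if_hash_seed_unset(environ=None):
    """Warn if PYTHONHASHSEED is unset or 'random'.

    Returns True if a warning was issued, False otherwise.
    `environ` defaults to os.environ; passing a dict makes testing easier.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("PYTHONHASHSEED")

    # A fixed integer (including 0) gives repeatable str/bytes hashes.
    if value is not None and value != "random":
        return False

    warnings.warn(
        "PYTHONHASHSEED is not set to a fixed value, so str and bytes hashes "
        "will differ between runs. Set it before starting Python, "
        "for example: PYTHONHASHSEED=0 python script.py",
        RuntimeWarning,
    )
    return True
```

Returning a boolean keeps the helper easy to test and lets the caller decide whether to log, raise, or carry on.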

The current behavior is indeed misleading.

@awaelchli @Borda

I created a commit that removes the setting of PYTHONHASHSEED. I can create a PR for this if you would like to merge it.
