dvc run unexpectedly modifies PATH before running commands

Created on 17 Sep 2019 · 17 Comments · Source: iterative/dvc

Steps to reproduce:

  1. pyenv install miniconda3-latest
  2. pyenv shell miniconda3-latest
  3. conda install dvc==0.59.2
  4. conda create -n testenv python=3.7
  5. mkdir dvc-test && cd dvc-test
  6. dvc init && dvc run -f tmp.dvc 'echo $PATH'

Expected output:

  • Conda env /bin
  • Pyenv shim /bin
  • Conda base /bin
  • Pyenv shim /bin
  • etc

Actual output:

  • Conda base /bin
  • Pyenv shim /bin
  • Conda env /bin
  • Pyenv shim /bin
  • Conda base /bin
  • Pyenv shim /bin
  • etc

Somehow, part of the PATH from wherever DVC is installed is prepended to PATH before dvc run executes the command. This can break code in the project.
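The difference between the expected and actual listings above can be checked mechanically. The following is a minimal sketch, not part of DVC: the function name `prepended_entries` and the directory names are placeholders chosen for illustration, standing in for the entries in the report.

```python
def prepended_entries(expected, actual):
    """Return the entries that were put in front of `expected` to form `actual`.

    Returns None if `actual` does not end with `expected`, i.e. if the
    difference is something other than a pure prepend.
    """
    extra = len(actual) - len(expected)
    if extra < 0 or actual[extra:] != expected:
        return None
    return actual[:extra]


# Placeholder names standing in for the PATH entries listed in the report.
expected = ["conda-env/bin", "pyenv/shims", "conda-base/bin", "pyenv/shims"]
actual = ["conda-base/bin", "pyenv/shims"] + expected

print(prepended_entries(expected, actual))
```

With real values, both lists would come from splitting the two PATH strings on `os.pathsep`.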

Note that installing and running DVC with Pipx instead of Conda does not result in the same problem. The adverse interaction appears to be Conda-specific (although there are many combinations of environments I haven't yet tried). This particular Pipx installation is not being managed by Conda.

Discord conversation: https://discordapp.com/channels/485586884165107732/485596304961962003/623513705145040919

Labels: bug, p0-critical, research

All 17 comments

It's happening to other users as well (discussion with Benjamin on Discord). Symptoms are very similar - zsh (a regular one), PATH is modified. Not conda, virtualenv is being used. OS - Mac.

@shcheklein in their case, is DVC installed in the "parent" Python environment, or in a separate virtualenv? The problem might lie in something that has to do with the parent/child relationship.

Also, did they use Homebrew to install Python? Brew is another common factor here, and more likely to cause problems than Zsh, since Brew does its own layer of symlinking.

My admittedly convoluted setup:

Linuxbrew
├─ Pyenv
│  └─ Conda                        <- PATH is broken when DVC is installed here
│     └─ Active conda environment  <- PATH is OK when DVC is installed here
└─ Python
   └─ Pipx-managed Virtualenv      <- PATH is OK when DVC is installed here

@gwerbin Thanks! I've asked Benjamin to take a look and share more details.

> @shcheklein in their case, is DVC installed in the "parent" Python environment, or in a separate virtualenv? The problem might lie in something that has to do with the parent/child relationship.

I have tried two setups, and both fail in the sense that:

  1. The error ImportError: No module named pandas is returned.
  2. dvc run -o test 'which python > test' outputs /usr/local/bin/python in the test file, where it should point to python in the virtualenv.

Setup 1

Homebrew
   └─ DVC (`/usr/local/bin/dvc`)
   └─ Virtualenv + Python (`/usr/local/bin/{python,virtualenv}`)
             └─Active virtualenv environment
                       └─  Pandas

Setup 2

Homebrew
   └─ Virtualenv + Python (`/usr/local/bin/{python,virtualenv}`)
             └─Active virtualenv environment
                       └─  DVC
                       └─  Pandas

> Also, did they use Homebrew to install Python? Brew is another common factor here, and more likely to cause problems than Zsh, since Brew does its own layer of symlinking.

Yes, Python was installed by Homebrew. (FYI: the Python interpreter that comes with the latest version of macOS (Mojave, version 10.14.6) is 2.7.10 and is 4.5 years old. I figure most people using Python on macOS will have shadowed this outdated version with a more recent one.)

@shcheklein and @efiop asked me to share the output of a few commands on the Discord channels and perhaps it helps if I share it here as well.

> echo $SHELL
> dvc run -f test.dvc 'echo $SHELL'
> ls -la $SHELL
> file $SHELL
/bin/zsh
'test.dvc' already exists. Do you wish to run the command and overwrite it? [y/n] y
Running command:
    echo $SHELL
/bin/zsh
Saving information to 'test.dvc'.

To track the changes with git, run:

    git add test.dvc
-rwxr-xr-x 1 root wheel 610240 May  4 09:05 /bin/zsh
/bin/zsh: Mach-O 64-bit executable x86_64
> cat test.dvc
cmd: echo $SHELL
md5: ee3b44e50705d557b7aa3eef74821f74

I wish I could help out more, but my knowledge of Python environments and DVC internals is very limited. However, let me know if I can help you out with further information and I'm happy to provide it.

For the record: I am able to reproduce https://github.com/iterative/dvc/issues/2506#issue-494639954 on a Linux machine as well.

In my case, which dvc points to a pyenv shim, which contains something like:

exec "/home/efiop/.pyenv/libexec/pyenv" exec "$program" "$@"

This is what prepends extra entries to PATH on top of the base env, as the help text shows:

➜  dvc-test git:(755) ✗ /home/efiop/.pyenv/libexec/pyenv exec --help
Usage: pyenv exec <command> [arg1 arg2...]

Runs an executable by first preparing PATH so that the selected Python
version's `bin' directory is at the front.

For example, if the currently selected Python version is 2.7.6:
  pyenv exec pip install -rrequirements.txt

is equivalent to:
  PATH="$PYENV_ROOT/versions/2.7.6/bin:$PATH" pip install -rrequirements.txt

It would be nice if pyenv left something like a PATH_ORIG env var, so that we could use it later. This would be similar to how pyinstaller leaves VAR_ORIG when it changes a variable, e.g. LD_LIBRARY_PATH. Looking for possible and viable automatic workarounds. We might have to suggest this to pyenv later, though, to make it straightforward for everyone.
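To make the backup idea concrete, here is a minimal sketch of how such a convention could work. It is an assumption throughout: pyenv does not set `PATH_ORIG`, and `wrap_env`, `restore_env` and the example directories are hypothetical names used only for illustration.

```python
import os


def wrap_env(env, bin_dir):
    """Mimic a shim: prepend bin_dir to PATH and keep a backup in PATH_ORIG.

    PATH_ORIG is a hypothetical variable name, not something pyenv sets.
    """
    new_env = dict(env)
    original = env.get("PATH", "")
    new_env["PATH_ORIG"] = original
    new_env["PATH"] = bin_dir + os.pathsep + original if original else bin_dir
    return new_env


def restore_env(env):
    """Undo wrap_env: move PATH_ORIG back into PATH if the backup exists."""
    new_env = dict(env)
    if "PATH_ORIG" in new_env:
        new_env["PATH"] = new_env.pop("PATH_ORIG")
    return new_env


# Hypothetical directories, for illustration only.
user_env = {"PATH": os.pathsep.join(["/project/env/bin", "/usr/bin"])}
shim_env = wrap_env(user_env, "/opt/pyenv/versions/x/bin")
child_env = restore_env(shim_env)
```

A tool started through the shim would see `shim_env`, and could pass `child_env` to the commands it spawns so that they see the PATH the user originally had.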

Interesting detail: our good friend @AlJohri has run into it before even using dvc: https://github.com/pyenv/pyenv/issues/985 🙂

@gwerbin @benjaminvdb Guys, I've pushed a fix to my branch, which does work for me. Could you please install a dev version and tell me if it works for you too? I.e. you'll need to:

  1. uninstall dvc (e.g. pip uninstall -y dvc)
  2. make sure that which dvc doesn't show anything
  3. install dev version with pip install git+https://github.com/efiop/dvc@2506
  4. try running echo $PATH and dvc run -f tmp.dvc 'echo $PATH' to make sure that they print the same values. And check if previously broken commands work for you now.

For the record: our workaround works, but the proper solution would be to make pyenv back up the original PATH value in something like https://github.com/pyenv/pyenv/pull/987/files#diff-414cbe2ca9ee5fbfdc48e82392f82db7R89 , so that we could use it to restore the original PATH before spawning commands from dvc. It doesn't seem that pyenv could restore PATH by itself, because of the cpython issue that they were solving with this in the first place. That being said, there are a few PRs in pyenv that are trying to do that, so we'll see how it goes.

For the record: https://github.com/pyenv/pyenv/pull/987#issuecomment-533944218 , looks like https://github.com/pyenv/pyenv/pull/1169/files is the current candidate.

I'm afraid the current solution doesn't work for me.

  • I've uninstalled dvc (which dvc has no results)
  • Then installed the latest dev release from GitHub: pip install git+https://github.com/iterative/dvc.git (python -c "import dvc; print(dvc.__version__)" returns 0.59.2+7409fa)
  • Still getting module not found for a module installed in the virtualenv
  • which python returns my virtualenv Python interpreter, while dvc run -o test 'which python > test' shows the global Python interpreter in the test file.
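The symptom in the last bullet follows from lookup order: the first PATH entry that contains a python executable wins. The sketch below illustrates this with two temporary directories and `shutil.which`. It assumes a POSIX system, and the directory layout and the helper `make_fake_python` are made up for the illustration.

```python
import os
import shutil
import stat
import tempfile


def make_fake_python(directory):
    """Create a small executable file named `python` in `directory`."""
    path = os.path.join(directory, "python")
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as root:
    venv_bin = os.path.join(root, "venv", "bin")
    global_bin = os.path.join(root, "global", "bin")
    os.makedirs(venv_bin)
    os.makedirs(global_bin)
    venv_python = make_fake_python(venv_bin)
    global_python = make_fake_python(global_bin)

    # PATH as it looks after activating the virtualenv.
    activated = os.pathsep.join([venv_bin, global_bin])
    # The same PATH after something prepends the global bin directory.
    modified = os.pathsep.join([global_bin, activated])

    # Resolve `python` against each PATH, as `which python` would.
    found_activated = shutil.which("python", path=activated)
    found_modified = shutil.which("python", path=modified)
```

Here `found_activated` is the virtualenv interpreter and `found_modified` is the global one, even though the virtualenv directory is still present in the modified PATH.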

Let's hope one of the proposed fixes in pyenv will fix the problem for dvc too!

@benjaminvdb I see. Your issue seems to be unrelated to pyenv, but rather to virtualenv. Reopening, will take a closer look into that.

Let's see if the current solution works for @gwerbin, who was using pyenv.

Ok, so we've had a great discussion with @benjaminvdb in private messages and I was able to reproduce on my mac with homebrew quite easily with

virtualenv .venv
source .venv/bin/activate
which python
dvc run -f tmp.dvc 'which python'

dvc was installed as a binary from brew cask, so there is something interesting going on.
At the same time, those same commands on linux are working as expected, so I suspect some brew-related magic here. Investigating...

Ok, it might be that I was tripping, but I am no longer able to reproduce. 🙁

So we've had an interactive debugging session with @benjaminvdb today and found that he had a ~/.zshenv file that was modifying PATH. As it turned out [1]

> `.zshenv' is sourced on all invocations of the shell, unless the -f option is set. It should contain commands to set the command search path, plus other important environment variables. `.zshenv' should not contain commands that produce output or assume the shell is attached to a tty.

so it was modifying the PATH when dvc was spawning a new process. Moving those lines from .zshenv to .zshrc fixed the problem, but it would still be nice for us to protect against such things in the future. To do that, we could consider using the -f option for zsh and an equivalent option for bash, so that they do not load such files. Will take a look.

[1] http://zsh.sourceforge.net/Intro/intro_3.html
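The idea of passing -f to zsh when spawning commands could look roughly like the sketch below. It is only an illustration of the proposal, not DVC's implementation: `shell_argv` is a hypothetical helper, and shells other than zsh are deliberately left unchanged because the thread does not say which option would be used for them.

```python
import os


def shell_argv(shell, command):
    """Build the argv for running `command` through `shell`.

    For zsh, add -f so that user startup files such as ~/.zshenv are
    skipped. Other shells are left unchanged in this sketch.
    """
    if os.path.basename(shell) == "zsh":
        return [shell, "-f", "-c", command]
    return [shell, "-c", command]


print(shell_argv("/bin/zsh", "echo $PATH"))
print(shell_argv("/bin/sh", "echo $PATH"))
```

The resulting list can be passed directly to `subprocess.run`, which avoids a second layer of shell parsing.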

For the record, the pyenv issue was fixed in https://github.com/pyenv/pyenv/pull/1169 (PATH is still modified, but at least it is modified in a way that doesn't cause any real problems so far). However, since pyenv has quite a long delivery cycle to distros (e.g. through the system's package managers), we will need to live with our own workaround for quite a while.

Great work! 🎊 Thanks everybody for taking the time!

For the record, the _ORIG approach is being discussed at https://github.com/rbenv/rbenv/issues/1190 .
