Steps to reproduce:
pyenv install miniconda3-latest
pyenv shell miniconda3-latest
conda install dvc==0.59.2
conda create -n testenv python=3.7
mkdir dvc-test && cd dvc-test
dvc init && dvc run -f tmp.dvc 'echo $PATH'
Expected output:
/bin/bin/bin/bin
Actual output:
/bin/bin/bin/bin/bin/bin
Somehow part of the PATH from wherever DVC is installed is getting prepended to PATH before dvc run-ing something. This potentially breaks code in the project.
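To pin down exactly which entries get prepended, the two PATH values can be compared programmatically. The following is only a minimal sketch: the helper name `prepended_entries` and the example paths are made up for illustration and are not part of DVC.

```python
import os
from typing import List


def prepended_entries(parent_path: str, child_path: str, sep: str = os.pathsep) -> List[str]:
    """Return the entries that were put in front of parent_path to form child_path.

    Returns an empty list if child_path is not "something + parent_path",
    i.e. if PATH was changed in some other way (or not at all).
    """
    parent = parent_path.split(sep)
    child = child_path.split(sep)
    extra = len(child) - len(parent)
    # The child PATH must end with the complete, unmodified parent PATH.
    if extra >= 0 and child[extra:] == parent:
        return child[:extra]
    return []


if __name__ == "__main__":
    # Hypothetical values: paste the output of `echo $PATH` and of
    # `dvc run -f tmp.dvc 'echo $PATH'` here.
    shell_path = "/usr/local/bin:/usr/bin:/bin"
    dvc_run_path = "/opt/example/bin:/usr/local/bin:/usr/bin:/bin"
    print(prepended_entries(shell_path, dvc_run_path, sep=":"))
```

An empty result with differing PATH values would suggest the PATH is being rewritten rather than simply prepended to.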
Note that installing and running DVC with Pipx instead of Conda does not result in the same problem. The adverse interaction appears to be Conda-specific (although there are many combinations of environments I haven't yet tried). This particular Pipx installation is not being managed by Conda.
Discord conversation: https://discordapp.com/channels/485586884165107732/485596304961962003/623513705145040919
It's happening to other users as well (discussion with Benjamin on Discord). Symptoms are very similar - zsh (a regular one), PATH is modified. Not conda, virtualenv is being used. OS - Mac.
@shcheklein in their case, is DVC installed in the "parent" Python environment, or in a separate virtualenv? The problem might lie in something that has to do with the parent/child relationship.
Also, did they use Homebrew to install Python? Brew is another common factor here, and more likely to cause problems than Zsh, since Brew does its own layer of symlinking.
My admittedly convoluted setup:
Linuxbrew
├─ Pyenv
│  └─ Conda <- PATH is broken when DVC is installed here
│     └─ Active conda environment <- PATH is OK when DVC is installed here
└─ Python
   └─ Pipx-managed Virtualenv <- PATH is OK when DVC is installed here
@gwerbin Thanks! I've asked Benjamin to take a look and share more details.
> @shcheklein in their case, is DVC installed in the "parent" Python environment, or in a separate virtualenv? The problem might lie in something that has to do with the parent/child relationship.
I have tried two setups, and both fail in the sense that:
- ImportError: No module named pandas is returned.
- dvc run -o test 'which python > test' outputs /usr/local/bin/python in the test file, where it should point to python in the virtualenv.
Setup 1
Homebrew
├─ DVC (`/usr/local/bin/dvc`)
└─ Virtualenv + Python (`/usr/local/bin/{python,virtualenv}`)
   └─ Active virtualenv environment
      └─ Pandas
Setup 2
Homebrew
└─ Virtualenv + Python (`/usr/local/bin/{python,virtualenv}`)
   └─ Active virtualenv environment
      ├─ DVC
      └─ Pandas
> Also, did they use Homebrew to install Python? Brew is another common factor here, and more likely to cause problems than Zsh, since Brew does its own layer of symlinking.
Yes, Python was installed by Homebrew. (FYI: the Python interpreter that comes with the latest version of macOS (Mojave, version 10.14.6) is 2.7.10 and is 4.5 years old. I figure most people using Python on macOS will have shadowed this outdated version with a more recent one.)
@shcheklein and @efiop asked me to share the output of a few commands on the Discord channels and perhaps it helps if I share it here as well.
> echo $SHELL
/bin/zsh
> dvc run -f test.dvc 'echo $SHELL'
'test.dvc' already exists. Do you wish to run the command and overwrite it? [y/n] y
Running command:
echo $SHELL
/bin/zsh
Saving information to 'test.dvc'.
To track the changes with git, run:
git add test.dvc
> ls -la $SHELL
-rwxr-xr-x 1 root wheel 610240 May 4 09:05 /bin/zsh
> file $SHELL
/bin/zsh: Mach-O 64-bit executable x86_64
> cat test.dvc
cmd: echo $SHELL
md5: ee3b44e50705d557b7aa3eef74821f74
I wish I could help out more, but my knowledge of Python environments and DVC internals is very limited. However, let me know if I can help you out with further information and I'm happy to provide it.
For the record: I am able to reproduce https://github.com/iterative/dvc/issues/2506#issue-494639954 even on a Linux machine.
In my case `which dvc` shows a pyenv shim, which contains something like:
exec "/home/efiop/.pyenv/libexec/pyenv" exec "$program" "$@"
in it, which is the thing that adds some stuff on top of the base env, as we can see:
➜ dvc-test git:(755) ✗ /home/efiop/.pyenv/libexec/pyenv exec --help
Usage: pyenv exec <command> [arg1 arg2...]
Runs an executable by first preparing PATH so that the selected Python
version's `bin' directory is at the front.
For example, if the currently selected Python version is 2.7.6:
pyenv exec pip install -rrequirements.txt
is equivalent to:
PATH="$PYENV_ROOT/versions/2.7.6/bin:$PATH" pip install -rrequirements.txt
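Given that help text, one conceivable way to undo the change is to drop the leading `$PYENV_ROOT/versions/<version>/bin` entries before spawning a command. The sketch below is an assumption about how such a workaround could look, not the code from the DVC fix; `strip_pyenv_bins` is a hypothetical helper, and it assumes POSIX-style paths since pyenv targets Unix-like systems.

```python
from typing import List


def strip_pyenv_bins(path: str, pyenv_root: str) -> str:
    """Remove leading `<pyenv_root>/versions/<version>/bin` entries from PATH.

    Only entries at the very front are removed, since that is where
    `pyenv exec` puts them; matching entries later in PATH are kept.
    """
    versions_prefix = pyenv_root.rstrip("/") + "/versions/"
    entries: List[str] = path.split(":")
    while (
        entries
        and entries[0].startswith(versions_prefix)
        and entries[0].rstrip("/").endswith("/bin")
    ):
        entries.pop(0)
    return ":".join(entries)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    root = "/home/user/.pyenv"
    modified = "/home/user/.pyenv/versions/2.7.6/bin:/usr/local/bin:/usr/bin"
    print(strip_pyenv_bins(modified, root))
```

A caveat of this approach: if the user had deliberately put such a directory at the front of PATH themselves, it would be stripped as well, which is why a backup of the original value would be more reliable.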
It would be nice if pyenv left something like a PATH_ORIG env var, so that we could use it later. This would be similar to how pyinstaller leaves VAR_ORIG if it changes a variable, e.g. LD_LIBRARY_PATH. Looking for possible and viable automatic workarounds. Might have to suggest this to pyenv later though, to make it straightforward for everyone.
Interesting detail: our good friend @AlJohri ran into it even before using dvc: https://github.com/pyenv/pyenv/issues/985
@gwerbin @benjaminvdb Guys, I've pushed a fix to my branch, which does work for me. Could you please install a dev version and tell me if it works for you too? I.e. you'll need to:
1. Uninstall dvc (pip uninstall -y dvc)
2. Make sure that which dvc doesn't show anything
3. pip install git+https://github.com/efiop/dvc@2506
4. Run echo $PATH and dvc run -f tmp.dvc 'echo $PATH' to make sure that they print the same values. And check if previously broken commands work for you now.
For the record: our workaround works, but the proper solution for this would be to make pyenv back up the original PATH value in something like https://github.com/pyenv/pyenv/pull/987/files#diff-414cbe2ca9ee5fbfdc48e82392f82db7R89 , so we could use it to restore the original PATH before spawning commands from dvc. It doesn't seem like pyenv could restore PATH by itself, because of the cpython issue that they were solving with this in the first place. That being said, there are a few PRs in pyenv that are trying to do that, so we'll see how it goes.
For the record: https://github.com/pyenv/pyenv/pull/987#issuecomment-533944218 , looks like https://github.com/pyenv/pyenv/pull/1169/files is the current candidate.
I'm afraid the current solution doesn't work for me.
- dvc was uninstalled (which dvc has no results)
- The dev version is installed (python -c "import dvc; print(dvc.__version__)" returns 0.59.2+7409fa)
- I still get module not found for a module installed in the virtualenv
- which python returns my virtualenv Python interpreter, while dvc run -o test 'which python > test' shows the global Python interpreter in the test file.
Let's hope one of the proposed fixes in pyenv will fix the problem for dvc too!
@benjaminvdb I see. Your issue seems to be unrelated to pyenv, but rather to virtualenv. Reopening, will take a closer look into that.
Let's see if the current solution works for @gwerbin, who was using pyenv.
Ok, so we've had a great discussion with @benjaminvdb in private messages and I was able to reproduce on my mac with homebrew quite easily with
virtualenv .venv
source .venv/bin/activate
which python
dvc run -f tmp.dvc 'which python'
dvc was installed as a binary from brew cask, so there is something interesting going on.
At the same time, those same commands on linux are working as expected, so I suspect some brew-related magic here. Investigating...
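The symptom (the parent shell and the spawned command disagreeing on `which python`) comes down to the order of PATH entries. The sketch below illustrates that with two fake `python` executables in a temporary directory; the directory layout and the helper names are invented for the example, and it assumes a POSIX system.

```python
import os
import shutil
import stat
import tempfile
from typing import Tuple


def make_fake_python(directory: str) -> str:
    """Create a stub executable named `python` in `directory` (illustration only)."""
    target = os.path.join(directory, "python")
    with open(target, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


def demo() -> Tuple[str, str]:
    """Resolve `python` under two PATH orderings; return paths relative to the temp root."""
    with tempfile.TemporaryDirectory() as root:
        venv_bin = os.path.join(root, "venv", "bin")
        global_bin = os.path.join(root, "usr", "local", "bin")
        for directory in (venv_bin, global_bin):
            os.makedirs(directory)
            make_fake_python(directory)

        # PATH as the activated virtualenv sets it: virtualenv first.
        activated = os.pathsep.join([venv_bin, global_bin])
        # PATH after something re-prepended the global directory.
        clobbered = os.pathsep.join([global_bin, venv_bin])

        return (
            os.path.relpath(shutil.which("python", path=activated), root),
            os.path.relpath(shutil.which("python", path=clobbered), root),
        )


if __name__ == "__main__":
    print(demo())
```

Whatever prepends the global directory again in the child process silently defeats the virtualenv activation, which would match the `ImportError` reported earlier in the thread.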
Ok, it might be that I was tripping, but I am no longer able to reproduce. :slightly_frowning_face:
So we've had an interactive debugging session with @benjaminvdb today and found that he had ~/.zshenv file that was modifying PATH. As it turned out [1]
`.zshenv' is sourced on all invocations of the shell, unless the -f option is set. It should contain commands to set the command search path, plus other important environment variables. `.zshenv' should not contain commands that produce output or assume the shell is attached to a tty.
so it was modifying the PATH when dvc was spawning a new process. Moving those lines from zshenv to zshrc fixed the problem, but it would still be nice for us to protect against such things in the future. To do that we could consider using -f option for zsh and an equivalent option for bash, to make them not load such files. Will take a look.
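A rough sketch of the `-f` idea for zsh follows. It is an assumption about how the command line could be built, not how DVC actually spawns commands; `shell_argv` is a hypothetical helper, and the bash equivalent is deliberately left out because it still needs to be worked out.

```python
import os
from typing import List


def shell_argv(cmd: str, shell: str) -> List[str]:
    """Build the argv for running `cmd` through `shell`.

    For zsh, add -f so that per-user startup files such as ~/.zshenv
    are not sourced (see the zsh documentation quoted above).
    Other shells are left untouched in this sketch.
    """
    if os.path.basename(shell) == "zsh":
        return [shell, "-f", "-c", cmd]
    return [shell, "-c", cmd]


if __name__ == "__main__":
    # The result could be passed to subprocess.run(...) instead of using shell=True.
    print(shell_argv("echo $PATH", os.environ.get("SHELL", "/bin/sh")))
```

Passing an explicit argv like this keeps the choice of shell and its flags visible in one place, at the cost of no longer honoring startup files that some users may rely on.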
For the record, the pyenv issue was fixed in https://github.com/pyenv/pyenv/pull/1169 (PATH is still modified, but at least it is modified in a way that doesn't cause any real problems so far), but since pyenv has quite a long delivery cycle to distros (e.g. through system package managers), we will need to live with our own workaround for quite a while.
Great work! Thanks everybody for taking the time!
For the record, discussing _ORIG approach here https://github.com/rbenv/rbenv/issues/1190 .
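The _ORIG approach can be sketched as follows. Note that `PATH_ORIG` is hypothetical here: as discussed earlier in the thread, pyenv does not set it, and `restore_orig` is an illustrative helper name rather than an existing function.

```python
import os
from typing import Dict, Mapping


def restore_orig(env: Mapping[str, str], var: str, suffix: str = "_ORIG") -> Dict[str, str]:
    """Return a copy of `env` in which `var` is reset from its `<var>_ORIG` backup.

    If no backup variable exists, the environment is returned unchanged.
    """
    fixed = dict(env)
    backup = var + suffix
    if backup in fixed:
        fixed[var] = fixed[backup]
    return fixed


if __name__ == "__main__":
    # Would only have an effect if the launcher had exported PATH_ORIG (assumption).
    child_env = restore_orig(os.environ, "PATH")
    print(child_env["PATH"] == os.environ.get("PATH"))
```

The appeal of this design is that the tool spawning commands does not have to guess which entries were added; it simply trusts the backup left by whoever modified the variable.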