dvc run messes up library paths

Created on 7 Jun 2018 · 8 Comments · Source: iterative/dvc

I have a command that relies on cupy. When I run it with dvc run ..., it raises this error and fails:

/tmp/_MEIeW2y4B/libstdc++.so.6: version `GLIBCXX_3.4.20' not found

If I run it without dvc run, no error is raised. I think this is because dvc run uses p = subprocess.Popen(self.cmd, cwd=self.cwd, shell=True), which may be messing up the environment variables. Would passing in the environment variables explicitly solve this problem?

env = os.environ
proc = subprocess.Popen(args, env=env)
bug

All 8 comments

Hi @yukw777 !

Thank you for the analysis, great catch! We should definitely not clean the env before running the command. I will prepare a patch shortly.

Hm, looking into it more closely, I am not able to reproduce this. The env seems to be preserved across dvc run. Could you please run these commands and check whether they output the same env:

$ printenv
$ dvc run -f printenv.dvc printenv

and also, just to make sure, could you please try these as well:

$ echo $0
$ dvc run -f sh.dvc 'echo $0'

You can then safely remove sh.dvc and printenv.dvc.

hmm env vars might be a red herring..

Here are the minimal repro steps:

# inside a git repo with dvc initialized
virtualenv venv -p python3
source venv/bin/activate

# now we're in virtualenv
pip install cupy
python -c "import cupy"  # this succeeds
dvc run 'python -c "import cupy"' . # this fails

Error message:

Using 'Dvcfile' as a stage file
Reproducing 'Dvcfile':
    python -c "import cupy"
Traceback (most recent call last):
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/__init__.py", line 11, in <module>
    from cupy import core  # NOQA
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
  File "cupy/core/core.pyx", line 1, in init cupy.core.core
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/cuda/__init__.py", line 4, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 12, in <module>
    from cupy.cuda import function
  File "cupy/cuda/memory.pxd", line 7, in init cupy.cuda.function
ImportError: /tmp/_MEIUoawzf/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/cuda/memory.cpython-36m-x86_64-linux-gnu.so)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/__init__.py", line 32, in <module>
    six.reraise(ImportError, ImportError(msg), exc_info[2])
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/__init__.py", line 11, in <module>
    from cupy import core  # NOQA
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
  File "cupy/core/core.pyx", line 1, in init cupy.core.core
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/cuda/__init__.py", line 4, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 12, in <module>
    from cupy.cuda import function
  File "cupy/cuda/memory.pxd", line 7, in init cupy.cuda.function
ImportError: CuPy is not correctly installed.

If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host.
Also, confirm that only one CuPy package is installed:
  $ pip freeze

If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with:
  $ pip install cupy --no-cache-dir -vvvv

Check the Installation Guide for details:
  https://docs-cupy.chainer.org/en/latest/install.html

original error: /tmp/_MEIUoawzf/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /awsnas/peter/test/venv/lib/python3.6/site-packages/cupy/cuda/memory.cpython-36m-x86_64-linux-gnu.so)
Failed to run command: Stage 'Dvcfile' cmd python -c "import cupy" failed

Actually the env vars are different:

$ printenv
...
LD_LIBRARY_PATH=/usr/local/cuda/lib64
...
$ dvc run -f printenv.dvc printenv
...
LD_LIBRARY_PATH=/tmp/_MEI4nThRR:/usr/local/cuda/lib64
LD_LIBRARY_PATH_ORIG=/usr/local/cuda/lib64
...

The shells are different too.

$ echo $0
-bash
$ dvc run -f sh.dvc 'echo $0'
Reproducing 'sh.dvc':
    echo $0
/bin/sh

I still think this is the culprit.
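The `/tmp/_MEI...` prefix and the `LD_LIBRARY_PATH_ORIG` variable in the printenv output above look like what a PyInstaller-bundled executable sets up at startup: it prepends its unpack directory to `LD_LIBRARY_PATH` and saves the previous value under `LD_LIBRARY_PATH_ORIG`. Assuming that is what is happening here (the thread does not confirm how dvc was packaged), a minimal sketch of restoring the original value before spawning a child could look like this. `restored_env` is a hypothetical helper for illustration, not part of dvc.

```python
import os
import subprocess


def restored_env(environ=None):
    """Return a copy of the environment with LD_LIBRARY_PATH reset to the
    value saved in LD_LIBRARY_PATH_ORIG, if such a saved value exists.

    Hypothetical helper: it assumes the *_ORIG variable was written by the
    launcher that modified LD_LIBRARY_PATH in the first place.
    """
    env = dict(os.environ if environ is None else environ)
    original = env.pop("LD_LIBRARY_PATH_ORIG", None)
    if original is not None:
        env["LD_LIBRARY_PATH"] = original
    return env


if __name__ == "__main__":
    # The child sees the restored search path rather than the modified one.
    subprocess.run("printenv LD_LIBRARY_PATH", shell=True, env=restored_env())
```

If no saved value is present, the sketch leaves the environment untouched, since it cannot tell whether `LD_LIBRARY_PATH` was modified at all.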

Yes, that was my suspicion: you are not using the default shell, and thus your env is different. That makes sense now. Specifying env explicitly should solve this issue. The fix will ship in 0.9.8, which should be released in a week or so.

Thanks,
Ruslan

Btw, as a workaround, could you make sure that your default shell matches the one you are using?
I.e. it looks like you are using bash, but for some reason the default shell for your user is /bin/sh. Could you try running chsh -s $(which bash) $USER and then check that echo $0 from the previous example shows bash in both cases? That should solve your issue while I look into solving it on the dvc side in the meantime.

EDIT: I'm wrong, /bin/sh is the default one for Popen(shell=True) on Unix. Looking into solving this...

mm the env vars are still different, so didn't work :/

Another interesting thing I found:

$ dvc run echo $0
Using 'Dvcfile' as a stage file
Reproducing 'Dvcfile':
        echo -bash
-bash
$ dvc run 'echo $0'
Using 'Dvcfile' as a stage file
Reproducing 'Dvcfile':
        echo $0
/bin/sh

It probably has to do with how a single-string command is run vs. how an array of strings is run.

@yukw777 thank you for trying that out! It turns out I was wrong (see the edit above). We should use the executable arg of Popen to specify the same shell that the user is using when running a command. Preparing a patch right now.
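As a rough illustration of the `executable` argument mentioned above: on POSIX, `Popen(..., shell=True)` runs the command through `/bin/sh` unless `executable` names another shell. The sketch below is not the dvc patch; `shell_name` is a hypothetical helper, and reading the user's shell from the `SHELL` environment variable is an assumption made for the example.

```python
import os
import subprocess


def shell_name(executable=None):
    """Run `echo $0` through Popen(shell=True) and return the name the
    shell reports for itself. `executable` optionally replaces /bin/sh."""
    proc = subprocess.Popen(
        "echo $0",
        shell=True,
        executable=executable,
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate()
    return out.strip()


if __name__ == "__main__":
    # Default: the command goes through /bin/sh on POSIX systems.
    print(shell_name())
    # Assumption: SHELL holds the user's preferred shell; None falls back
    # to the default behaviour.
    print(shell_name(executable=os.environ.get("SHELL")))
```

Note that switching the shell this way changes how the command string is interpreted, but it does not by itself change the environment variables the child inherits.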

"Another interesting thing I found:"

Yep, that is caused by the single quotes around the command itself. In the first case dvc receives the command echo -bash, because $0 is expanded by your shell before the command is passed to dvc. In the second case dvc receives echo $0, because the shell passes the quoted command to dvc as a literal string.
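The quoting effect can be reproduced without dvc. In this sketch an outer `/bin/sh` stands in for the interactive shell and a child Python process stands in for dvc, printing the arguments it receives; `args_received` is a hypothetical helper. Because the outer shell here is `/bin/sh` rather than a login bash, `$0` expands to `/bin/sh` instead of `-bash`.

```python
import shlex
import subprocess
import sys

# Child program that prints the arguments it was started with.
PRINT_ARGS = "import sys; print(sys.argv[1:])"


def args_received(shell_line):
    """Let /bin/sh parse `shell_line` and return the argument list that the
    child process ends up seeing, as printed by the child."""
    cmd = f"{shlex.quote(sys.executable)} -c {shlex.quote(PRINT_ARGS)} {shell_line}"
    result = subprocess.run(
        ["/bin/sh", "-c", cmd], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Unquoted: the outer shell expands $0 before the child starts.
    print(args_received("echo $0"))    # ['echo', '/bin/sh']
    # Single-quoted: the child receives the literal string.
    print(args_received("'echo $0'"))  # ['echo $0']
```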
