virtualenv 20: is the symlink hack really worth it?

Created on 11 Feb 2020 · 14 comments · Source: pypa/virtualenv

I did some timing, and it seems like the trouble the symlinks cause is not really worth the speedup. At the very least, I'd like an option that copies instead of symlinking.

Here's some timing I did to try to gauge the differences. Since there are no options I could find, I toggled this line to "if False" to get my "copy" data: https://github.com/pypa/virtualenv/blob/8c2985c2946e767bb6f74a7e22f51add17b38987/src/virtualenv/seed/via_app_data/via_app_data.py#L92

with symlinks

My platform for this example is relatively low-powered: a 2015 MBP.

$ rm -rf vvv; time virtualenv vvv

real    0m0.128s
user    0m0.107s
sys 0m0.023s
$ rm -rf vvv; time virtualenv vvv

real    0m0.128s
user    0m0.118s
sys 0m0.012s
$ rm -rf vvv; time virtualenv vvv

real    0m0.123s
user    0m0.121s
sys 0m0.004s
$ rm -rf vvv; time virtualenv vvv

real    0m0.119s
user    0m0.117s
sys 0m0.004s
$ rm -rf vvv; time virtualenv vvv

real    0m0.127s
user    0m0.109s
sys 0m0.020s

disk usage:

$ du -hs vvv
128K    vvv

problems this can cause:

$ # copied to same path on other machine
$ ./vvv/bin/python -c 'import setuptools'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'setuptools'
$ ./vvv/bin/pip --help
Traceback (most recent call last):
  File "./vvv/bin/pip", line 6, in <module>
    from pip._internal.cli.main import main
ModuleNotFoundError: No module named 'pip'
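The failures above are what dangling symlinks look like. The venv's site-packages entries point at absolute paths inside the creating user's app-data cache, and that cache does not exist on the other machine. The following is a minimal, POSIX-only sketch of the mechanism using temporary directories. The directory names are hypothetical stand-ins, not virtualenv's real layout.

```python
import os
import shutil
import tempfile


def simulate_copy_to_other_machine(root: str) -> str:
    """Build a fake venv whose package is a symlink into a fake app-data
    cache, copy the venv, then delete the cache to mimic a machine that
    never had it. Returns the path of the package inside the copy."""
    # Hypothetical stand-in for ~/.local/share/virtualenv/... (app-data cache)
    app_data_pkg = os.path.join(root, "app-data", "pip")
    os.makedirs(app_data_pkg)
    with open(os.path.join(app_data_pkg, "__init__.py"), "w"):
        pass

    # Hypothetical stand-in for vvv/lib/python3.x/site-packages
    site_packages = os.path.join(root, "vvv", "site-packages")
    os.makedirs(site_packages)
    # Absolute link target, like the ones virtualenv 20.0.x created
    os.symlink(app_data_pkg, os.path.join(site_packages, "pip"))

    # Copy the venv, keeping symlinks as symlinks (as tar or rsync -a would)
    copied = os.path.join(root, "copied-vvv")
    shutil.copytree(os.path.join(root, "vvv"), copied, symlinks=True)

    # The "other machine" has no app-data cache
    shutil.rmtree(os.path.join(root, "app-data"))
    return os.path.join(copied, "site-packages", "pip")


with tempfile.TemporaryDirectory() as tmp:
    pkg = simulate_copy_to_other_machine(tmp)
    print(os.path.islink(pkg))  # True: the link itself was copied
    print(os.path.exists(pkg))  # False: its target is gone, so "import pip" fails
```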

with copies

$ rm -rf vvv; time virtualenv vvv

real    0m0.179s
user    0m0.155s
sys 0m0.050s
$ rm -rf vvv; time virtualenv vvv

real    0m0.185s
user    0m0.158s
sys 0m0.050s
$ rm -rf vvv; time virtualenv vvv

real    0m0.183s
user    0m0.160s
sys 0m0.048s
$ rm -rf vvv; time virtualenv vvv

real    0m0.172s
user    0m0.162s
sys 0m0.035s
$ rm -rf vvv; time virtualenv vvv

real    0m0.181s
user    0m0.142s
sys 0m0.065s
$ du -hs vvv
7.5M    vvv

trade-off

So we're looking at ~60ms of overhead, which (imo) isn't that much. Disk usage is another concern, but we're paying for that space one way or another.
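To put a number on the trade-off, here are the means of the five "real" timings for each mode. This sketch only averages the figures quoted above and adds no new measurements. The difference it yields, about 55ms, is consistent with the rough ~60ms estimate.

```python
from statistics import mean

# "real" wall-clock times (seconds) from the five runs of each mode above
symlink_runs = [0.128, 0.128, 0.123, 0.119, 0.127]
copy_runs = [0.179, 0.185, 0.183, 0.172, 0.181]

symlink_avg = mean(symlink_runs)
copy_avg = mean(copy_runs)
overhead_ms = (copy_avg - symlink_avg) * 1000

print(f"symlink: {symlink_avg * 1000:.0f} ms")  # symlink: 125 ms
print(f"copy: {copy_avg * 1000:.0f} ms")        # copy: 180 ms
print(f"overhead: {overhead_ms:.0f} ms")        # overhead: 55 ms
```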

other considerations

Hardlinks would be another option. They would alleviate the problems I have with symlinks (caches, using virtualenv as a deployment mechanism, etc.), but I'd have to do some implementation work to verify that.
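The reason hardlinks might help can be shown with the standard library. A hardlink is a second name for the same inode, so it keeps working after the original path is deleted. A symlink stores only a path, so it dangles. This is a minimal POSIX sketch, not anything virtualenv implements. Two caveats apply. Hardlinks cannot span filesystems, so the app-data directory and the venv would have to be on the same one. Writes through either name also change the shared content.

```python
import os
import tempfile
from typing import Tuple


def link_survival(root: str) -> Tuple[bool, bool]:
    """Create a hardlink and a symlink to the same file, delete the
    original path, and report whether each link still resolves."""
    original = os.path.join(root, "cache-file.py")
    with open(original, "w") as f:
        f.write("print('hello')\n")

    hard = os.path.join(root, "hard.py")
    soft = os.path.join(root, "soft.py")
    os.link(original, hard)     # same inode; must be on the same filesystem
    os.symlink(original, soft)  # stores only the path to the original

    os.remove(original)         # e.g. the app-data cache was cleared

    return os.path.exists(hard), os.path.exists(soft)


with tempfile.TemporaryDirectory() as tmp:
    print(link_survival(tmp))  # (True, False)
```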


I'm experiencing the "ModuleNotFoundError: No module named 'pip'" error mentioned above in https://travis-ci.org/galaxyproject/galaxy/jobs/648435619 and am hoping for a solution.

@nsoranzo your issue is separate from the topic of this discussion; please open a new issue for that. @asottile I'll address the point you raised at a later time, once no more urgent bugfixes are needed. Note that you can use --copies to get the copy behaviour (for both the Python files and the app-data part). That said, if the app-data folder is causing you issues and you don't care about performance, you really should be using the pip seeder.

I do care about performance: the copies approach takes ~180ms, whereas the pip approach takes >3s.

I don't want copies of the Python executable.

Also, --copies does not actually copy:

$ virtualenv vvv --copies
$ tree vvv
vvv
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── activate.ps1
│   ├── activate_this.py
│   ├── activate.xsh
│   ├── easy_install
│   ├── easy_install3
│   ├── easy_install-3.6
│   ├── pip
│   ├── pip3
│   ├── pip-3.6
│   ├── python
│   ├── python3
│   ├── python3.6
│   ├── wheel
│   ├── wheel3
│   └── wheel-3.6
├── lib
│   └── python3.6
│       └── site-packages
│           ├── easy_install.py -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/easy_install.py
│           ├── pip -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/pip-20.0.2-py2.py3-none-any/pip
│           ├── pip-20.0.2.dist-info -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/pip-20.0.2-py2.py3-none-any/pip-20.0.2.dist-info
│           ├── pip-20.0.2.dist-info.virtualenv -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/pip-20.0.2-py2.py3-none-any/pip-20.0.2.dist-info.virtualenv
│           ├── pkg_resources -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/pkg_resources
│           ├── setuptools -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/setuptools
│           ├── setuptools-45.2.0.dist-info -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/setuptools-45.2.0.dist-info
│           ├── setuptools-45.2.0.dist-info.virtualenv -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/setuptools-45.2.0-py3-none-any/setuptools-45.2.0.dist-info.virtualenv
│           ├── wheel -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/wheel-0.34.2-py2.py3-none-any/wheel
│           ├── wheel-0.34.2.dist-info -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/wheel-0.34.2-py2.py3-none-any/wheel-0.34.2.dist-info
│           └── wheel-0.34.2.dist-info.virtualenv -> /home/asottile/.local/share/virtualenv/seed-v1/3.6/image/SymlinkPipInstall/wheel-0.34.2-py2.py3-none-any/wheel-0.34.2.dist-info.virtualenv
└── pyvenv.cfg

11 directories, 23 files
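A quick way to check whether a given venv is affected is to look for symlinks whose resolved target lies outside the venv directory. This is a minimal sketch assuming a POSIX filesystem. The function name is hypothetical and not part of virtualenv. On a venv created without --copies, bin/python will typically also be reported, because it links to the base interpreter.

```python
import os
from typing import List, Tuple


def external_symlinks(venv_dir: str) -> List[Tuple[str, str]]:
    """Return (link, resolved_target) pairs for every symlink under
    venv_dir whose resolved target lies outside venv_dir."""
    venv_root = os.path.realpath(venv_dir)
    found = []
    for dirpath, dirnames, filenames in os.walk(venv_dir):
        # os.walk does not descend into symlinked directories by default,
        # but they still appear in dirnames, so check both lists
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if os.path.commonpath([venv_root, target]) != venv_root:
                found.append((path, target))
    return found
```

Usage, assuming the venv is at ./vvv: `for link, target in external_symlinks("vvv"): print(link, "->", target)`. With the tree above, this would list every site-packages entry pointing into ~/.local/share/virtualenv.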

Yeah, I fixed that as part of #1575 but haven't released it yet. I first want to get https://github.com/pypa/virtualenv/pull/1571 in with it, and that PR is failing at the moment.

I think the symlink approach is very much worth it; especially on Windows with some anti-virus active. That being said, should it be the default option? Maybe not.

I guess my thought is that saving 60ms is not worth sacrificing correctness in the defaults (whereas 3s+ is, of course, unacceptable).

In addition to breaking caching of virtualenvs in CI, I've also found that it breaks our deployment system at Lyft, which produces venvs at a well-known location and then tars them up to deploy them.
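Here is why tarring up a venv breaks. By default Python's tarfile, like tar itself, stores a symlink as a symlink entry that keeps its original target path. An absolute target is then resolved against the deploy host's filesystem, not the archive contents. A pre-deploy check could list such entries. This is a minimal sketch using only documented tarfile attributes; the function name is hypothetical.

```python
import tarfile
from typing import List, Tuple


def absolute_symlinks_in_archive(archive: str) -> List[Tuple[str, str]]:
    """List (member_name, link_target) for symlink members whose target
    is an absolute path, i.e. points outside the archive's own tree."""
    with tarfile.open(archive) as tar:  # default mode "r" auto-detects compression
        return [
            (member.name, member.linkname)
            for member in tar.getmembers()
            if member.issym() and member.linkname.startswith("/")
        ]
```

Passing dereference=True to tarfile.open would instead archive the files the links point to. However, that also follows the bin/python link and bundles a copy of the interpreter, which may not be wanted.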

I don't really want to keep having to say "virtualenv's defaults are broken, use --XXX", as I've had to do for --no-download for so long (thanks for fixing that, by the way! 🙏).

I'll create a PR that adds a separate flag for controlling the app-data copy/symlink behaviour and makes it copy by default on all platforms. With a bit of luck, it should be out in the next two hours, together with some other fixes.

Just for reference: on Windows with some stricter anti-virus (on non-SSD hard disks), the difference is more than 60ms; it's more in the realm of 10 seconds.

Maybe then we could make pip the default seeder on non-Windows platforms and keep the symlink approach as the default on Windows?

I consider the app-data path via copy superior in all cases; the symlink one is the more dangerous one. The pip seeder takes 3s+ on UNIX, and even longer on Windows. This way the default will be 200ms on UNIX, but users can opt in to the faster --symlink-app-data if they can ensure that the symlinks will not break.

Yeah - I totally get it for the Windows users.

Another use case that just broke for us, FWIW: our CI system creates a few shared virtualenvs in /usr/local. Other things use them to get their hands on some tools whose dependencies we don't want installed globally. Those shared venvs are installed by root, since they go in a shared location. BUT that means the symlinks point into /root/.local, which on some base OSes is chmod 770, so the virtualenvs just became unusable. We're fixing that with --seeder=pip, but there are going to be a bazillion corner cases like that for folks using virtualenv under *nix. If the main performance win is on non-*nix, maybe let's keep the new default behavior there? Just thinking out loud...
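The root-owned case fails differently from the copied-venv case. The link targets exist, but a non-root user cannot traverse into them. os.access follows symlinks by default, so it reports both missing targets and permission-blocked ones. This is a minimal sketch with a hypothetical function name. Two assumptions apply: it must run as the user who will consume the venv, and root usually passes the check regardless.

```python
import os
from typing import List


def unreadable_link_targets(site_packages: str) -> List[str]:
    """Return the names of entries in site_packages that are symlinks whose
    target the current user cannot read: either the target is missing, or
    a directory along its path (e.g. a 770 /root) blocks access."""
    bad = []
    for entry in os.scandir(site_packages):
        # is_symlink() does not follow the link; os.access() does
        if entry.is_symlink() and not os.access(entry.path, os.R_OK):
            bad.append(entry.name)
    return sorted(bad)
```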

As a developer on UNIX, I very much prefer being done in 200ms as opposed to 3 seconds. So I believe we'll keep app-data as the default seeder. We'll work through the edge cases as they come up.

Hello, a fix for this issue has been released in virtualenv 20.0.2; see https://pypi.org/project/virtualenv/20.0.2/ (https://virtualenv.pypa.io/en/latest/changelog.html#v20-0-2-2020-02-11). Please give it a try, and if your issue has not been addressed, comment here and we'll reopen the ticket. We apologize for the inconvenience this has caused you, and thank you for your patience while we resolve the unexpected bugs in this new major release.

thanks
