conda "update" and "install" commands can leave an environment with inconsistent dependencies

Created on 19 Jan 2016 · 48 comments · Source: conda/conda

There have been a number of issues reporting various kinds of behavior which I think all have the same root cause. The dependency resolution when running "conda install" and "conda update" does not appear to take the other packages in the environment into account. This can quite easily result in an environment being updated to a state where the package dependencies are not actually met.

For example, let's suppose I create an environment with pandas and h5py, but with NumPy 1.9 initially (let's imagine NumPy 1.10 hasn't been released yet)

Rover:~ stan$ conda create -n dep_test python=3.4 h5py pandas numpy=1.9
Fetching package metadata: ....
Solving package specifications: ...................
Package plan for installation in environment /Users/stan/anaconda/envs/dep_test:

The following NEW packages will be INSTALLED:

    h5py:            2.5.0-np19py34_3
    hdf5:            1.8.15.1-2
    numpy:           1.9.3-py34_0
    openssl:         1.0.2e-0
    pandas:          0.16.2-np19py34_0
    pip:             7.1.2-py34_0
    python:          3.4.4-0
    python-dateutil: 2.4.2-py34_0
    pytz:            2015.7-py34_0
    readline:        6.2-2
    setuptools:      19.2-py34_0
    six:             1.10.0-py34_0
    sqlite:          3.9.2-0
    tk:              8.5.18-0
    wheel:           0.26.0-py34_1
    xz:              5.0.5-0
    zlib:            1.2.8-0

Proceed ([y]/n)? y

Linking packages ...
[      COMPLETE      ]|###################################################| 100%
#
# To activate this environment, use:
# $ source activate dep_test
#
# To deactivate this environment, use:
# $ source deactivate
#

Everything is fine and consistent.

Now, some time in the future, let's suppose I try to upgrade pandas:

Rover:~ stan$ conda update -n dep_test pandas
Fetching package metadata: ....
Solving package specifications: ..................
Package plan for installation in environment /Users/stan/anaconda/envs/dep_test:

The following packages will be UPDATED:

    numpy:  1.9.3-py34_0      --> 1.10.2-py34_0
    pandas: 0.16.2-np19py34_0 --> 0.17.1-np110py34_0

Proceed ([y]/n)? y

Unlinking packages ...
[      COMPLETE      ]|###################################################| 100%
Linking packages ...
[      COMPLETE      ]|###################################################| 100%

Now I have an h5py package that requires NumPy 1.9 installed at the same time as a pandas package that requires NumPy 1.10:

Rover:~ stan$ conda list -n dep_test pandas
# packages in environment at /Users/stan/anaconda/envs/dep_test:
#
pandas                    0.17.1              np110py34_0
Rover:~ stan$ conda list -n dep_test h5py
# packages in environment at /Users/stan/anaconda/envs/dep_test:
#
h5py                      2.5.0                np19py34_3

I've seen this behavior in a bunch of different situations (not specifically NumPy-related), leading me to conclude that (for speed reasons?) the dependency solver is not considering _all_ of the already-installed packages when creating an install plan for an upgrade operation, only those in the dependency tree rooted in the packages explicitly listed on the command line.
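The resulting breakage is easy to detect after the fact. As a minimal sketch (with invented package metadata and a crude prefix-based stand-in for real version specs, not conda's actual API), a consistency check over an environment could look like this:

```python
# Toy consistency check: verify that every installed package's
# requirements are satisfied by what is actually installed.
# All package metadata here is invented for illustration.

installed = {"numpy": "1.10.2", "pandas": "0.17.1", "h5py": "2.5.0"}

# requirement -> (package, allowed version prefix); a crude stand-in
# for real conda version specs
requires = {
    "pandas": [("numpy", "1.10")],
    "h5py":   [("numpy", "1.9")],   # built against NumPy 1.9
}

def find_inconsistencies(installed, requires):
    problems = []
    for pkg, deps in requires.items():
        if pkg not in installed:
            continue
        for dep, prefix in deps:
            version = installed.get(dep)
            if version is None or not version.startswith(prefix):
                problems.append((pkg, dep, prefix, version))
    return problems

# h5py wants numpy 1.9.* but numpy 1.10.2 is installed
print(find_inconsistencies(installed, requires))
```

Run against the dep_test environment above, a check like this would flag exactly the h5py/NumPy mismatch shown in the conda list output.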

(This issue probably duplicates other issues, but I haven't seen one that explains the problem generically.)

All 48 comments

Ping @mcg1969

Sure, I'll study this. My ego is too big to believe that the solver is producing wrong answers, but it is possible we're feeding it the wrong inputs :-)

OK, that was pretty easy to diagnose. Stan's analysis is correct. The Resolve.solve method is being fed the specs ['pandas', 'numpy', 'python 3.4*'], along with a list of the installed packages. The solver uses the installed list to extract the installed features, but it doesn't add those packages to the specs.

The fix is, of course, to include the existing packages in the specs. And that's exactly what happens when we do conda update --all: ['xz', 'numpy', 'setuptools', 'openssl', 'pytz', 'h5py', 'hdf5', 'readline', 'zlib', 'wheel', 'python-dateutil', 'pandas', 'six', 'pip', 'sqlite', 'python', 'tk', 'python 3.4*']

So then we need to ask: what should the difference be between conda update pandas numpy and conda update --all? The quick fix would mean there is _no_ difference; _any_ package update would result in a conda update --all.

In my view, the ideal behavior is this:
  1. First, we test to see if pandas and numpy can be updated to their latest versions _without updating any other packages_. If so, pull the trigger.
  2. Failing that, we effectively do a conda update --all, but we change the solver weights to prefer packages closest to the installed versions.

Needless to say that's a bit more work. But it's doable without a full solver rewrite.
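That two-phase idea can be sketched with a toy brute-force solver (the candidate versions and consistency rules below are invented stand-ins for conda's real Resolve machinery and repodata):

```python
from itertools import product

# Invented candidate versions (oldest to newest) and constraints,
# standing in for real repodata.
candidates = {
    "numpy":  ["1.9.3", "1.10.2"],
    "pandas": ["0.16.2", "0.17.1"],
    "h5py":   ["2.5.0-np19", "2.5.0-np110"],
}
installed = {"numpy": "1.9.3", "pandas": "0.16.2", "h5py": "2.5.0-np19"}

def consistent(env):
    # pandas 0.17.1 needs numpy 1.10; h5py builds pin their numpy series.
    if env["pandas"] == "0.17.1" and env["numpy"] != "1.10.2":
        return False
    if env["h5py"].endswith("np19") != (env["numpy"] == "1.9.3"):
        return False
    return True

def changes(env):
    # Number of packages that differ from what is installed.
    return sum(env[p] != installed[p] for p in env)

def solve(requested, frozen):
    """Consistent env changing the fewest packages, with `requested`
    forced to their newest versions and `frozen` held as installed."""
    envs = (dict(zip(candidates, combo))
            for combo in product(*candidates.values()))
    valid = [
        env for env in envs
        if all(env[p] == candidates[p][-1] for p in requested)
        and all(env[p] == installed[p] for p in frozen)
        and consistent(env)
    ]
    return min(valid, key=changes) if valid else None

# Phase 1: update pandas while touching nothing else.
plan = solve({"pandas"}, frozen=set(installed) - {"pandas"})
# Phase 2: relax, but still prefer the fewest changes overall.
if plan is None:
    plan = solve({"pandas"}, frozen=set())
print(plan)
```

Phase 1 fails here (the new pandas needs a newer numpy), so the fallback bumps numpy and the h5py build as well, rather than leaving h5py pinned against the old NumPy.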

Yes, I think that makes sense. We want to update as few packages as possible, but not let the environment become inconsistent (unless --force is in effect).

Good news: the solver already has code to "prefer" no updates to packages that aren't in the spec.

However, currently this capability is disabled by the update_dependencies configuration option. It looks like the default value of this option is True, which effectively means that any conda update command, properly implemented, behaves like conda update --all.

It seems like it would be best to make this value default to False.

(Let me be clear: just changing this flag now will not fix this issue. I'm just saying that when I submit a PR, we'll need to change this default in order to get what I think is the correct behavior.)

Yeah, I think we have the Three Laws of Conda Update:

  1. conda update will not make the environment inconsistent or, through inaction, allow an environment to become inconsistent.
  2. conda update will install the packages explicitly requested by the user on the command line, except when it conflicts with the First Law.
  3. conda update will preserve the package state of the environment, except when it conflicts with the First or Second Law.

Similar things should happen for remove:

λ conda create -n "conda_test" python=3.5 pandas
Fetching package metadata: ..............
Solving package specifications: ...................
Package plan for installation in environment C:\portabel\miniconda\envs\conda_test:

The following NEW packages will be INSTALLED:

    msvc_runtime:    1.0.1-vc14_0       defaults [vc14]
    numpy:           1.10.1-py35_0      defaults
    pandas:          0.17.1-np110py35_0 defaults
    pip:             7.1.2-py35_0       defaults
    python:          3.5.1-0            defaults
    python-dateutil: 2.4.2-py35_0       defaults
    pytz:            2015.7-py35_0      defaults
    setuptools:      19.2-py35_0        defaults
    six:             1.10.0-py35_0      defaults
    wheel:           0.26.0-py35_1      defaults

Proceed ([y]/n)? y

Linking packages ...
[      COMPLETE      ]|##################################################| 100%
#
# To activate this environment, use:
# > activate conda_test
#

λ activate conda_test
Deactivating environment "C:\portabel\miniconda"...
Activating environment "C:\portabel\miniconda\envs\conda_test"...

[conda_test] λ conda remove numpy
Fetching package metadata: ..............

Package plan for package removal in environment C:\portabel\miniconda\envs\conda_test:

The following packages will be REMOVED:

    numpy: 1.10.1-py35_0 defaults

Proceed ([y]/n)? y

Unlinking packages ...
[      COMPLETE      ]|##################################################| 100%

[conda_test] λ python -c "import pandas"
Traceback (most recent call last):
  File "C:\portabel\miniconda\envs\conda_test\lib\site-packages\pandas\__init__.py", line 7, in <module>
    from pandas import hashtable, tslib, lib
  File "pandas\src\numpy.pxd", line 157, in init pandas.hashtable (pandas\hashtable.c:38262)
ImportError: No module named 'numpy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\portabel\miniconda\envs\conda_test\lib\site-packages\pandas\__init__.py", line 13, in <module>
    "extensions first.".format(module))
ImportError: C extension: No module named 'numpy' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace' to build the C extensions first.

Oh, yes, of course! Let me see if that is easily added here.

[Note that it would still be nice if there would be a force or "assume it is installed" switch/state, so that one can e.g. install pandas from source when doing pandas development... But still install seaborn, which depends on pandas.]

Hmm, I'd say that's non-trivial enough, and sufficiently removed from this one, that it needs its own PR.

@JanSchulz:

Note that it would still be nice if there would be a force or "assume it is installed" switch/state

For the use-case you listed, I think conda install --no-deps seaborn would still suffice, right? I suppose you might have to manually type the names of seaborn's other dependencies (such as matplotlib, cycler, etc.), but the --no-deps option gives you a method for installing seaborn without overwriting your development version of pandas.

My workflow is usually the reverse: install seaborn, remove matplotlib or pandas :-)

@seibert love the three laws. Should be part of the documentation probably!

@JanSchulz The conda behavior for a long time has been to happily uninstall anything requested. Would definitely be an enhancement to warn about installed dependencies of what you're about to uninstall, but that's definitely a different PR I think.
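For what it's worth, a warning of that sort only needs a reverse-dependency lookup over the installed metadata. A minimal sketch, with an invented dependency table:

```python
# Before removing a package, list the installed packages that depend
# on it. The dependency table here is invented for illustration.
deps = {
    "pandas": ["numpy", "python-dateutil", "pytz"],
    "numpy": [],
    "python-dateutil": ["six"],
    "pytz": [],
    "six": [],
}

def dependents(target, deps):
    """Installed packages that directly require `target`."""
    return sorted(pkg for pkg, reqs in deps.items() if target in reqs)

broken = dependents("numpy", deps)
if broken:
    print(f"WARNING: removing numpy breaks: {', '.join(broken)}")
```

In the conda remove numpy transcript above, a check like this would have warned that pandas was about to be broken.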

Something like "conda mark-as-installed pandas" would be nice, then you could do

conda install pandas # adds deps
conda remove pandas # removes only pandas
conda mark-as-installed pandas # adds a "virtual package"
conda install seaborn # uses the "virtual pandas"...
conda update --all # would upgrade everything apart from pandas...

Alternative:

conda remove --keep-as-virtual-package pandas 

Is this the same issue that prevents --no-update-dependencies from actually working for me in an install situation? That is, if I try to install a package whose dependencies appear to already be satisfied in the environment, I still get updates that look unnecessary?

mpld3 0.2 py27_0
----------------
file name   : mpld3-0.2-py27_0.tar.bz2
name        : mpld3
version     : 0.2
build number: 0
build string: py27_0
channel     : defaults
size        : 112 KB
date        : 2016-01-11
license     : BSD License
md5         : 005253fc58f5bf8502584f496109a9fd
installed environments:
dependencies:
    jinja2
    matplotlib
    python 2.7*

Dependencies look to be in my environment:

bash-4.1$ conda list | egrep '(^jinja2|^matplotlib|^python )'
jinja2                    2.7.3                    py27_1    defaults
matplotlib                1.4.3                np19py27_2    defaults
python                    2.7.9                         3    defaults

But an install appears to want to install more recent versions even with --no-update-dependencies:

conda install --no-update-dependencies mpld3
The following NEW packages will be INSTALLED:

    cycler:          0.9.0-py27_0      defaults
    mpld3:           0.2-py27_0        defaults
    pycairo:         1.10.0-py27_0     defaults

The following packages will be UPDATED:

    jinja2:          2.7.3-py27_1     defaults --> 2.8-py27_0        defaults
    matplotlib:      1.4.3-np19py27_2 defaults --> 1.5.1-np110py27_0 defaults
    numpy:           1.9.2-py27_2     defaults --> 1.10.2-py27_0     defaults
    openssl:         1.0.1k-1         defaults --> 1.0.2e-0          defaults
    pip:             6.0.8-py27_0     defaults --> 8.0.1-py27_0      defaults
    pyqt:            4.11.3-py27_1    defaults --> 4.11.4-py27_1     defaults
    python:          2.7.9-3          defaults --> 2.7.11-0          defaults
    python-dateutil: 2.4.1-py27_0     defaults --> 2.4.2-py27_0      defaults
    pytz:            2015.2-py27_0    defaults --> 2015.7-py27_0     defaults
    pyyaml:          3.11-py27_0      defaults --> 3.11-py27_1       defaults
    qt:              4.8.6-3          defaults --> 4.8.7-1           defaults
    requests:        2.6.0-py27_0     defaults --> 2.9.1-py27_0      defaults
    setuptools:      18.8.1-py27_0    defaults --> 19.4-py27_0       defaults
    sip:             4.16.5-py27_0    defaults --> 4.16.9-py27_0     defaults
    six:             1.9.0-py27_0     defaults --> 1.10.0-py27_0     defaults
    sqlite:          3.8.4.1-1        defaults --> 3.9.2-0           defaults
    yaml:            0.1.4-0          defaults --> 0.1.6-0           defaults

However, if I include my full package list in the install command by also using the file I used to install them, I just get the package I want

bash-4.1$ conda install --no-update-dependencies --file pkgs.conda mpld3
Fetching package metadata: ....
Solving package specifications: ..........................................................................................................................
Package plan for installation in environment /data/fido/mconda:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mpld3-0.2                  |           py27_0         112 KB  defaults

The following NEW packages will be INSTALLED:

    mpld3: 0.2-py27_0 defaults

Or is this related to something like https://github.com/conda/conda/issues/1878 and maybe my whole conda environment is broken because I installed it using the --file option and I don't know if my dependencies were ever appropriately checked?

--no-update-dependencies is not a substitute for --force. If the package you are trying to install has version specifications that demand a later version of a package, it will grab it, even if --no-update-dependencies is set. The purpose of this flag is to prevent conda from _trying_ to update the dependencies when you haven't asked, and it is not otherwise necessary to do so.

Is there more info that I'm not seeing that shows the package (in this case, mpld3) has version specifications that demand a later version of a package? I thought the list from info was complete, and in this case, the only version required was python 2.7 which was already satisfied:

dependencies:
    jinja2
    matplotlib
    python 2.7*
bash-4.1$ conda list | egrep '(^jinja2|^matplotlib|^python )'
jinja2                    2.7.3                    py27_1    defaults
matplotlib                1.4.3                np19py27_2    defaults
python                    2.7.9                         3    defaults

I can bounce this out to the google group if I'm just missing something, but it looked to me like my current environment was not passed to the solver, which I thought was the issue here.

It is true that the current environment is not pulled in. But all dependencies are---and all of their dependencies, too, all the way down. I am working on a larger fix to the solver that will do a better job of minimizing the disturbance to the environment when a single package is installed, but the priority is not to allow the environment to be broken.

Thanks Michael. I'm still not sure I understand though. My point in including this use case was that when you made the comment above, "The fix is, of course, to include the existing packages in the specs.", I think I completely agreed, but it wasn't clear to me that you were also implementing the fix for the conda install case as well as the conda update case.

I think such a fix would change the --no-update-dependencies behavior from "Do not unnecessarily update dependencies in the package list I'm submitting to this command" to the more expected/intended "Do not unnecessarily update dependencies that are already installed in my working environment so this can be a minimal install". Right?

I think there's something getting lost in communication, probably in both directions. Bear with me!

The problem first cited by seibert above affects _both_ conda install and conda update. Neither command should be allowed to result in a broken environment, even if you specify --no-update-dependencies. Unfortunately, sometimes that is exactly what happens.

However, there is still a difference between --no-update-dependencies and --update-dependencies. In the former case, a _minimal_ upgrade set is sought: the fewest possible version bumps to achieve a solution. In the latter case, the dependencies (and their sub-dependencies, etc.) are upgraded _as much as possible_.

It's possible that the current fix isn't perfectly achieving this goal. I'm working on a larger solver fix that should do so. If you can offer up a reproducible case that does more upgrading than it should, please post it. It's got to be something I can test myself in an isolated conda environment, but I'm happy to do it.

I think there's something getting lost in communication, probably in both directions. Bear with me!

No no, bear with me! Here's another use case similar to my mpld3 case. Here I've installed a vanilla python 2.7.9 environment. The create worked fine and I believe the environment has satisfied dependencies to start. Then, I've checked the dependencies for rope, which should have a build that will work with 2.7.9. If I do a conda install --no-update-dependencies rope, it finds the right version of rope to work with my 2.7 environment, but then tries to do an unnecessary update from python 2.7.9 to 2.7.11. Does this make sense as a reproducible case? This happened to be with conda 3.18.5.

conda create -n dep_test python=2.7.9 
Fetching package metadata: ....
Solving package specifications: .........
Package plan for installation in environment /data/fido/ska/arch/x86_64-linux_CentOS-5/envs/dep_test:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    sqlite-3.9.2               |                0         3.9 MB  defaults
    setuptools-19.4            |           py27_0         365 KB  defaults
    pip-8.0.1                  |           py27_0         1.5 MB  defaults
    ------------------------------------------------------------
                                           Total:         5.8 MB

The following NEW packages will be INSTALLED:

    openssl:    1.0.1k-1      defaults
    pip:        8.0.1-py27_0  defaults
    python:     2.7.9-3       defaults
    readline:   6.2-2         defaults
    setuptools: 19.4-py27_0   defaults
    sqlite:     3.9.2-0       defaults
    system:     5.8-2         defaults
    tk:         8.5.18-0      defaults
    wheel:      0.26.0-py27_1 defaults
    zlib:       1.2.8-0       defaults

Proceed ([y]/n)? y

...

# To activate this environment, use:
# $ source activate dep_test

bash-4.1$ source activate dep_test

(dep_test)bash-4.1$ conda info rope=0.9.4=py27_1
Fetching package metadata: ....

rope 0.9.4 py27_1
-----------------
file name   : rope-0.9.4-py27_1.tar.bz2
name        : rope
version     : 0.9.4
build number: 1
build string: py27_1
channel     : defaults
size        : 225 KB
date        : 2014-08-22
license     : GPL
license_family: GPL2
md5         : e6017d8b755b05462880c7d30cf581e1
installed environments:
dependencies:
    python 2.7*

> source activate dep_test
discarding /data/fido/ska/arch/x86_64-linux_CentOS-5/bin from PATH
prepending /data/fido/ska/arch/x86_64-linux_CentOS-5/envs/dep_test/bin to PATH


(dep_test)bash-4.1$ conda install --no-update-dependencies rope
Fetching package metadata: ....
Solving package specifications: .
Package plan for installation in environment /data/fido/ska/arch/x86_64-linux_CentOS-5/envs/dep_test:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2e             |                0         3.2 MB  defaults
    python-2.7.11              |                0        12.0 MB  defaults
    ------------------------------------------------------------
                                           Total:        15.2 MB

The following NEW packages will be INSTALLED:

    rope:    0.9.4-py27_1 defaults

The following packages will be UPDATED:

    openssl: 1.0.1k-1 defaults --> 1.0.2e-0     defaults
    python:  2.7.9-3  defaults --> 2.7.11-0     defaults


Ah, I was able to reproduce that. Indeed, we can confirm that the upgrade is not necessary by attempting conda install rope=0.9.4 python=2.7.9. So it looks like --no-update-dependencies is not working properly with the current version of the code.

I think the reason for this is that python is treated specially. It's adding python=2.7* to your specifications. There are some technical reasons why this is the case that I don't fully understand yet (which means I'm not going to remove it :-)). The code that does this is here. Perhaps @ilanschnell can comment as to why this is necessary.

So regardless of the value of --no-update-dependencies it is going to attempt to update python. I agree that is not desirable behavior, but it can't be fixed in this PR.

The good news is that my larger solver fix will take care of this. That's because it will give the ability to specify, on a package by package basis, whether to "upgrade fully" or "update just enough." Then the linked routine above can be modified to take advantage of this capability.

Oh, I know why this was originally added: the originating issue is this one. The idea is that conda should not update Python 2 to Python 3 unless specifically requested. It looks like the fix actually goes further: it will not update, say, Python 2.6 to Python 2.7 without specifically requesting it.

This is of course a very reasonable thing to do, and given the limitations of the current solver architecture that function is necessary. With my larger solver fix, I'm not convinced it will be necessary any longer. But at the very least, it will be possible to tell conda to _leave Python alone_ unless it can't avoid it :-)

From my latest pull request. The one thing I'd like to see is some feedback in the second, --no-update-deps case. If someone knows there is a newer version of pandas, it would be important to tell them why we didn't find it. But this isn't quite as easy as it might first appear---probably requiring the full hint machinery.

0203-mgrant:conda mgrant$ conda update -n dep_test pandas --update-deps
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ..........
Solving package specifications: ..............

Package plan for installation in environment /Users/mgrant/miniconda2/envs/dep_test:

The following packages will be UPDATED:

    h5py:   2.5.0-np19py34_3  --> 2.5.0-np110py34_4 
    numpy:  1.9.3-py34_0      --> 1.10.2-py34_0     
    pandas: 0.16.2-np19py34_0 --> 0.17.1-np110py34_0

Proceed ([y]/n)? n

0203-mgrant:conda mgrant$ conda update -n dep_test pandas --no-update-deps
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ..........
Solving package specifications: ............

# All requested packages already installed.
# packages in environment at /Users/mgrant/miniconda2/envs/dep_test:
#
pandas                    0.16.2               np19py34_0  

Everyone, this is a big thread, and I don't have the time to read the whole thing, but this is territory we've been over before. Below are some references.

The challenge is that it is very easy to get into a situation where there are no degrees of freedom. There are many ways this can happen, and once it does if conda insists on keeping all packages with satisfied dependencies then there is nothing to do but to start uninstalling to break the log jam.

Here are some other issues that touch on this:

https://github.com/conda/conda/issues/337 - conda doesn't just try to satisfy deps, it also tries to get the latest version of all deps that satisfy reqs, even if that means updating an otherwise perfectly functional set of pkgs.

https://github.com/conda/conda/issues/1696 - no distinction between "must have" and "nice to have" packages, so dep list can grow to be longer than is necessary

https://github.com/conda/conda/issues/454 - check satisfiability of all env pkg deps at install time

https://github.com/conda/conda/issues/1268 - check satisfiability of all env pkg deps after install, and warn.

My preference would be to have a configurable strict setting, also available on the CLI, that would ensure that all packages at all times have their deps satisfied. But if it is not set, then the unit of atomic pkg consistency is a single command. Previous commands are not considered.

To say a few more words on my earlier comment: as surprising as it sounds, it is, from my 3 years of experience with conda, usually totally OK to upgrade packages in a way that on paper breaks other packages' dependencies. The main problems:

  1. This is not what people expect, so it violates the principle of least surprise. Even Stan, at Continuum for years now, apparently was not aware of this. In fact, in some ways it validates my comment: presumably Stan has not been bitten by this in his years of use of conda, until this point, so most of his dependency-breaking conda updates did not in fact stop his code from running. I try to educate people to this surprising fact whenever possible, but still, clearly it is something we need to address (https://groups.google.com/a/continuum.io/d/msg/anaconda/BK4nkmSieoY/9pKRZ4_RDAAJ)
  2. When it does mean that the conda operation will break the code you want to run in that conda environment then you have no idea until code that did work stops working and that is just confusing. For me, this clearly points to a need (as identified in some of the GH issues I reference above) to provide a warning that the operation will break other package dependencies (and ideally _how_ they are being broken so the user can make an informed decision as to whether or not they are prepared to live with this).
  3. Also, as identified in some of the GH issues I reference above, many package constraints are too rigid: either packages aren't strictly required, or the dependencies should really have followed http://semver.org/ (semantic versioning) and not had a hard requirement down to the patch level. And now in the world of Python 3.4 and Python 3.5, in many cases python >=3.4,<4 (however you specify that in conda-dependency-speak) might be a little optimistic but probably totally acceptable, rather than the crazy situation we have now of creating py34 and py35 packages for everything even though they are identical.
  4. We need to understand that we have two quite distinct use cases/user profiles, and we need to be prepared to serve them both:

    • the interactive, exploratory data scientist who can (if suitably informed) make decisions about package upgrades and having a conda environment where not everything might run perfectly at any given time, but so long as the last few conda operations are still "valid" (deps still satisfied), those are the ones they care about most, and please don't hold them back with artificial dependency constraints. This person needs --relax-patch-deps and --nowarn-broken-deps; and

    • the computer program or sys admin who needs total reliability, minimal deltas on package changes (if it ain't broke, don't fix it), and knowledge of exactly which packages are installed at any given time. This person needs --minimal flag and --strict-deps (or whatever).

I agree with the assumption that there should be a way to disable dependencies, but this should only happen with long command-line switches and not be the default.

I think there is also a third person who needs to be satisfied: someone who doesn't know much about requirements and such stuff and just wants certain python packages to "just work". For such people, having a requirement system that leaves a system/env in a broken state, even in edge cases, is a bad thing.

I think there are a few things which would make this problem a bit easier if these would be addressed:

  • Some requirements are not actually requirements for the way you use the packages. The problem here is, that a python package currently can't be broken into multiple conda packages (e.g. matplotlib into several packages with one backend each) -> I don't need a pyqt if I don't use that backend. -> This is https://github.com/conda/conda/issues/793
  • There are no automatic checks for (native) dependencies. E.g. Debian has a whole battery of tools to make automatic specification of dependencies (at build time) possible. Implementing a default check during the test phase of the build that all dlls/so files don't miss any dependencies would make failures like https://github.com/conda/conda/issues/2028 impossible. Same for python packages (see next point).
  • Automatic build-time specification: it seems possible for Debian packages, so why not for conda packages? Debian builds a database of each exported symbol and when it was introduced, so that dependencies can be specified down to the specific version which implements all the symbols that are needed. One could build such a database for all symbols in python as well (e.g. building on https://github.com/bndr/pipreqs). If such a database existed, the requirements would get less specific and therefore easier to satisfy.

conda doesn't just try to satisfy deps, it also tries to get the latest version of all deps that satisfy reqs, even if that means updating an otherwise perfectly functional set of pkgs.

Oh, I can understand why this would be a bad thing, but it is also a false choice. There is a third option, which we have implemented in the update. Here's the behavior as we have implemented it:

  • --update-deps, the default: identify the _minimum_ set of dependency changes to the current environment to allow the explicitly requested packages to be updated.
  • --no-update-deps: no currently installed packages outside of the install list are to be updated. This will, of course, sometimes limit the amount of updating that occurs.

What we are _not_ doing is attempting to retrieve the latest versions of all dependencies. The goal is minimum disruption of the environment to achieve the user's request, _but in a compliant manner_. If you want dependencies to be fully updated, you have two choices: 1) put them in the install list, and 2) conda update --all (ugh!)

no distinction between "must have" and "nice to have" packages, so dep list can grow to be longer than is necessary

I absolutely agree. This is entirely doable with the new solver architecture. In fact I've been advocating for keeping track of packages that were _explicitly requested_ versus ones that were installed solely because of dependency requirements. Then if a combination of updates happens to render a dependency unnecessary, it can be removed (with appropriate warning/confirmation).
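Once the explicitly requested set is tracked, finding removable dependencies is just a reachability question. A sketch with invented metadata:

```python
# Packages installed only as dependencies can be garbage-collected once
# nothing reachable from the explicitly requested set needs them.
# The dependency table below is invented for illustration.
deps = {
    "seaborn": ["pandas", "matplotlib"],
    "pandas": ["numpy"],
    "matplotlib": ["numpy"],
    "numpy": [],
    "scipy": ["numpy"],   # pulled in by a package that was since removed
}
explicit = {"seaborn"}

def orphans(explicit, deps):
    """Installed packages not reachable from the explicit set."""
    needed, stack = set(), list(explicit)
    while stack:
        pkg = stack.pop()
        if pkg not in needed:
            needed.add(pkg)
            stack.extend(deps.get(pkg, []))
    return sorted(set(deps) - needed)

print(orphans(explicit, deps))   # ['scipy']
```

Anything the walk doesn't reach is a candidate for removal, with the warning/confirmation step deciding whether it actually goes.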

check satisfiability of all env pkg deps at install time
check satisfiability of all env pkg deps after install, and warn.

These are good ideas. The current update does a better job of ensuring satisfiability of the current environment but I don't think it pulls in every package---just the ones affected by the current operation.

But if it is not set, then the unit of atomic pkg consistency is a single command. Previous commands are not considered.

If you are suggesting that we allow conda install or conda update to cause an unsatisfiable environment without the use of a --force command, then I reject this totally. If we need to do more adjustments to the current environment than the user was expecting, it will be evident by the longer-than-expected list of changes conda is making. The user is free to hit "N" and investigate further why this is happening. And we could certainly do a better job of offering diagnostics as to why those additional steps are necessary.

I understand the temptation to _claim_ a set of packages is "perfectly functional" even if it violates its own dependency requirements. And of course the claim might even be true. But _the developers of the packages themselves_ are not making that claim, and we should not silently override their judgement. If you want to keep a broken set of requirements in place, you should be enabled to do so. But you should be fully aware that is what is happening.

If you are suggesting that we allow conda install or conda update to cause an unsatisfiable environment without the use of a --force command

No, certainly not. I think you answered my question during Demo Friday when you said that attempts at consistency are limited to the current atomic conda command. Let me pose one clarifying question by way of a scenario:

Universe of packages:

A.1 dep: X>=1, Y=1
B.1 dep: X=2, Y=1
B.2 dep: X=2, Y=2
X.[123] dep: None
Y.[123] dep: None

Initially empty system.

conda install A

-> will install A.1 (assuming it is the latest), and in turn this will result in X.3 and Y.1 being installed as dependencies (both installed for the first time, take the latest that satisfies the dependencies)

-> same behavior "today" as this proposed change.

conda install B

-> today, I think this will install B.2, since it is the latest, and its dependencies X.2 (a downgrade) and Y.2 (an upgrade). The user gets the latest version of B which might be (I'd say "probably is") what they wanted. If A.1 "accidentally" works with Y.2 (more recent than what it specifies as its actual dependencies), lucky you! But if not, A has silently been broken as a result of the installation of B (which in no way refers to A, and there is no reason to think a user would be aware that X and Y were deps of A).

-> with these changes, IIUC, this will install B.1 and downgrade X to X.2, since doing so satisfies the user's request (which didn't specify any particular version for B) and the downgrade of X from X.3 to X.2 continues to satisfy A.1's dependencies. Will the user be told that a newer B.2 version was skipped? Because if they aren't, they might be confused when the cool new features of B they just read about in a blog post don't work for them after they conda install it (maybe because they "naturally" expect this, or because they've been using conda for 3 years, and that is the behavior they've always experienced).

conda install Y

-> today and with the proposed changes the user would be told _"Package Y.1 is already installed"_. Interestingly, if the change happens where we know about packages that were explicitly requested vs. those that were only installed as dependencies, I suppose this action would now need to "mark" Y as an "explicitly requested" package.

-> today you don't get any information that there is a Y.3 available, though often that is exactly what the user was wanting: the latest version of Y.

-> with the proposed changes, perhaps there would be scope to say something sensible (for a generic package Z) like _"Z.1 installed, Z.4 latest available version that satisfies other packages' dependencies (upgrade possible), Z.6 latest but breaks dependencies"_

conda update Y

-> today, Y.3 gets installed and the user goes and tries it out and is happy that they have the latest Y. But what they didn't know is that A and B might now be broken, and certainly their stated dependencies are no longer satisfied.

-> with these changes, IIUC, conda will report _"Package Y.1 already installed"_, and, as above, some additional hint could potentially be provided: _"Y.1 installed, Y.1 latest available version that satisfies other packages' dependencies, Y.3 latest but breaks dependencies"_

conda install X==1

-> today, X.1 is installed. A's dependencies are satisfied (well, other than the conda update Y from the last command that moved it to Y.3, but you get the point), but B's dependencies are broken.

-> with these changes the user is told _"Unsatisfiable package constraints: installing X will break dependencies of packages already installed [B]"_
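The A/B/X/Y universe above is small enough to brute-force, which makes the core tension visible: B.2 can never coexist with A.1 because they pin Y to different versions, so a consistency-preserving install of B has no choice but B.1. A quick illustration (not conda's solver, just exhaustive enumeration of the toy universe):

```python
# Enumerate every combination in the A/B/X/Y toy universe and keep
# only the self-consistent ones.
from itertools import product

DEPS = {  # (pkg, version) -> {dep: allowed versions}
    ("A", 1): {"X": {1, 2, 3}, "Y": {1}},
    ("B", 1): {"X": {2}, "Y": {1}},
    ("B", 2): {"X": {2}, "Y": {2}},
}

def ok(a, b, x, y):
    env = {"X": x, "Y": y}
    return all(env[d] in allowed
               for key in [("A", a), ("B", b)]
               for d, allowed in DEPS[key].items())

solutions = [(a, b, x, y)
             for a, b, x, y in product([1], [1, 2], [1, 2, 3], [1, 2, 3])
             if ok(a, b, x, y)]
print(solutions)  # [(1, 1, 2, 1)]: only A.1 + B.1 with X=2, Y=1 survives
```

Since the single surviving solution holds B at B.1, any notification about the skipped B.2 has to come from the solver explaining *why* it was excluded, which is exactly the diagnostics question raised above.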

conda install --force X==1

-> today, --force forces the installation of the specified packages, even if they are already installed, they are just re-installed. There is also --no-deps which installs a package (or set of explicitly specified package names, possibly with version specifiers) but none of its (their) dependencies. IMPORTANT: --no-deps is implied by --force

-> with the changes you're proposing --force will have different meaning I think, and I believe there are several levels at which one might want to --force:

  1. People will still want to force-install packages as a reinstall (same version). Unclear if in this new system that capability should or shouldn't trigger dependency re-installation.
  2. People will want to have a conda command line that "does stuff", and they just want that stuff to happen, triggering any actions related to dependencies, and not caring if it violates package dependencies of things not specified on the command line. Basically this is partly what "today's" conda does without any special flags.

-> as a footnote there is _also_ the concept of --maximal or --minimal which say _"update as [many/few] packages as possible in satisfying this particular operation"_, captured today in the --update-deps and --no-update-deps flags.

As to @ahmadia's comments about "should this be a conda 3.0 release?": having gone through the scenario above, I am sure that it should be. Another point in favor is that --update-deps is the current default, and the proposed changes will make --no-update-deps the default instead. I am confident that the new system @mcg1969 is proposing is far superior to what we have now, but it will be a big change in behavior, especially if command-line options change along with the defaults for existing options. That is a non-backwards-compatible SemVer (http://semver.org) API change, necessitating a major version number increment.

It is true that if a user performs a conda install B, it will fail to get B.2 because of the dependency limits, and there will be no notification that this has occurred. It is important to recognize, however, that while this may be new behavior _in this particular scenario_, it is something that _can and does happen now_ when installing a combination of packages. For example, consider

conda create -n test2 -c ipyrad ipyrad ipython

Once this _finally_ completes, it will present you with a nice recipe that includes IPython 4.0.0, and _not_ the latest version, IPython 4.0.3. And it will not tell you that it has been forced to install a less-than-latest version.

Yes, that is a great point, and I think the real canonical example of it is when people do conda update anaconda and get a notice that a stack of things are going to be downgraded (everything they have individually upgraded since the last public Anaconda release that is also in the Anaconda meta-package).

Or to tie it to your current example, if you do

conda create -n test2 A B

You will get A.1 and B.1, with no notification that B.2 is even available.

It sounds to me, therefore, like the scenario we are playing with is a _current_ issue that ought to be filed and tracked. What is changing here are simply the specific conditions under which it is exercised.

Kale and I were talking about it this morning before Demo Friday. It just so happens that the _debug log_ will include a list of packages that have newer versions than the ones selected for install!

Thanks to @groutr for building conda packages for #2000.

conda install -c https://conda.anaconda.org/rgrout conda

Installers are available for 32/64-bit Windows & Linux and 64-bit Mac. Please try it out on your most challenging install & update recipes.

Would be good if this gets into conda soon. I'm running into this on a weekly basis, and we have lots of less-skilled users of our package at obspy/obspy who run into the same problems, have no clue what's going on, and lack the Python experience to find out.

Edit: Btw, @groutr, @mcg1969: I can confirm that conda 3.19.1.51.g2ba04a6 from your channel works (Linux 64-bit, Py27) and keeps the package list consistent (numpy-wise) when installing a new numpy-dependent package (unlike the current stable conda, which updates numpy and leaves broken numpy-dependent packages behind).

Great news about the beta, @megies, thanks. It turns out this has been quite a non-trivial challenge. We're working to get this beta exercised, and have found a couple of issues as a result. Please keep testing!

@mcg1969, one thing I noticed earlier: When I uninstall numpy, conda doesn't complain about all those other packages that are compiled against numpy (rendering them unusable, presumably).

Edit: I guess I'm spoiled by Debian package management. :cyclone:

Are you saying that's happening now with the latest beta? If so please file an issue; it shouldn't.

Are you saying that's happening now with the latest beta? If so please file an issue; it shouldn't.

Euhm, sorry. My conda got updated (to 3.19.3) from the one I manually installed from file from your channel earlier today (3.19.1.51.g2ba04a6-dev) without me noticing. Too much hacking around with envs today.

Actually.. just now noticing that what I tested earlier today is already old.. the current beta (osx-64/conda-3.19.3.202.gb4142f4-py35_0.tar.bz2 ?) is only a single binary file for OSX64 while your channel is flooded with binaries for all different ARCH and OS for the earlier one..

You need to actually add the conda channel to your config.

conda config --add channels conda

Otherwise, conda will update itself to the latest version it sees, which will be 3.19.3 on the next install.

Also, we just released an updated beta tonight. 3.19.3.214

OK, I've updated to bleeding edge conda version served through conda channel. Now getting an Exception when trying to check out what conda remove numpy is trying to do:

~$ conda remove numpy
Fetching package metadata: ........
Solving package specifications: ...........
An unexpected error has occurred, please consider sending the
following traceback to the conda GitHub issue tracker at:

    https://github.com/conda/conda/issues

Include the output of the command 'conda info' in your report.


Traceback (most recent call last):
  File "/home/megies/anaconda/bin/conda", line 6, in <module>
    sys.exit(conda.cli.main())
  File "/home/megies/anaconda/lib/python2.7/site-packages/conda/cli/main.py", line 139, in main
    args_func(args, p)
  File "/home/megies/anaconda/lib/python2.7/site-packages/conda/cli/main.py", line 146, in args_func
    args.func(args, p)
  File "/home/megies/anaconda/lib/python2.7/site-packages/conda/cli/main_remove.py", line 157, in execute
    actions = plan.remove_actions(prefix, specs, index=index, pinned=args.pinned)
  File "/home/megies/anaconda/lib/python2.7/site-packages/conda/plan.py", line 475, in remove_actions
    if pinned and any(r.match(ms, dist) for ms in pinned_specs):
  File "/home/megies/anaconda/lib/python2.7/site-packages/conda/plan.py", line 475, in <genexpr>
    if pinned and any(r.match(ms, dist) for ms in pinned_specs):
  File "/home/megies/anaconda/lib/python2.7/site-packages/conda/resolve.py", line 511, in match
    return ms.match(self.index[fn])
AttributeError: 'str' object has no attribute 'match'
~$ conda info
Current conda install:

             platform : linux-64
        conda version : 3.19.3.dev214+g57e4dd8
  conda-build version : 1.19.0
       python version : 2.7.11.final.0
     requests version : 2.9.1
     root environment : /home/megies/anaconda  (writable)
  default environment : /home/megies/anaconda
     envs directories : /home/megies/anaconda/envs
        package cache : /home/megies/anaconda/pkgs
         channel URLs : https://conda.anaconda.org/conda/linux-64/
                        https://conda.anaconda.org/conda/noarch/
                        https://conda.anaconda.org/obspy/linux-64/
                        https://conda.anaconda.org/obspy/noarch/
                        https://repo.continuum.io/pkgs/free/linux-64/
                        https://repo.continuum.io/pkgs/free/noarch/
                        https://repo.continuum.io/pkgs/pro/linux-64/
                        https://repo.continuum.io/pkgs/pro/noarch/
          config file : /home/megies/.condarc
    is foreign system : False

Thanks for this report. I'll take a look ASAP.
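For what it's worth, the traceback reads like a type confusion: `r.match(ms, dist)` ends up calling `.match` on something that is a plain string, whereas an object with a `.match` method (a MatchSpec-like object) is expected. A minimal reproduction of the failure mode, using a stand-in class rather than conda's real MatchSpec, might look like:

```python
# Stand-in for a spec object with a .match method; NOT conda's real class.
class MatchSpec:
    def __init__(self, spec):
        self.name = spec.split()[0]   # crude: take the package name only
    def match(self, pkg_name):
        return pkg_name == self.name

pinned_specs = ["numpy 1.9.*"]        # raw strings, as a pin file stores them

try:
    pinned_specs[0].match("numpy")    # str has no .match -> AttributeError
except AttributeError as e:
    print(e)

# Normalizing the raw strings to spec objects first avoids the crash.
specs = [MatchSpec(s) for s in pinned_specs]
print(specs[0].match("numpy"))  # True
```

If that guess is right, the fix would be to convert the pinned spec strings to spec objects before the `any(r.match(ms, dist) ...)` loop runs.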

Closing this issue, as it should have been addressed long ago with conda 4.x.

Hi everyone,

I don't know whether this is the right place to raise this, but it's close enough that I don't want to raise a duplicate. Let me know if I should open a new issue, or link somewhere else.

Like many conda users, I more or less interchangeably install from defaults and conda-forge as the mood strikes me. My default channel from my condarc, confusingly, is conda-forge, but some environments are based on defaults (mostly for MKL). Now, conda update in these environments does weird things. My conda version was 4.3.11.

Here I am trying to update all, specifying defaults as the channel. As you can see, it seems to work, except NumPy gets updated to conda-forge, thereby breaking everything. I type y while making a mental note to reinstall defaults numpy.

 $ conda update --all -c defaults
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /Users/nuneziglesiasj/anaconda/envs/36:

The following packages will be UPDATED:

    hdf5:    1.8.17-1                  --> 1.8.17-2                                 
    ipython: 5.3.0-py36_0  conda-forge --> 6.1.0-py36_0                             
    numpy:   1.12.1-py36_0             --> 1.12.1-py36_blas_openblas_200 conda-forge [blas_openblas]
    pillow:  4.1.1-py36_0              --> 4.2.0-py36_0                             
    s3fs:    0.1.0-py36_0              --> 0.1.1-py36_0                             

Proceed ([y]/n)? 

hdf5-1.8.17-2. 100% |############################################################| Time: 0:00:00  40.60 MB/s
pillow-4.2.0-p 100% |############################################################| Time: 0:00:00  53.43 MB/s
ipython-6.1.0- 100% |############################################################| Time: 0:00:00  23.39 MB/s
s3fs-0.1.1-py3 100% |############################################################| Time: 0:00:00  22.87 MB/s

Now to re-update numpy:

 $ conda update -c defaults numpy
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /Users/nuneziglesiasj/anaconda/envs/36:

The following packages will be UPDATED:

    scikit-learn: 0.18.2-np112py36_0                        --> 0.18.2-np112py36_blas_openblas_200 conda-forge [blas_openblas]
    scipy:        0.19.1-np112py36_0                        --> 0.19.1-np112py36_blas_openblas_200 conda-forge [blas_openblas]

The following packages will be SUPERCEDED by a higher-priority channel:

    numpy:        1.12.1-py36_blas_openblas_200 conda-forge [blas_openblas] --> 1.12.1-py36_0                                 

Proceed ([y]/n)? n

Right... Ok scrap that, what happens if I try to specify all those packages manually?

 $ conda update -c defaults numpy scipy scikit-learn scikit-image pandas
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /Users/nuneziglesiasj/anaconda/envs/36:

The following packages will be SUPERCEDED by a higher-priority channel:

    numpy: 1.12.1-py36_blas_openblas_200 conda-forge [blas_openblas] --> 1.12.1-py36_0

Proceed ([y]/n)? y

Whew!

My .condarc:

channels:
  - conda-forge
  - defaults

Needless to say, a crisis was averted only because I know just enough about these things to be suspicious. A user described by @janschulz here, really not very far removed from myself, would have been totally screwed:

someone who has not a lot of ideas about requirements and such stuff and wants to have certain python packages to "just work".

So it seems conda update is still not there with @seibert's Three Laws... What's the recommended way to deal with this...?
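A guess at what is happening above (simplified, not conda's actual algorithm): with channel priority enabled and conda-forge listed first in .condarc, candidate packages are effectively compared channel-first and version-second, so a one-off `-c defaults` on the command line cannot outrank the configured channel order. A toy sort key illustrating that behavior:

```python
# Toy illustration of channel-priority-first candidate selection.
# Lower rank = higher priority, mirroring .condarc order.
channel_rank = {"conda-forge": 0, "defaults": 1}

candidates = [
    {"name": "numpy", "version": (1, 12, 1), "channel": "defaults"},
    {"name": "numpy", "version": (1, 12, 1), "channel": "conda-forge"},
]

# Compare by channel rank first, then prefer higher versions.
best = min(candidates,
           key=lambda c: (channel_rank[c["channel"]],
                          tuple(-v for v in c["version"])))
print(best["channel"])  # conda-forge "wins" despite -c defaults
```

Under that model, the only robust fixes are reordering the channels in .condarc for that environment, or pinning the build variant explicitly.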

@jni I would just go ahead and create a new issue to not clobber this one even more. But please reference this issue at the beginning so the story is there.
