(@JanSchulz brought this up in https://github.com/conda-forge/conda-forge.github.io/issues/16#issuecomment-182430891)
I agree with @JanSchulz that we should, as much as possible, avoid adding packages to conda-forge that are already available in the default channel.
However, we already have a few redundant packages (pyproj, shapely, geos, and more to come soon). The reason for this redundancy is that those packages are partially broken in the default channel.
(And we could not find a proper channel of communication to send the recipe patch back to them.)
Maybe, when fixing a default channel package we should allow the package addition here as long as there is a plan to send that fix back to the default channel, and to remove the package from conda-forge once that happens.
Maybe register a new channel: "temporary-fixes"?
Maybe @mcg1969 has some idea how this could be handled?
I'm not sure it's worth a different channel.
But I wonder if we should give the package a different name, otherwise things can get pretty tangled up:
gdal-cf
or ????
(cf for conda forge...)
That means that anyone using it has to change the dependency, but are there going to be any packages outside of conda-forge that depend on a conda-forge special build???
But I wonder if we should give the package a different name
:-1: that can be confusing. We can maybe "sign" the packages with a build string like
    build:
      string: conda-forge
But IMO just having the package in a different channel should be enough to disambiguate.
The reason for a different channel is IMO that I suspect that at some point users will add conda-forge to their default channels and a different channel than conda-forge means that users can get the updated/fixed version with conda install -c whatever matplotlib but not with a simple conda update --all.
I'm not sure the channel disambiguates -- does conda prioritize default or your other channels?
But I wonder where all this goes -- with PyPi, it's up to each package maintainer to keep things up to date. with anaconda, it's up to third parties -- mainly continuum. so if they don't, then what?
I'm hoping continuum will adopt a more community model, where folks can easily provide PRs -- it seems it would only save them work. So we'll see.
In the meantime, conda-forge may become the community channel, and I'd say if you want the latest and greatest, then you add that channel -- and you'll get the new MPL, or whatever, if conda-forge provides a newer one than default.
We will probably want to clean things out as continuum catches up.
I'm still wondering about the naming though:
continuum builds package-1.2.1
a new version comes out, and folks want it -- but continuum is being slow on the draw.
conda-forge provides package-1.3.1
all is good.
now continuum catches up and builds a package-1.3.1 -- now there are two, with the same version. And maybe they are even incompatible in some way. This could make a mess for our users.
If we go with Jan's approach, then this would be cleaner, but users would have to explicitly make a point of getting a newer version -- I think that would be awkward and often missed as an option.
Debian has a similar problem when using backports, but could manage that with a special version suffix (~) which sorts before the version without the suffix:
https://www.debian.org/doc/debian-policy/footnotes.html#f37
One common use of ~ is for upstream pre-releases. For example, 1.0~beta1~svn1245 sorts earlier than 1.0~beta1, which sorts earlier than 1.0
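To make the tilde rule concrete, here is a minimal Python sketch of Debian-style comparison for the upstream part of a version string. It is a simplified illustration that assumes no epoch and no Debian revision; `debian_cmp` and its helpers are hypothetical names, not part of any Debian tool.

```python
import re
from functools import cmp_to_key


def _char_order(c):
    # Tilde sorts before everything, even the end of a part (modelled as "").
    if c == "~":
        return -1
    if c == "":
        return 0
    # Letters sort earlier than all other non-digit characters.
    return ord(c) if c.isalpha() else ord(c) + 256


def _cmp_nondigit(a, b):
    for i in range(max(len(a), len(b))):
        oa = _char_order(a[i] if i < len(a) else "")
        ob = _char_order(b[i] if i < len(b) else "")
        if oa != ob:
            return -1 if oa < ob else 1
    return 0


def debian_cmp(a, b):
    """Return -1, 0 or 1, comparing alternating non-digit and digit runs."""
    while a or b:
        na = re.match(r"\D*", a).group()
        nb = re.match(r"\D*", b).group()
        result = _cmp_nondigit(na, nb)
        if result:
            return result
        a, b = a[len(na):], b[len(nb):]
        da = re.match(r"\d*", a).group()
        db = re.match(r"\d*", b).group()
        ia, ib = int(da or 0), int(db or 0)  # an empty digit run counts as 0
        if ia != ib:
            return -1 if ia < ib else 1
        a, b = a[len(da):], b[len(db):]
    return 0


versions = ["1.0", "1.0~beta1", "1.0~beta1~svn1245"]
print(sorted(versions, key=cmp_to_key(debian_cmp)))
```

The sort puts `1.0~beta1~svn1245` first and `1.0` last, matching the policy example quoted above.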
As this probably requires changes in conda, I would vote for 1.3.0_real_1.3.1 which sorts before 1.3.1, but after 1.3.0.
Hah, it seems conda already implements such a scheme per default, so 1.3.1.cf is lower than 1.3.1
https://github.com/conda/conda/blob/2ba04a6b2617227de578f4af54ff11115f97ca5c/conda/version.py#L81
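A toy model of the ordering described above, assuming dot-separated components where a non-numeric component sorts below any number and missing components count as 0. This is a simplification for illustration only, not conda's actual `version.py` logic; `sort_key` is a hypothetical helper.

```python
def sort_key(version, width=8):
    """Build a sortable key for a dot-separated version string.

    Assumption: each component is either all digits or a plain string, and
    strings rank below numbers. Versions are padded with zeros up to `width`
    components so that "1.3.1" compares like "1.3.1.0".
    """
    parts = []
    for piece in version.split("."):
        if piece.isdigit():
            parts.append((1, int(piece), ""))  # numeric component
        else:
            parts.append((0, 0, piece))        # string component sorts first
    parts += [(1, 0, "")] * (width - len(parts))
    return parts


print(sorted(["1.3.1", "1.3.0", "1.3.1.cf"], key=sort_key))
```

Under these assumptions a `.cf` suffix lands between `1.3.0` and `1.3.1`, so a later `1.3.1` from another channel would still win.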
nice!, maybe we can use that, then.
-CHB
I'm not sure the channel disambiguates -- does conda prioritize default or your other channels?
Disambiguate? Yes! Solve the install/update problem? No. I even saw an e-mail from Travis Oliphant today discouraging the use of conda update --all because of how conda solves dependencies.
I'm hoping continuum will adopt a more community model, where folks can easily provide PRs -- it seems it would only save them work. So we'll see.
:+1:
discouraging the use of conda update --all because of how conda solves dependencies
This is probably https://github.com/conda/conda/issues/1967
Yes. The conda solver has recently been overhauled, and my experience with it is that the performance has been dramatically improved.
If two channels are configured to be used by conda, and both provide the same package name, version and build number, then conda chooses the package from the channel defined first in the config.
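The tie-break rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, assuming purely numeric versions; the package records and the `choose` helper are hypothetical, not conda internals.

```python
def version_tuple(version):
    # Assumption: versions are purely numeric, e.g. "1.5.13".
    return tuple(int(part) for part in version.split("."))


def choose(candidates, channel_order):
    """Pick the highest version/build; on a full tie, the earlier channel wins."""
    rank = {name: i for i, name in enumerate(channel_order)}
    return max(
        candidates,
        key=lambda p: (
            version_tuple(p["version"]),
            p["build_number"],
            -rank[p["channel"]],  # lower index in the config means higher priority
        ),
    )


candidates = [
    {"name": "shapely", "version": "1.5.13", "build_number": 0, "channel": "defaults"},
    {"name": "shapely", "version": "1.5.13", "build_number": 0, "channel": "conda-forge"},
]
print(choose(candidates, ["conda-forge", "defaults"])["channel"])
print(choose(candidates, ["defaults", "conda-forge"])["channel"])
```

Note that channel order only matters here when version and build number are identical; a higher version in a later channel would still be chosen.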
now there are two, with the same version. And maybe they are even incompatible in some way.
This is the real problem here. We have, in the past, fixed packages (on IOOS and SciTools) which Continuum package, often by releasing a newer version. The problem comes when Continuum update the version of the software they package, but don't actually fix the problem. This has happened on several occasions with packages such as Shapely and pyproj. From a user's perspective, they are just updating their software and it goes from a functional state to a non-functional state - not really ideal. Because of the lack of a repository of canonical recipe source, all we have been able to do is report a problem with the package, not actually fix it (i.e. in the form of a PR).
@jakirkham, @tacaswell and @stefanv have all expressed an opinion on the subject of this issue in the past. Do any of you have comments on when it should be the place of conda-forge to package software which is already being packaged by Continuum?
@JanSchulz thanks for pinging me. There's not a whole lot I can officially say yet, except that we recognize the need to support alternatives to Continuum's default channels. We're actively working on a particular community channel solution, but it is not the only way forward, and it _shouldn't_ be. We've been watching the Conda Forge project with enthusiasm. As we talk more we may be able to come up with some specific ways Continuum can help with it. But having an effort like this that Continuum _does not control_ is beneficial to the Python community at large, so I'm grateful you're working on it.
I see three different problems with the default channel:
All three of these issues are an inevitable consequence of Continuum's finite resources for building and supporting packages. We certainly acknowledge that this isn't going to satisfy people who regularly bump up against one of these three problems. Heck I bump up against all three of these problems myself.
My particular perspective, as many of you know who have been watching my conda fixes recently, is on the _dependency solver_. I've been spending time overhauling it, and it's certainly going to fix some of the issues like conda update --all being slow, conda remove potentially breaking installs, etc. I'm glad @pelson is confirming that my improvements are beginning to make a difference.
But honestly, the mathematics of the solver isn't really the issue here, at least not directly. What you are discussing in this thread is basically the challenge of _channel clashes_. That is: how should conda handle things when two or more channels release versions of the same package? At the moment, conda effectively "merges" the channels together, so that the packages interleave with each other purely based on version and build numbers. That's clearly not a workable solution. For one thing, build numbers don't have meaning across channels; so for instance, build 1 from channel A may actually be _newer_ than build 2 from channel B, and conda doesn't know that. This is something we need to decide on a fix for.
What we need, it seems to me, is to identify specific improvements to conda that would greatly improve the ability to use alternate, community-driven package channels. For instance:
1) A fix for conda that untangles packages/channel conflicts. For instance, we could say that the highest-priority channel is always preferred for a package, and any packages by the same name in lower-priority channels are ignored. But I could see a variety of other strategies, and perhaps conda should adopt several, choose one as a default, but make the others available by configuration.
2) An enhancement to conda that allows channel preferences to be adopted on a _per-package_ and/or _per-environment_ basis. For example, perhaps I add the conda-forge channel as a lower-priority channel, but I actually prefer one of its packages to the one provided in default. There should be a way to specify that priority preference and persist it across updates and later installations.
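The two proposals above could look roughly like this in Python. It is a sketch under stated assumptions, not a design for conda: the package records, the `preferred` mapping, and the `resolve` function are all hypothetical, and versions are assumed to be purely numeric.

```python
def version_tuple(version):
    # Assumption: versions are purely numeric, e.g. "1.5.13".
    return tuple(int(part) for part in version.split("."))


def newest(packages):
    return max(packages, key=lambda p: (version_tuple(p["version"]), p["build_number"]))


def resolve(name, index, channel_order, preferred=None):
    """Pick a package by channel priority, with optional per-package overrides.

    `preferred` maps a package name to the channel the user wants it from.
    Otherwise the first channel in `channel_order` that provides the package
    wins, and same-named packages in later channels are ignored.
    """
    candidates = [p for p in index if p["name"] == name]
    wanted = (preferred or {}).get(name)
    if wanted is not None:
        pinned = [p for p in candidates if p["channel"] == wanted]
        if pinned:
            return newest(pinned)
    for channel in channel_order:
        from_channel = [p for p in candidates if p["channel"] == channel]
        if from_channel:
            return newest(from_channel)
    return None


index = [
    {"name": "shapely", "version": "1.5.13", "build_number": 0, "channel": "defaults"},
    {"name": "shapely", "version": "1.5.12", "build_number": 3, "channel": "conda-forge"},
]
order = ["defaults", "conda-forge"]
print(resolve("shapely", index, order)["channel"])
print(resolve("shapely", index, order, preferred={"shapely": "conda-forge"})["channel"])
```

Keeping the override in a separate mapping is one way to let the preference persist across updates, for example by storing it per environment.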
This is the kind of thinking that would be very helpful for me personally. We really do want conda to be adopted more widely---heck, we'd be pleased if someone built their own Python distribution that used conda as a packaging model. And we'd like to find ways to enable groups like Conda Forge to flourish without having to wait on us. I actually do think that there are some changes to conda that we can push through in the short term that will greatly improve our ability to work in parallel.
Having a way to specify channel preference globally or per env would be a really good addition to conda.
:+1:
I would really rather we tackle the channel collision problem _correctly_ than to utilize weird version numbers or (even worse) track_features to disambiguate.
If we could get Continuum to open recipes (or adopt community recipes upon
submission, and then open those up), then perhaps much of the problem can
be avoided? Ideally, we do not want multiple versions of numpy with the
same version tag floating around.
An alternative path is to build everything you need into your own channel.
For mixed channels, I don't see a straightforward way of resolving what to
install without additional meta-data. In Debian there is the concept of
"pinning", which allows you to fix certain packages in place.
@mcg1969: absolutely -- but we need help from conda itself to do it "right" -- are you speaking for continuum, hard to tell :-)
@ocefpaf wrote: "Having a way to specify channel preference globally or per env would be a really good addition to conda."
I think that would actually be a simple solution that would mostly solve the problem at hand -- folks could put the IOOS channel, or conda-forge channel at their first preference, and then they would get the latest and greatest.
Granted, the default channel may get updated in a way that leapfrogs conda-forge, but I think it will be up to whoever is maintaining the conda-forge package to keep an eye on that.
And the default channel is clearly the upstream one -- conda-forge will be following its lead, so that could work.
I've been thinking a bit about how this works with PyPi (and PyPi does work well, for the things it works well for, i.e. pure-python packages)
It is a totally different model -- PyPa only provides the infrastructure -- each and every package is maintained by individual package maintainers. Ideally, conda packaging could go that way, but it's going to be a long time (or never) before package authors in general support conda. (never mind non-python stuff....)
But my idea, at least, is that conda-forge becomes the PyPi-like place for conda packages -- it will start (has started) with groups of packages that are not in the default channel being maintained by a third party, but hopefully individual package authors will start to maintain their own packages. So we need to design the infrastructure to support that.
In fact, as a package author steps up to maintain a package, maybe it could even be removed from the default channel. In the long run, maybe continuum will need to maintain few packages, and rather, have Anaconda be the "curated" selection, but much of it would be pulled in from the authors' builds (OK, maybe that's a fantasy).
Anyway, what all this means is that it should be very easy for a package author to push builds to conda-forge, like it is now with PyPi being integrated into the PyPa stack (distutils, setuptools, pip, I've lost track...)
One easy idea would be to add something like this:
conda install -c conda-forge --pin-channel matplotlib
That would add an entry to the config file that matplotlib should be taken from the conda-forge channel and all other packages with the same name from other channels should be discarded (e.g. simply add a step-0 to the solver which removes all matplotlib packages from other channels from the list of available packages).
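The "step-0" filter could be as small as the following Python sketch. The `--pin-channel` flag above is a proposal from this thread, and everything here (the `pins` mapping, the package records, the function name) is hypothetical rather than an existing conda feature.

```python
def apply_channel_pins(index, pins):
    """Drop packages whose name is pinned to a different channel.

    `pins` maps a package name to the only channel it may come from.
    Unpinned packages pass through untouched, so the solver would see
    the usual merged view for everything else.
    """
    return [
        pkg
        for pkg in index
        if pkg["name"] not in pins or pkg["channel"] == pins[pkg["name"]]
    ]


index = [
    {"name": "matplotlib", "version": "1.5.1", "channel": "defaults"},
    {"name": "matplotlib", "version": "1.5.1", "channel": "conda-forge"},
    {"name": "numpy", "version": "1.10.4", "channel": "defaults"},
]
for pkg in apply_channel_pins(index, {"matplotlib": "conda-forge"}):
    print(pkg["name"], pkg["channel"])
```

Because the filter runs before any solving, the rest of the dependency resolution would not need to know about pins at all.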
This will help with the problem of "fixing" packages in the default channel (and IMO this should be the only part where conda-forge should package packages in the default channel).
Another step would be to configure the "default" channel, so that conda does not see the anaconda/Continuum packages at all. Not sure if that is possible today?
@stefanv: Absolutely!
If the default channel was built from (mostly) recipes maintained in a public gitHub project(s), it would be monstrously easier to keep everything up to date and in-sync. We could/would do a lot of the work for continuum.
And they could start one package at a time (shapely?).. it wouldn't have to be a wholesale, all at once move.
I can imagine it's inertia more than anything else that's prevented this from happening so far, but it's a bit frustrating from outside.
-CHB
If we could get Continuum to open recipes (or adopt community recipes upon
submission, and then open those up), then perhaps much of the problem can
be avoided?
I think it would already be enough if Continuum would add all their package recipes (as they are currently used -> the matplotlib recipe in the conda-recipes repo is out of date) and accept PRs for already included packages. I would be happy to add patches there if I know that they land on my HD a day after they are merged... Continuum still would have the final say, it would speed up the updates on new upstream releases, and Continuum would have less work... (cc: @mcg1969 :-) )
@mcg1969 articulated many of my thoughts more coherently than I would have. I think channel-level precedence is probably the right way to fix this, but I would like a way to control (maybe at the package level) if it goes with newest-possible or prefers a specific channel.
I was also thinking of the debian idea of 'pinning' as a model for how to do this.
For day-job we have been making aggressive use of 'postN' versioning (pulled directly from git describe via versioneer), which helps in the case where the upstream project is adding/fixing things. However, this can get funny if you are packaging commits from side-branches, and it definitely does not help if the difference is different sets of locally applied patches or build configuration.
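A rough Python sketch of the 'postN' idea, assuming `git describe` output of the form `v1.2.0-3-gabc1234` (tag, commits since the tag, abbreviated hash). The helper `post_version` is hypothetical and much simpler than what versioneer actually does.

```python
import re

# tag, number of commits since the tag, abbreviated commit hash
_DESCRIBE = re.compile(r"v?(?P<tag>.+?)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)")


def post_version(describe):
    """Turn a `git describe` string into a 'postN'-style version."""
    match = _DESCRIBE.fullmatch(describe)
    if match is None:
        # Exactly on a tag: strip one optional leading "v" and use it as-is.
        return describe[1:] if describe.startswith("v") else describe
    distance = int(match.group("distance"))
    if distance == 0:
        return match.group("tag")
    return "{}.post{}".format(match.group("tag"), distance)


print(post_version("v1.2.0-3-gabc1234"))
print(post_version("v1.2.0"))
```

The commit hash is dropped here, which is exactly why two side-branches three commits past the same tag would collide on the same version string.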
Pinning channel might be ok as long as it is per environment.
@mcg1969 articulated many of my thoughts more coherently than I would have. I think channel-level precedence is probably the right way to fix this, but I would like a way to control (maybe at the package level) if it goes with newest-possible or prefers a specific channel.
@tacaswell my Linux distro (OpenSUSE) does exactly that. I can set repository preferences that will be used when updating the system, but I can also do a "distribution upgrade" that will get the newest-possible from all repositories. This operation issues warnings stating that the user is responsible for the system stability when adding third-party repositories and performing the distribution upgrade. Conda is dangerously confusing with that! I can see tons of users breaking their system using conda-forge packages and going to Continuum mailing list to complain.
With that said. This behavior might be a long-term goal. Right now I believe that a global channel preference is already a big win.
Pinning channel might be ok as long as it is per environment.
Agreed.
For an idea on how to mark package/channel combinations as good/bad: https://github.com/conda/conda/issues/2067
We really do want conda to be adopted more widely---heck, we'd be pleased if someone built their own Python distribution that used conda as a packaging model.
That sounds like a challenge! Accepted, acpd, Another Conda-based Python Distribution.
On that topic and related to packaging software already in the default conda channel, would it be possible for someone from Continuum to clarify the license of the recipes in the ContinuumIO/anaconda-recipes repository? A number of those are prime candidates for use in conda-forge.
Nice! Do you have any thoughts on their integration into conda-forge, @jjhelmus?
Long term I think they could be integrated into conda-forge, but first some logistics need to be worked out. Having a conda-forge version of conda clobber the Continuum version would not be good.
Maybe they could be placed under a special label that is different from main so they could be opted into instead of installed by default when adding conda-forge.
I could be wrong but I do not think having a non main label is taken into account when doing a conda install from Anaconda.org. Having a separate channel for these packages might be a possibility but it seemed like the consensus in this issue was that this was not an ideal situation.
Anaconda-recipes is BSD, same as Conda-recipes. Sorry that wasn't posted. I have added it. It is our intent to move everything in anaconda-recipes to the conda-forge-feeding-community-channel plan. If anyone would like to help in that effort, we'd appreciate it. The gist of that plan is:
Great, thanks for the clarification @msarahan
Thanks @msarahan.
It is our intent to move everything in anaconda-recipes to the conda-forge-feeding-community-channel plan.
That is very encouraging! I think this should help in alleviating the strain of package maintenance on many fronts moving forward. Will the rest of the currently closed-source recipes be open sourced at some point in order to aid in this movement?
We move recipes from the internal repo, anaconda-recipes, and conda-recipes to conda-forge. Those other places are either shut down or replaced with links or git submodules (like the "feedstocks" repo in conda-forge)
Either of these sounds like a good plan. I suppose submodules are appropriate as part of the transition given this will take some time.
We mirror or link packages built by conda-forge on our "community" anaconda.org channel. There may be other sources of packages there, also.
At what point do you envision this happening? Should it be delayed until most of the base anaconda recipes packages are added?
We run a validation process on packages (inspect recipe, run test suites, verify package contents against a build of the recipe). Once verified, these packages will become the content on the default channel.
Will this verification process be done in the open? For instance, will the scripts for this verification be placed in a public repo? I imagine that it will be nice to include these checks in the process of determining whether a package gets added on this side, as well.
Will the rest of the currently closed-source recipes be open sourced at some point in order to aid in this movement?
Yes, but not as a huge dump. We have to convert many of these recipes from the older format, so it'll be on a case-by-case basis, or as we have time. Generally, the policy I'm following is that if I'm doing anything to fix or update a recipe, it gets translated and transferred (case in point: psycopg2 and dependencies). Not everyone is following this policy, though.
I suppose submodules are appropriate as part of the transition given this will take some time.
Submodules also have some precedent from conda-forge, which is nice.
We mirror or link packages built by conda-forge on our "community" anaconda.org channel. There may be other sources of packages there, also.
At what point do you envision this happening? Should it be delayed until most of the base anaconda recipes packages are added?
The community channel mirroring can and should be happening now. I don't see any reason why it shouldn't (aside from perhaps channel priority confusion). We have mirrored some channels in the past, but we need to get the infrastructure to implement this properly. Right now it is a cron job driving some anaconda.org API stuff that @mcg1969 wrote. The validation and use of community-built packages in the default channel is what will take time. Completely arbitrary time estimate until community packages are replacing some continuum internal builds: 1-2 months?
Will this verification process be done in the open? For instance, will the scripts for this verification be placed in a public repo? I imagine that it will be nice to include these checks in the process of determining whether a package gets added on this side, as well.
Great idea. There is no security in obscurity, and I would appreciate all of your support in developing this process / tool. I will come up with a repo for it tomorrow after some discussion internally on where it fits.
Once verified, these packages will become the content on the default channel.
This sounds like the end game is to have the default channel host all
continuum packages, as well as all community-maintained packages.
So will there be any place for a "community" or "conda-forge" channel at all?
On the one hand -- great for the user community. On the other hand, I
have a hard-to-define impression that there will still be a need for
"something" in between the default channel and a random scattering of
channels on Anaconda.org.
I.e. Multiple levels of "trust":
Anaconda: tested by continuum and all known to work together.
Default: tested by continuum, with the latest and greatest.
Community: curated by a trusted community, maybe experimental builds,
release candidates, etc.
Arbitrary Anaconda.org: Buyer beware !
-CHB
Your hierarchy sounds pretty reasonable, and very much in line with what I have in mind. The default channel is less "hosting all continuum packages" and more "hosting continuum-verified community-maintained packages." Continuum is participating in the maintenance as well, not just pawning it off. However, it is a reduction of Continuum's role as "authoritative builder" to "tester/verifier/integrator of packages built by a standard community-accessible system." The default channel still involves human verification on our end, and will trail the community channel to some extent. More importantly, the community channel's critical place is an aggregator, where a single central channel combines authoritative packages from multiple other channels. This hopefully will help all kinds of package conflict and channel priority issues.
I see arbitrary anaconda.org less as "buyer beware" and more as "YMMV." If small channels want to play ball with standards and all that, they'll be very welcome in the community channel. A very easy way to do that would be to just contribute packages to conda-forge.
Sounds great -- looking forward to it! Time to get some of my stuff in
conda-forge!
I see arbitrary anaconda.org less as "buyer beware" and more as "YMMV."
well, sure -- the point was that if you use an arbitrary anaconda.org channel,
it is up to you to confirm who built it, whether it suits your needs, and
whether it is safe.
I haven't heard of any abuses, but one certainly COULD put all kinds of
dangerous software up on an anaconda.org channel -- makes me nervous!
-CHB
Yes, but not as a huge dump. We have to convert many of these recipes from the older format, so it'll be on a case-by-case basis, or as we have time. Generally, the policy I'm following is that if I'm doing anything to fix or update a recipe, it gets translated and transferred (case in point: psycopg2 and dependencies). Not everyone is following this policy, though.
Sure, makes sense. Hopefully others will follow.
Submodules also have some precedent from conda-forge, which is nice.
The feedstocks repo is pretty handy for this, as well.
The community channel mirroring can and should be happening now. I don't see any reason why it shouldn't (aside from perhaps channel priority confusion). We have mirrored some channels in the past, but we need to get the infrastructure to implement this properly. Right now it is a cron job driving some anaconda.org API stuff that @mcg1969 wrote. The validation and use of community-built packages in the default channel is what will take time. Completely arbitrary time estimate until community packages are replacing some continuum internal builds: 1-2 months?
This all sounds reasonable. Will there eventually be a more proper form of mirroring with Anaconda channels? It seems like that can be a pretty useful feature to get some select packages mirrored into one channel as opposed to having several channels. Also, could be helpful for reducing collisions.
Great idea. There is no security in obscurity, and I would appreciate all of your support in developing this process / tool. I will come up with a repo for it tomorrow after some discussion internally on where it fits.
Sounds good. Feel free to ping me when it is up. Would be nice to get a feeling as to the requirements for validation to start.
cc @patricksnape
Thanks for cc'ing me @jakirkham. This sounds amazing. It would be really really good to have a sort of staggered release schedule whereby the community thinks things are good to ship onto conda-forge and then continuum eventually pulls them onto mainline once verified. This will be great for much smaller packages too.
If we decide on the protocol for all of this I'd be really happy to evangelise it by writing some blog posts/documentation about how package maintainers can easily opt into conda in a similar manner as they do to PyPi but via conda-forge!
I suppose I should start submitting some of the recipes I have like opencv that will likely be widely useful! I need to get a feel for how conda-forge works yet but I'd be really happy to move away from hosting my own packages for other big projects if possible.
Not sure if this is up your alley, @hajs, but I figured you might be interested in this sort of system and with your wide breadth of recipes we would certainly appreciate your feedback going forward and any contributions you would be willing to make.
Just a quick comment: it might be worth studying the way the Fedora/rhel/centos ecosystem works in detail for inspiration, since it sounds like you might be moving towards reinventing large parts of it :-)
Glad to see you over here, @njsmith.
it might be worth studying the way the Fedora/rhel/centos ecosystem works in detail for inspiration
Great point. This isn't true in all cases, but many conda-recipes borrowed their build strategies from Linux distros. For instance, the fftw recipe was inspired by the Arch Linux build. The gcc recipe was inspired by various Linux distros and Linuxbrew. Though it certainly doesn't hurt to review our premises as we go through this. Also, it is worth checking if we are really doing the best thing compared to other package managers.
...since it sounds like you might be moving towards reinventing large parts of it :-)
What can we say? We are unreasonable people. We want a package manager that is cross platform, doesn't require sudo, and lets Python be awesome. :smile:
We want a package manager that is cross platform, doesn't require sudo, and lets Python be awesome.
I am stealing that phrase next time I have to prepare a presentation on conda/conda-forge :wink:
Thanks, @njsmith - I would do well to study that, since I have been proposing much of this process. It makes perfect sense now that you mention it, but I've had my head stuck in Windows and Ubuntu sand for too long to be aware of it. I'll go study.
@jakirkham and @ocefpaf, I believe his comment is aimed at the process of community-developed packages feeding into an enterprise system (which then may spawn a community-led enterprise system), more than he is talking about build strategies of any given recipe.
@sstirlin, given your views on packaging and valuing of conda not to mention your extensive recipe collection, we would be really interested in working with you to help get this all integrated into conda-forge. It will definitely help us and should help you reduce your maintenance burden. Please let us know how we can help.
process of community-developed packages feeding into an enterprise system (which then may spawn a community-led enterprise system), more than he is talking about build strategies of any given recipe.
Exactly -- the social technology, not the code technology :-).
Exactly -- the social technology, not the code technology :-).
Ok, thanks for clarifying.
I believe his comment is aimed at the process of community-developed packages feeding into an enterprise system...
This much I follow.
...(which then may spawn a community-led enterprise system)...
Could you elaborate on this point a bit more, @msarahan and/or @njsmith?
...more than he is talking about build strategies of any given recipe.
This is clear.
centos:fedora:RHEL :: conda-forge:anaconda:anaconda-enterprise
Could you elaborate on this point a bit more, @msarahan and/or @njsmith?
I don't know that conda would necessarily want to follow the exact evolution of the RH ecosystem, but Fedora, RHEL, and CentOS are 3 closely related systems that share lots of engineering work but make different kinds of trade-offs and target different market segments -- Fedora as a fast-moving project with an emphasis on free software ideals and community-led governance (but with support from RH for infrastructure like servers and legal compliance), that also serves as a beta testing ground for RHEL; RHEL as the commercially supported ultra-stable enterprise platform, and CentOS as a free version of RHEL without the commercial support for those who want a slow-moving enterprise-y product and to take advantage of RH's QA, but don't want to pay for it. (There's definitely a desire in the community for a "CentOS Anaconda" -- cf. the repeated complaints about the closed build recipes for core packages.) All together these form a neat ecosystem in which the different parts support each other -- e.g. CentOS might sound like a competitor to RHEL, but in fact RH themselves manage it and see it as a kind of loss leader that gets people into their ecosystem, makes it easier for third-parties to support RHEL (see e.g. the use of CentOS docker images for building Anaconda packages!), and soaks up the cheap customers so that RHEL can focus on the much more lucrative enterprise market.
...I'm actually a bit surprised that I haven't heard anything about Continuum aggressively trying to poach members of RH senior management, the business model parallels are really strong both in terms of the specific distro stuff + the emphasis on contributing upstream to community OSS projects, and RH is the best in the world at making that business model work both on the money side and the community side :-). Maybe (probably) I'm just not in those conversations...
Ok, thanks for clarifying.
I'm actually a bit surprised that I haven't heard anything about Continuum aggressively trying to poach members of RH senior management...
If they weren't thinking it before... :)
@jakirkham This is definitely an interesting project. You're really taking it to the next level. I'm intrigued.
Quick question: how are you handling different recipes for different versions? For example, the recipe to build cmake 3.5.0 will be different than the recipe to build cmake 3.3.2.
I maintain a separate recipe for each version. Sometimes I even need separate recipes to build against numpy 1.8 vs 1.9 vs 1.10.
cc @stuarteberg @ukoethe
Quick question: how are you handling different recipes for different versions? For example, the recipe to build cmake 3.5.0 will be different than the recipe to build cmake 3.3.2.
I maintain a separate recipe for each version. Sometimes I even need separate recipes to build against numpy 1.8 vs 1.9 vs 1.10.
That's yet to be _fully_ fleshed out, the phraseology to date has been "one recipe, one repository", but from the very beginning in my head this has really been more like "one package, one repository". In precisely the same way as one would manage two versions of the same software in a repo, we can manage two versions of a recipe in a repo - with branches. GDAL-feedstock was the first feedstock to make use of branches in this way, and in truth we haven't yet followed that through into the infrastructure (e.g. are the maintainers of the feedstock the union of the maintainers in the various recipes etc.).
I opened an issue ( https://github.com/conda-forge/conda-forge.github.io/issues/50 ) to discuss the versioning point more and come up with a standard for solving this kind of problem.
If there is a (very out-of-date) recipe for a package currently in conda-recipes do you want me to leave a note there when I add it to conda-forge?
I have tried to answer your question here, @mwcraig, because it feels like a policy/community direction question that is closely related to the transition and where we go from here. So, didn't want it to get lost in some unrelated merged PR. Sorry it has gotten so long. It just got me thinking about how we move forward. :smile:
Here are my thoughts on it and some related things to this transition. Other people may have thoughts on this, as well. It would be good if we can figure out the right way forward on the movement of recipes here (from conda-recipes and possibly other sources) and how to provide that information to others (particularly in terms of volume).
When I move and update a recipe from conda-recipes (or anaconda-recipes) to here, I try to follow these guidelines. As part of that, I notify people who have modified the history of the recipe because they may be interested in the package it builds as they may be using it as a dependency for something. In addition, I may try to notify a core maintainer of the project or so. This process helps to generally increase awareness about conda-forge (in some cases conda) and what we are trying to do here. Also, it allows people to become aware of how the package management ecosystem in conda is changing. Finally, it gives people an opportunity to take a larger role in how packages that are important to them get distributed either by becoming a maintainer, simply submitting patches to improve the build, filing issues about how the package can be improved, or (in the case of core maintainers) notifications about when new releases are coming out so that we can get them in here quickly. All of these help improve the package management ecosystem here, which should in turn benefit the community.
Admittedly, the strategy above (of notifying a few potentially interested people) is good for slow, but consistent growth. At this point, we have 165 packages maintained here (at least according to the conda-forge channel) and it is continuing to grow. We now have 31 members. Some are from Continuum, some from the Python community with various interests, and a few have little if anything to do with Python, but have become interested as this transition has occurred. This allows us to continue to fine-tune the performance of our infrastructure (something we have been doing a fair bit of), experiment with things (e.g. alternative Python distributions, use of various compiler features, etc.), and discuss various approaches to interesting and challenging problems in our unique form of package management (e.g. compiler optimizations, runtime selection of AVX and SSE optimizations, API implementation selection, etc.). However, our rate of growth hasn't forced us to make hard decisions on these without taking time to consider the options and how best they might be approached. While the right rate of growth is certainly up for discussion, IMHO we are growing at a reasonable rate.
The reason I mention the rate of growth here is that it affects how we de-dup conda-recipes. Namely, different strategies for de-duping will have different effects on how quickly our community grows. That being said, we should probably figure out how de-duping is going to occur between here and conda-recipes, as maintaining two versions is of no benefit to anyone and a bit confusing too. Here are some options that have been considered and some other ones I am now thinking of, which I have cobbled together into a rough plan that would happen over time (though feedback is definitely welcome and by no means am I saying we need to commit to this). Maybe some combination of these is the right solution. There are probably more, as well.
By doing (1), the user is made aware on a per-recipe basis that we have shifted it over and that further changes should be made here. While this helps, it's a bit localized and doesn't address the numerous PRs being added to conda-recipes for new packages. Combined with the existing pings for this movement, it should draw around the same number of people here, maybe a few more (those who were going to make some modification to the recipe).
Doing (2) means informing users that they should be adding new packages to conda-forge, not conda-recipes. This gives them a good chance of getting binaries (something they likely want) even on platforms they may be unable to build on themselves. The low barrier to entry will be particularly nice for them. However, we need to keep our eyes open for abandonment. Having a large swath of unmaintained recipes is bad for everyone. This would definitely increase traffic (at least from those who read :wink:). So, we probably want to make sure things have mostly settled down here (a significant chunk of the packages have moved, I'm guessing half or maybe a little less) before we explore that.
Doing (3), which is to replace deprecated recipes with feedstocks as git submodules, is a bit tricky. Because feedstocks keep the recipe one level below the top-level directory rather than in it, they are a little difficult to use in recursive builds. If we can tweak conda build to correct for this issue, then (3) will be more reasonable. This may seem redundant compared to the other steps (particularly 1), but (3) is the biggest attention grabber: it signals that things have moved and immediately links the user to the new location. That, combined with the technical issues, is a reason to hold off on it until we are ready for that level of traffic.
Finally, at some point, we may want to eliminate conda-recipes (4). However, this may depend on whether (3) can be accomplished successfully and how confusing it is. We will need to have some sort of deprecation notice in the conda-recipes README. Anyone that we wouldn't have gotten will be here, so things should be pretty stable at that point and we may already have most of conda-recipes here.
This is all up for discussion and none of it is set in stone, though it is something that I felt like sharing for discussion. We need to deprecate conda-recipes, but we need to do it with an eye towards how well we can saturate the demand here.
Thoughts? Questions? Feedback? Is it all totally wrong? :stuck_out_tongue_winking_eye:
Thoughts? Questions? Feedback? Is it all totally wrong? :stuck_out_tongue_winking_eye:
That was a long comment! :smile:
I completely agree with the growth - you've been an invaluable ambassador for conda-forge over the last few weeks, and many of the (IMHO impressive) 31 contributors are down in no small part to you :+1:
I'd like to explore option 1 some more, as I think that is the only way we can truly maintain community recipes which are tested on the platforms they claim to work for.
While thinking about this on my own before I found time to read your comment and the guidelines, I was leaning towards a request to add the package of interest (astropy) here and make a simultaneous pull request to delete the astropy repo in conda-recipes, which is very badly out of date (its version is 0.2.x and astropy is up to 1.1.X).
I could see adding a deprecation note to the astropy recipe instead; the broader question about transitioning is more difficult.
Once we are confident the infrastructure can scale I think an announcement to the conda and anaconda email lists from someone at Continuum indicating the Future of Conda Recipe Hosting would be helpful, with the eventual elimination of conda-recipes the end goal. A dashboard like the one at https://conda-forge.github.io//feedstocks.html could be used to point people to the correct repo for a particular package.
Part of the transition should include, at some point, turning off new PRs to conda-recipes, and getting the currently open PRs there either merged before migrating recipes or migrating the PRs.
In terms of the options you laid out I'm advocating for (1) short term, followed by (2) once we know what scales here.
Once a recipe works here I'd be inclined to delete it in conda-recipes, or replace the recipe there with a meta.yaml that just contains a link to the feedstock. A submodule would work too -- I don't know how widely conda-recipes is used for building large sets of packages.
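For illustration, such a placeholder might look roughly like the fragment below. This is only a sketch: the package name, the placeholder version, and the feedstock URL (which just follows the `<name>-feedstock` naming pattern) are hypothetical, and nothing here is an agreed convention.

```yaml
# Hypothetical stub meta.yaml left behind in conda-recipes once a recipe has moved.
# "example-pkg" and the URL below are placeholders, not real entries.
package:
  name: example-pkg
  version: "0.0.0"   # placeholder only; not a real release

about:
  # Assumed location, following the <name>-feedstock naming pattern
  home: https://github.com/conda-forge/example-pkg-feedstock
  summary: "Recipe moved to conda-forge; see the feedstock linked under 'home'."
```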
Eventually (4) is necessary, I think. Given enough lead time (6 months or a year?) it shouldn't cause much disruption.
I don't think killing conda-recipes is the right way. Its contents should definitely live elsewhere, but conda-recipes itself is an important aggregation, and contains more than just Python packages (which is the primary focus of conda forge at the moment). Conda recipes could also serve to collect recipes (submodules) from sources other than conda-forge, if any project wants to maintain their recipe themselves, outside of conda-forge. I'm in favor of 1 and 2 now, with 3 (with conda-build fixes) down the road a bit.
Given your thoughts on channel de-duping, I was curious if you had any thoughts on this, @mcg1969?
Deduping recipes in repos unfortunately does not solve the problem of binary deduping when conda-forge includes a package from the default channel (e.g. matplotlib) or, in the future, when the default channel gets fed builds (or better recipes) from conda-forge. IMO this is a problem because, if this is not coordinated, it will at least lead to hard-to-track bug reports when it's not clear which version of a package is installed (the mpl packages AFAIK currently have different dependencies in default and conda-forge). In the worst case, it will lead to incompatible packages.
I think the default should be to install from default, so unless something drastic comes up (= bug in default), packages with the same upstream version should get installed from default, but higher upstream versions should be preferred wherever they come from.
Therefore I would like to propose this scheme (essentially the debian backport scheme):
Append "cf" to the build string of all packages by default (= manual work :-(). If there is a reason to prefer the package from the conda-forge channel, then the build string should be changed to 1cf (or 1.cf?), if 1 would be the next build string in the default channel. Conda-forge internal builds increment by appending a number: cf1.
This would result in the following behaviour:
For this to work, every package that exists in both default and conda-forge needs to have the recipe from the default channel available; otherwise the same problem as with the current mpl situation arises...
So another "policy" would need to be:
One of the recipe versions is "upstream": the "taker" should only modify the build-string and add patches to fix bugs in the package but not change the "spirit" of the recipe (e.g. remove/add dependencies to change functionality). Bigger changes should be done in the "upstream" repository.
Some examples:
| package | default | conda forge | default (later) | explanation |
| --- | --- | --- | --- | --- |
| matplotlib | 1.5.0 | 1.5.1.cf | -- | just the conda-forge copy of matplotlib to get new upstream versions earlier -> when default catches up, the default channel is preferred |
| matplotlib | 1.5.1 | 1.5.1.cf1 | -- | default catches up, but has a different recipe -> conda-forge needs to release a new package to catch up -> 1 after cf |
| matplotlib | 1.5.1 | 1.5.1.1cf | -- | A fix for the package in default (1 in front of cf), conda-forge is preferred until default has a new version (either upstream or with a build string) |
| whatever | -- | 1.1.cf | 1.1 | default gets the recipe from conda-forge and removes the cf build line and the package is sorted higher and is now installed from default |
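In recipe terms, the marker from the table could be carried by the build string. A minimal sketch, assuming the `build: string:` key of a conda-build meta.yaml is used for this; the values are taken from the matplotlib rows above, and the fragment only labels the package. The preference order described in the table is the proposal's intent and is not something this fragment enforces by itself.

```yaml
# meta.yaml fragment (illustrative only)
package:
  name: matplotlib
  version: "1.5.1"

build:
  # "1cf" is the proposal's marker for a conda-forge fix of a package
  # that default already ships; a plain copy would use "cf" instead.
  string: 1cf
```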
After reading all the various suggestions which involve name mangling and custom version numbers, I'm thinking that @janschulz's original suggestion of having a separate channel for conda-forge packages which are also present in default channel seems like a great solution. If we moved all the duplicated packages into a new channel, say conda-forge-core, then users would need to explicitly add that channel or specify it in a conda install command.
If we moved all the duplicated packages into a new channel, say conda-forge-core, then users would need to explicitly add that channel or specify it in a conda install command.
I don't like the idea of more channels. IMO our goal should be quite the opposite: improve the communication to get fixes/updates/patches from conda-forge into the default channel. We don't have a concrete example of that happening right now, but @msarahan and others are present here and monitoring the activity. I see that as a win.
We do have a different problem regarding same package and version/build number. I think that must be fixed in conda. All we can do for now is to bump our build number to a higher value than the default channel to avoid conflicts.
I am closing this issue as I believe we already know what to do when submitting a package that is already in the default channel. Just write the reason why you are submitting the package to conda-forge in the PR (e.g.: new patch to solve X, missing dependencies, latest version, etc).
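As a sketch of the build-number bump, a feedstock's meta.yaml could set the number explicitly. The concrete numbers are assumptions for illustration: the comment supposes the default channel ships build 0 of the same version.

```yaml
# meta.yaml fragment (illustrative only)
build:
  # Assumption: the default channel currently ships build 0 of this version,
  # so any higher number avoids a clash on version and build number.
  number: 1
```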
@ocefpaf This solution is not enough when continuum starts to import packages from conda-forge into default.
Please see this PR ( https://github.com/conda/conda/pull/2323 ), which is trying to better address channel conflicts.
@ocefpaf This solution is not enough when continuum starts to import packages from conda-forge into default.
Why not? If they keep up the pace we can just drop our version. If not we can keep on releasing and hope that conda/conda#2323 will allow them to live happy together.
Because packages will end up with different things in them but exactly the same version numbers (as long as continuum does not start importing the binary packages, which IMO is not a good idea as they would have to trust every member of an org that already has 41 people in it). This will lead to things like one version including a fixed openssl and the other not, simply because of when each was built. It might only happen a few times per package, but when conda-forge has ~1000 packages, this adds up to a maintenance burden from hard-to-debug situations.
If this gets worse by having two different packages in these two channels (as it is currently--or at least can be--the case with the mpl package), this results in an even greater nightmare...
I don't say the above enhancement is bad: it's actually great, but I think it's more addressing the problem of having a user channel and overwriting packages in the default channel with other ones and not the problem of two versions having (almost) the same metadata.
A completely technical solution to the above problem would be if the build string could be split up into three parts: old_build_number + setting from environment + new_one. A repackager/taker can only touch the new_one (apart from new upstreams or bugfixes), the original recipe only touches the old_number, and the conda-forge scripts set an environment variable which sets the middle to cf, while continuum does not set it at all. On build they get mangled into the normal build string, which then implements the scheme above. This would ensure that if a user has both channels included, they would get the "right" package (= whoever has the higher upstream version; on the same upstream version, the default channel wins). This happens without user intervention via pinning.
And you can see at first glance which packages came from the conda-forge channel, even if the user downloaded and installed the package manually.
Because packages will end up with different things in them but exactly the same version numbers
I understand that problem and I don't think it is any different from the Linux distro repositories problem. And this is how they solved it: A big warning to any user that is adding any third party repository. (I think that continuum is really far behind in doing that btw :wink: )
Together with the warning they provide ways to choose preference repo order, freeze a package to a repo, or freeze a package from any updates.
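For comparison, the closest conda analogue to a repo preference order is the channel list in `.condarc`. A hedged sketch: the file format and the `channels` key exist, but whether list order actually expresses a preference depends on the conda version and on work like conda/conda#2323, so the comments describe the intended reading rather than guaranteed behaviour.

```yaml
# ~/.condarc (illustrative only)
channels:
  - defaults      # listed first: intended to be preferred when both channels carry a package
  - conda-forge   # community builds, for packages or versions not in defaults
```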
but I think it's more addressing the problem of having a user channel and overwriting packages in the default channel with other ones and not the problem of two versions having (almost) the same metadata.
I did not take a close enough look at conda/conda#2323 to comment. However, I disagree that the packages have the same metadata. The origin is different and that is part of the metadata. (The most important part IMO.) I think that build strings are redundant and the technical use you recommend will create unnecessary complexity.
fair enough :-) If it becomes a problem in the future, it can be solved then...
Because packages will end up with different things in them but exactly the
same version numbers
This is basically a problem you will get as long as there is more than
one source for the same package.
But if continuum pays attention to what we are doing (and they do seem to
be), then they can increment the build number, and we're good to go.
Not a technical solution, but what can you do?
Also, if the default channel continues to be prioritized, then even if
there are duplicate build numbers, users will get the "official" version by
default, which is probably good, and at least predictable.
-CHB
Yes, please do offer feedback on https://github.com/conda/conda/pull/2323 . It is subject to improvement---both before we merge it and after. But conda-forge is definitely one of the reasons that PR was built.
My view of build strings is that they serve exactly one purpose: to prevent duplicate filenames---and, in doing so, to allow users to specify a specific build of a package when they need to. I don't think it is a good idea to endow them with any semantic content that the underlying solver must depend upon.
We should be relying on channels (now that 2323 is in the pipeline), dependency differences, and features to achieve differentiation. And if those are insufficient, we should come up with new metadata approaches. But the filename itself should be irrelevant to the solver.
That's not to say that the build string and filename can't be built _from_ the metadata, however.
Because packages will end up with different things in them but exactly the same version numbers
I didn't follow the discussion in detail, but would like to point out that it is possible to add version tags to version numbers in order to disambiguate variants, like foo-1.2.3.tag1.propA vs. foo-1.2.3.tag2.propB. Other packages can use these to refine requirements:
requirements:
  run:
    - foo *.tag1*  # won't pick up *.tag2*
I don't claim that this is necessarily a good solution, but it might be another useful trick to address the ambiguity problem. The good thing about these tags is that one can take advantage of conda's powerful version comparison and resolution algorithms.
I think that channels and subchannels in particular will become very powerful once something like 2323 is implemented. I think that may be the proper way to host multiple variants of the same package.
I think that channels and subchannels in particular will become very powerful once something like 2323 is implemented. I think that may be the proper way to host multiple variants of the same package.
:+1:
Hmm...not sure I see how subchannels work or how that will fit into our infrastructure yet.
@gpilab, I noticed your channel recently and noticed that we have a lot of overlap in terms of packages we provide. Maybe you would be interested in getting packages from conda-forge. Also, as those packages are some of your dependencies, maybe being added as maintainers to the feedstocks would be useful to you. I would be really interested in helping you figure your way around conda-forge. Feel free to give me a ping. :smile:
@NLeSC @remenska, noticed that you have a variety of interesting packages some present here and some not yet present (though we are eager to add). Given this is quickly becoming the place to get packages that may not yet be packaged by Continuum and we do the builds in automated VMs in very clean environments, I think you might benefit by adding some of your packages here. Also, feel free to sign up for packages that are valuable to your effort. If you need any help figuring out what is going on, please feel free to ping me and I will be happy to get you started. :smile:
I am not caught up on all the discussion around conda-forge, so I am not sure if this is the best place for this; sorry if not. I am very excited to see progress on this, great work!
With the new conda constructor it's really easy to make a custom conda distribution with custom packages from a conda channel. I just tested it with a file like this:
name: centonda
version: 1.0.0
channels:
  - http://repo.continuum.io/pkgs/free/
  - https://conda.anaconda.org/conda-forge
specs:
  - python
  - conda
  - anyjson
At the moment you still need http://repo.continuum.io/pkgs/free/ in the channels list to have python and conda but you can see the idea, if these packages are on the conda-forge channel it would be possible to create a distribution with community created packages.
It would also be possible to make that custom distribution point to the conda-forge channel by default. Not as straightforward, but possible; see https://github.com/conda/constructor/issues/16.
Just wanted to mention this as a possibility because I haven't seen anybody discuss this option.
Neat idea @danielfrg. I have scripts which already make the self-extracting tarballs (such as miniconda is for Linux and OSX) but not a Windows installer.
I think this probably deserves its own issue in this repo though. Happy to open it?
if these packages are on the conda-forge channel it would be possible to create a distribution with community created packages.
Why wouldn't they be - this is a community packaging project
I opened a new issue in https://github.com/conda-forge/conda-forge.github.io/issues/90 for tracking.
Why wouldn't they be - this is a community packaging project
Definitely! That's what I meant, a distribution with only community created packages, all open :)
all open :)
To be fair, this repo does now contain all of the anaconda recipes which are in the conda-build form: https://github.com/ContinuumIO/anaconda-recipes
But I still don't know if that is the canonical repository...
@pkgw, noticed that you have a variety of interesting packages some present here and some not yet present (though we are eager to add). Given this is quickly becoming the place to get packages that may not yet be packaged by Continuum and we do the builds in automated CI VMs in very clean environments, I think you might benefit by adding some of your packages here. Also, feel free to sign up for packages that are valuable to your effort. If you need any help figuring out what is going on, please feel free to ping me and I will be happy to get you started. :smile: