Hello,
There are two kinds of recipes, those which are embedded in a project, and those which belong in their own repository. I'll call the former in-source recipes and the latter out-of-source recipes.
I've been having problems with in-source recipes lately, so I wanted to share my thoughts about them.
Out-of-source recipes are the most commonly found, especially in OSS (e.g. bincrafters), and they have some strong advantages.
It is possible to go all the way back in a project's release history and package each release without interfering with upstream at all.
To fix a recipe bug, you don't need a new upstream release; instead, you push a new recipe revision, which has no impact on upstream.
The biggest advantage of in-source recipes is the possibility to develop the project and package it at the same time, which can save time, especially when dealing with CI: one can build, test, and package in a single run.
However, it has a big drawback:
When the packaging files are part of the project, releasing a recipe bug-fix is no longer trivial.
The buggy recipe is tagged alongside the rest of the project, thus carved into stone, even though it has no impact on the actual code!
So what can you do? Make a new release only containing the recipe fix?
It works, but it's a lot of noise for users (not to mention those who do not use Conan), even more so as additional recipe bugs are discovered.
Unfortunately, that's what we're doing at work at the moment.
I really think development and packaging should be separate processes, for the reasons I've briefly mentioned above.
The first thing I thought about was to have two recipes. An in-source one for development, used to resolve deps and build easily (i.e. local workflow), and an out-of-source one for packaging.
Unfortunately, this duplicates quite a lot of information (e.g. about requirements), which increases the risk of mismatches between the two.
And it gets worse when a project uses workspaces, because you end up with N in-source recipes...
Does anyone have a magical solution for this? 😄
_(My thoughts about it, I don't have an answer)_
This is a major concern, for sure.
An in-source recipe helps the author with CI: not only can packaging happen in the same run, but it also helps to configure and run different platform configurations, which is easier with Conan than with CI scripts. This is very useful for library writers, including open source ones.
But it has the big drawback you mentioned: the release cycle of the Conan recipe is not the same as the library's. Unless you consider Conan packaging a required part of your release, fixing anything in the recipe could require a new library version, which is annoying.
I don't have any magical solution for this; the only idea that comes to mind is to have external recipes, make it easy to grab the corresponding recipe in the CI, and build the _local sources_ instead of using the source() method to retrieve them...
I feel like it is the same process as when a library moves to CMake from another build system: if CMake is the only build system, then the release will include a proper CMake implementation; if not, any fix to the CMakeLists.txt would require a new release. Is Conan on the same path? Did CMake face this issue before?
Please, anyone, share your thoughts. Thanks!
I have thought about this a lot. I will try to post more later, but one thing that has come to mind as an idea (although it may be completely flawed and not work), is having a pattern where both exist, and supporting composition to eliminate the redundancy.
So, there are quite a few OSS projects that just want to have a conanfile.txt in the root of their repository which can optionally provide the dependencies of the project. They don't want it to be required, and they don't want to be in the business of packaging their own project for Conan Center. An example of this is the stlab/libraries project maintained largely by sean-parent:
https://github.com/stlab/libraries
In this case, as a community packager, I would like to package the library without losing the information in conanfile.txt. This all feels like a reasonable situation to think about.
Obviously, the high-level arrangement would be:
conanfile.py checks for conanfile.txt
--Perhaps this happens after source()? Probably won't work.
--Perhaps the conanfile.txt needs to be obtained extremely early, around the time when python_requires takes place.
conanfile.py adopts all the attributes of conanfile.txt and somehow merges them with values defined in conanfile.py. There would obviously be a LOT of questions here around precedence and such.
Again, at a high level, this would be a hybrid approach we could think about. It may very well be impossible or infeasible to implement, I'm not sure.
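To make the composition idea a bit more concrete, here is a rough sketch, in plain Python outside of Conan itself, of what the merging step could look like. The `parse_requires` and `merge_requires` helpers and the name-based precedence rule (conanfile.py wins on a collision) are invented for illustration; a real implementation would have to hook into Conan's recipe-loading machinery:

```python
def parse_requires(conanfile_txt):
    """Extract references from the [requires] section of a conanfile.txt."""
    requires, section = [], None
    for line in conanfile_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "requires":
            requires.append(line)
    return requires

def merge_requires(txt_requires, py_requires):
    """Merge both lists; on a package-name collision, the conanfile.py
    entry wins. (This precedence rule is an arbitrary choice for the sketch.)"""
    by_name = {ref.split("/", 1)[0]: ref for ref in txt_requires}
    by_name.update({ref.split("/", 1)[0]: ref for ref in py_requires})
    return list(by_name.values())
```

For example, merging an in-source `[requires]` entry of `zlib/1.2.11` with an out-of-source override of `zlib/1.2.13` would keep only the latter, while non-colliding entries pass through untouched.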
That is something that crossed my mind too: conanfile.txt inside the repo and conanfile.py outside, and compose them... but then I thought about recipes that could be using the source() method to retrieve data from somewhere else. Would that be a fair use-case, or is it abusing Conan? Should those additional sources/data be packaged as a dependency?
I'm not scared about the implementation, if the idea works and makes sense.
This is still embryonic in my head, but I feel that a clear separation of concerns between building/CI and packaging would be really helpful.
To achieve this, it might be required to rethink the conanfiles model.
Right now, the conanfile.txt is simpler to write than a conanfile.py but is also much less flexible. I'm not convinced of its usefulness, apart from being a good "quick start" method. I've always transitioned to a conanfile.py in the end, but YMMV.
The conanfile.py is a two-headed beast, and it might be useful to split it into an in-source kind and an out-of-source kind, e.g. a "build helper" recipe vs. a "packaging" recipe. If we manage to specify those two concepts well, we might design a good solution to the OP's issue.
It might add complexity for packagers in the end, but having a clear-cut separation would prevent conanfiles from becoming a huge monolith/mush of complex things. They are already quite complicated and have a lot of features.
@jgsogo
An in-source recipe helps the author with CI: not only can packaging happen in the same run, but it also helps to configure and run different platform configurations, which is easier with Conan than with CI scripts.
True, this is mainly the "build helper" aspect of Conan at work here, i.e. dependency installer, and build system wrapper.
I feel like it is the same process as when a library moves to CMake from another build system: if CMake is the only build system, then the release will include a proper CMake implementation; if not, any fix to the CMakeLists.txt would require a new release.
If we agree that a project is minimally composed of its code and a build system responsible for building it properly, then I'd argue that a bug fix in the build system files should indeed result in a new release.
It's an interesting idea. I have also noticed that conanfile.py is dual-purpose: it gets deps for the current build, and packages the artifacts for the downstreams. While we're currently feeling some motivation to handle these two things separately, but in a composable way, it's worth pointing out that this is probably how most package managers work. It's also interesting that roughly 95% of a recipe is automating the BUILD of the library or executable; the package...() methods represent about 5% of the complexity and logic in the recipe. I don't know if that's useful.
it's worth pointing out that this is probably how most package managers work
It would be interesting to look at how other PMs handle this issue (if they do).
Maven/NuGet/PyPI/NPM: in all cases I am aware of, you list your dependencies and define your package in the same specification. C/C++ is the rare case in which project maintainers have to leave the door open to having dependencies satisfied in a wide variety of ways, not forcing users to get them from a single package manager.
OS Package managers (DEB/RPM/Brew) are an exception here (despite being C/C++), because they do effectively force all the deps to come from a single PM. However, that is just a very different situation.
I have a bit of a radical, kind of idealist suggestion for this sort of feature, based on my own experience working with Conan in production at my company.
First, the existing conanfile.py/conanfile.txt mechanism would remain as-is and could continue to be used for packages which are not "native" to Conan, e.g. zlib, openssl, etc. They could also be used for "native" Conan packages, but this usage should be discouraged.
Instead, native packages would have something like "conan-package.yml" (or .json) that would be somewhat reminiscent of npm's package.json. This would enforce some constraints: namely, it would require building with CMake and an accompanying CMakeLists.txt with a standard header similar to the include(${CMAKE_BINARY_DIR}...) one. The generated include file in this case would provide macros, functions, and variables specific to native Conan packages. Binaries, PDB files, etc. would be generated in a standard, well-documented way. This allows for an incredibly easy cycle of installing dependencies, building, testing, and eventually packaging, and should reduce a lot of the headache of C++ development, especially once debugging features are more fleshed out in Conan.
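To make this proposal concrete, here is a purely hypothetical sketch of what such a conan-package.yml could look like. No such file format exists in Conan today; every key below is invented for illustration:

```yaml
# conan-package.yml -- hypothetical "native" package manifest (invented, not a real Conan format)
name: mylib
version: 1.2.0
license: MIT
requires:
  - zlib/1.2.11
  - boost/1.69.0
build:
  system: cmake              # enforced: CMake is the only supported build system
  cmakelists: CMakeLists.txt # must include the standard generated header
package:
  layout: standard           # binaries, headers, PDBs collected from documented paths
```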
Given that CMake can provide generators for many different build systems, this would provide a stable, but not too-restrictive standard for all serious C++ development in the future and would make it possible to migrate a wide variety of projects over to become "native" packages. As part of my professional work I helped migrate a very old C++ codebase over to Conan/CMake already, and it's helped enormously. Maybe someday, almost every C/C++ codebase could be structured this way...
What you describe looks more related to what SG15 is discussing (i.e. P1177r0, P1178r0, P1204r0, P1254r0 and P1313r0).
I agree that having a somewhat minimal package specification would be very valuable. Unfortunately this will take years, and I still do not see how it would solve the OP's issue, i.e. having the packaging files in-source or out-of-source.
I've had this idea for some time, but never got to try it out on a real project, though. Thus I might be missing some points already known to veteran packagers.
An in-source package can be published from the develop or master branch into a wip or ci or whatever channel by CPT (Conan Package Tools) via some CI. Maybe not even published: CPT can just package it to see if everything works and not upload it anywhere.
When the author feels it's a good time to release, he doesn't tag the commit, but creates a release/xxx branch. CPT detects that the branch name matches the stable regex and publishes the package into the stable channel.
If the author finds a bug in the recipe, he pushes a commit to master and cherry-picks it into release/xxx (and, if the bug has been hiding for too long, into release/yyy, et cetera). If there is a bug that should be immediately released as a hotfix, he branches from release/xxx to release/xxy and cherry-picks the relevant commits there.
It seems to me that both CPT features and git-flow encourage this sort of release "cycle".
I agree, the suggestion here and those proposals are a bit orthogonal and long-tailed. They do relate a bit, but my current focus is the versioning questions raised here.
Regarding the versioning questions:
From my perspective, the process of packaging a library, including the long-term maintenance and revision of packaging logic, is orthogonal to the API/ABI of the version of a library. It is fundamentally flawed to try to version the two together.
Historically, for OSS packages, we have co-opted the LIBRARY version to describe the PACKAGE version for Conan packages. I think this is the right choice for now and the foreseeable future. We do this because anything more robust would appear complicated and confusing to the community, especially for a fairly young tool like Conan. The story needed to remain simple, and still does to some degree. For these reasons, we want the most common and simple cases and users to be able to work while only being aware of ONE VERSION NUMBER.
This is the case now, and it is satisfactory in terms of SIMPLICITY and USABILITY, which are high priorities for the broad user community. At the same time, it is utterly insufficient in terms of DETERMINISM and REPRODUCIBILITY, which are high priorities for the more advanced and professional user community. The challenge for Conan is satisfying the priorities of both sets of user communities and use cases. I think the package revision system in the latest version of Conan is the first of its kind, and should be hailed as a miracle if it indeed satisfies both user communities. I am optimistic, but honestly have not had time to fully test it out in either an OSS or enterprise setting.
Still, I think it's important for anyone interested in this topic to really understand the fundamental problem in detail. That is, trying to use one variable (library version) to represent the sum of two separate states (library version and package version), without any variable for tracking the second state separately (package revision). The only way a single version could do the job is to create a separate variable for tracking the package revision, and then create some new abstract "consumer version" representing the sum of the two lower-level states, which would increment any time either of the other two did. Indeed, this would be convoluted and highly undesirable for OSS packaging. Again, this whole problem domain exists uniquely for native binaries (C/C++): the possible contents of a binary package are nearly infinite, since there is an infinite number of ways to build them, so the recipe is FAR more likely to evolve independently of the library. For these reasons, in order to satisfy the requirements of DETERMINISM and REPRODUCIBILITY for the cases that need them, a separate version tracking system is required for the package/recipe. And, as stated earlier, I think Conan's new revision system implicitly handles this so that we don't have to complicate the simple cases unnecessarily, which is a brilliant design.
Of note, there's one community to think about which has the same problem, but on a much grander scale: the Docker container community. There, you have a "tag" system which represents the sum total of the states of an infinite number of different components. This system is anything but simple and has been the topic of much discussion and debate since its inception. In short, it's certainly not a solved problem, so I hope people acknowledge that there's inherent complexity here that can't be reduced.
@Artalus On second thought, this is a possible solution, but it implies a lot of dedication from the project's maintainer. There might be some research to be done to reduce that burden, e.g. not putting the version in the recipe, and sharing a single recipe across as many library releases as you can?
@solvingj I agree that revisions are a good solution to the recipe versioning related problems.
But the OP's issue was more about the recipe's release process, rather than its versioning.
Even with revisions enabled (which I'm using at work without problems so far), the issue I mentioned remains unsolved.
I will try to investigate @Artalus' solution in the next few days.
Sorry, I probably should have saved that comment for another issue. Thank you @theodelrieu for the resources.
The new package revision feature is very wonderful and should certainly make reproducible builds easier, especially for out-of-source recipes. For in-source builds, however, I do not think it is necessarily required to track the versioning of the recipe differently from the source, even in C/C++ projects. We've taken a cue from Cargo: a package version is considered the sum of the recipe, the source, and any other files in the package, and any change to any content, no matter how small, requires a new version, as in the OP. I understand that this is controversial and a bit aesthetically unpleasant, but it does work well in production and it does guarantee a good degree of reproducibility.
We've been using this method plus a CI pipeline which works pretty similarly to what @Artalus mentioned.
There might be some research to be done to reduce that burden. e.g. not putting the version in the recipe
For starters, yes. The FunctionalPlus library follows this idea by getting the version from Travis via version = os.getenv("TRAVIS_TAG"), and Catch2 uses a pinch of magic to extract the version from CMake.
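The FunctionalPlus trick can be sketched in a couple of lines; the `0.0.0-dev` fallback for local, untagged builds is my own illustrative choice, not something either project prescribes:

```python
import os

def recipe_version():
    """Derive the recipe version from the CI tag instead of hard-coding it.

    Travis CI sets TRAVIS_TAG on tag builds; on regular builds it is empty,
    so we fall back to a placeholder development version.
    """
    return os.getenv("TRAVIS_TAG") or "0.0.0-dev"

# In a recipe, this would back the attribute:  version = recipe_version()
```

With this pattern, the same recipe file can be reused unchanged across releases, since the version is injected by the environment rather than committed per release.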
@elizagamedev it's good that you have found a strategy that works in your environment. In the context of enterprise environments, I don't think it's fundamentally controversial or even unpleasant. Instead, I would say that in some other enterprise environments like @theodelrieu 's and also my own, it's simply not feasible or acceptable at all due to other logistics. Also, to reiterate something I feel is important... while it may work for Cargo, this approach is completely incompatible with C/C++ OSS libraries.
@theodelrieu I felt that the versioning of the changes was the most important thing to set requirements about up front, and that any decision on where a recipe should live would depend on the conclusions there. However, maybe there's more discussion and possibilities that need to occur at the same time.
The biggest advantage of in-source recipes is the possibility to develop the project and package it at the same time, which can save time, especially when dealing with CI: one can build, test, and package in a single run.
I think the advantages you mention here are all achievable through other strategies and scripting now. I guess this IS the right place to suggest, test, and then discuss specific ones.
Of note, the SCM feature is obviously one major feature designed around the in-source enterprise scenario, yet you haven't mentioned that. Have you explored that yet?
@solvingj
Of note, the SCM feature is obviously one major feature designed around the in-source enterprise scenario, yet you haven't mentioned that. Have you explored that yet?
I haven't used it yet, but it doesn't seem like it will help solve the issues at hand. I may be wrong though; it'd be great to have feedback from someone who uses it frequently.
Thinking about in-source recipes' issues, I know from @SSE4 that one huge pain point right now is backporting recipe fixes to previous versions, especially for complex recipes like the monolithic Boost one or OpenSSL's. Backporting those is very error-prone, and often no backport is done at all.
He told me that he wanted to try out a new approach, i.e. having one recipe for multiple library versions.
Of course, this will add complexity to the recipe. Performing checks on versions in addition to settings/options looks scary, but I think there is something to pursue in this idea.
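As a sketch of what such version-dependent checks could look like, here is a plain-Python illustration; the version cutoff, dependency names, and URL pattern below are all made up for the example:

```python
def source_url(version):
    """Hypothetical download URL pattern shared by every release."""
    return f"https://example.com/mylib/mylib-{version}.tar.gz"

def requires_for(version):
    """One requirements list per version range, instead of one recipe per version."""
    major, minor = (int(x) for x in version.split(".")[:2])
    reqs = ["zlib/1.2.11"]
    if (major, minor) >= (2, 1):  # pretend 2.1 introduced an extra dependency
        reqs.append("fmt/5.3.0")
    return reqs
```

The appeal is that a recipe fix lands once and applies to every packaged version; the cost is exactly the growing pile of conditionals discussed below.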
In the end, in-source recipes are also far from being optimal right now.
Building multiple versions from one branch has some appeal, but does not seem maintainable long-term: the recipe just grows infinitely long, with condition after condition.
I don't think it's reasonable for the Conan team and Conan Center to backport all fixes and improvements to all libraries indefinitely. With the Bincrafters Boost recipes, we go back as far as we have time for when we backport things, and that's all. Time is finite.
Is this unfair or detrimental to the OSS community? I don't think so. If there are people in the community who need a specific old version of a thing fixed with a backport, it's absolutely reasonable to expect them to take the fixes and submit the PR on the old version. It's the same with MinGW-specific fixes and the like.
Is this unfair or detrimental to the Enterprise community? Again, I don't think so. Serious business users should be cloning all the OSS recipes they use to their internal Git server, and building and deploying their own binaries to their own internal Artifactory server. This will regularly involve patching the recipes, often for older versions than what Conan Center is really able to maintain. If they want to submit their patched recipes for older versions upstream to Conan Center, great; if not, I don't consider it a problem.
I feel that Conan's evolutions thus far have significantly informed and improved the larger ecosystem's collective comprehension of describing and managing ABIs and library metadata. I see several more compatibility-breaking evolutions of Conan in the future. Personally, I would want to see the priority be on continuing to move forward technologically, and on supporting any policies or procedures which minimize the dev team's sense of burden regarding backwards-compatibility considerations.
I agree with all these points; this is also a good way to ~~force~~ strongly suggest users to upgrade their libraries, which looks reasonable to me.
I think the advantages you mention here are all achievable through other strategies and scripting now. I guess this IS the right place to suggest, test, and then discuss specific ones.
Do you have some examples of these strategies and scripts? I think we can start talking about them, now that we've sorted some things out of the way.
Hello, friends. Now that the conan-center-index is here, the problem of in-source vs out-of-source applies to it too:
https://github.com/conan-io/conan-center-index/issues/22#issuecomment-531577933
I really think development and packaging should be separate processes, for the reasons I've briefly mentioned above.
The out-of-source way is clearly the way to go for the long-term maintainability of the repository. The goal is to figure out how to solve some of its shortcomings, most importantly the Duplication problem.
It would be good to have some understanding of Conan's plans to address these points in the near future.
@theodelrieu
Right now, the conanfile.txt is simpler to write than a conanfile.py but is also much less flexible. I'm not convinced of its usefulness, apart from being a good "quick start" method.
The long-term goal can't be anything short of reducing the complexity (one could say that the complexity is always there, just moved down the dependency graph; I accept that). Looking at the experience of modern languages such as Go, Rust, and Swift, we need a way to move the complexity away from the end-user too. I believe there is a lot of value in expanding the conanfile.txt.
@elizagamedev brought up a very cool idea. I'd only argue that fully restricting ourselves to a single meta build system may not be an ideal solution. Conan's flexibility is a huge feature, not a bug. Why not expand the conanfile.txt with base templates? It'd provide something simple for end-users to consume, and at the same time something flexible for tooling people to glue in new tools, not just CMake. I believe the experience with https://docs.conan.io/en/latest/reference/config_files/editable_layout.html could be of much use here too.
@solvingj
The problem of DETERMINISM and REPRODUCIBILITY is so far solved with a combination of revisions and lockfiles, isn't it?
Regarding the Duplication problem, I believe there are three use-cases we must look at, not two!
The source() function runs _very late_; before source() we wouldn't be able to say which requirements we need. `conan run script` 😉. Conan allows us to abstract the build system underneath, and it allows us to pull build tools too. Why not use it to the full extent?