Conan: Cache Downloads

Created on 5 Mar 2019 · 16 comments · Source: conan-io/conan

To help us debug your issue please explain:

  • [x] I've read the CONTRIBUTING guide.
  • [x] I've specified the Conan version, operating system version and any tool that can be relevant.
  • [x] I've explained the steps to reproduce the error or the motivation/use case of the question/suggestion.

When packaging external libraries with conan, the sources have to be fetched each time the source() function is called.
Most often this happens over plain http, with hashes to verify the archives.
Both while developing these packages locally and while letting build servers (e.g. Travis CI) create them, the sources must be fetched every time.

It would be useful to add a cache option to tools.get (and/or tools.download) to allow it to use a cache.
Granted, it is possible to add extra logic to cache these downloads manually, but I believe it would be better to build this ability into conan, so that this added complexity can be avoided.

Notes:

  • This would need an environment variable/config variable for setting the cache path

    • CONAN_CACHE_PATH=$HOME/conan_cache

  • I think this caching should only be enabled when a url, filename and hash are passed explicitly, to avoid name clashes.
  • I propose storing the cache in subfolders of $CONAN_CACHE_PATH, using the hashes as subdirectory names, so that files with the same filename but different urls/hashes do not clash (see the sketch after this list).
  • This would also need an environment variable to allow disabling the cache (globally or per package):

    • CONAN_CACHE_DISABLE=1 to disable all caching

    • CONAN_CACHE_DISABLE_qt to disable caching for the qt package.

  • conan_package_tools should also add support for this, especially when using docker, so that each instance has access to the global cache. Every docker invocation would need to pass the CONAN_CACHE_PATH environment variable and mount the local cache folder into the container.
  • Because this cache folder would introduce common state between different builds, it could be abused to share state between builds. I think this can be ignored, since the downloads themselves are already shared by definition.
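
A minimal sketch of what such a caching layer could look like, assuming the hash-keyed layout proposed above; the helper name cached_get and the environment variable handling are hypothetical, not existing Conan API:

```python
import os
from conans import tools

def cached_get(url, filename, sha256):
    """Hypothetical helper: like tools.get, but served from a local cache
    keyed by sha256 (per the CONAN_CACHE_PATH proposal above)."""
    cache_root = os.environ.get("CONAN_CACHE_PATH")  # e.g. $HOME/conan_cache
    if not cache_root or os.environ.get("CONAN_CACHE_DISABLE") == "1":
        tools.get(url, filename=filename, sha256=sha256)  # no caching
        return
    cached = os.path.join(cache_root, sha256, filename)   # hash subfolder
    if not os.path.isfile(cached):
        os.makedirs(os.path.dirname(cached), exist_ok=True)
        tools.download(url, cached)                       # populate the cache
        tools.check_sha256(cached, sha256)                # verify before reuse
    tools.unzip(cached)                                   # extract like tools.get
```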


All 16 comments

Hi there,
you want to build third party libraries with your CI system, without the CI system fetching the sources every time.

source code

Do you build these libraries from source code (*.c; *.cpp; *.h; *.hpp; ...) or are those libraries prebuilt (exe, dll, so, a)?

If those are source files, then you should always fetch them from your source repository in the def source(self) function. From the content downloaded in def source(self), all builds (x86, x86_64, Windows, Linux, etc.) are done in the def build(self) function.

prebuilt libs

If those are prebuilt libraries, you normally download them in the def build(self) function and put them into self.package_folder, e.g. by unzipping them there or by packaging them in the def package(self) function.
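
A minimal sketch of that prebuilt-library flow, using Conan 1.x API; the name, URL and file patterns are placeholders:

```python
from conans import ConanFile, tools

class PrebuiltConan(ConanFile):
    # placeholder recipe illustrating the prebuilt-library flow
    name = "prebuilt-lib"
    version = "1.0"
    settings = "os", "arch"

    def build(self):
        # download and unpack the prebuilt archive for this configuration
        tools.get("https://example.com/prebuilt-lib-1.0-%s-%s.zip"
                  % (self.settings.os, self.settings.arch))

    def package(self):
        # copy headers and binaries into the package folder
        self.copy("*.h", dst="include", src="include")
        self.copy("*.a", dst="lib", keep_path=False)
        self.copy("*.lib", dst="lib", keep_path=False)
        self.copy("*.dll", dst="bin", keep_path=False)
        self.copy("*.so*", dst="lib", keep_path=False)
```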

no dedicated download

If you do not want to download anything all the time, then you could use conan's exports_sources feature. This stores the source files along with the recipe in the conan repository, so you always have the code already "downloaded" with the recipe and cached in your local conan repository.
Be aware that the content of the "cached" sources can only be modified/updated if you create or export the recipe into the repository again.
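
A minimal sketch of a recipe using exports_sources; the names and layout are placeholders:

```python
from conans import ConanFile, CMake

class FooConan(ConanFile):
    # placeholder recipe: the source tree is bundled with the recipe itself,
    # so nothing has to be downloaded in source()
    name = "foo"
    version = "1.0"
    settings = "os", "arch", "compiler", "build_type"
    exports_sources = "src/*", "CMakeLists.txt"

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()
```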

I want conan to act as a proxy server for downloading the sources.
Conan should only do this if some hash (e.g. sha256) is provided to be sure that the cached file is the same as the requested one. The cache folder is 'owned' by conan.
This is useful when downloading things in the source and build functions of conanfiles.

The rpm build toolchain does a similar thing (rpm packager cheat sheet): it downloads all sources to a SOURCES directory to be unpacked in the BUILD directory.

Hi @madebr,

Actually, the source() method is only executed once, and the sources are never fetched again in subsequent conan create calls if the recipe does not change.

Other people have requested this for build tools used as build requirements in CI, so that they are not downloaded every time a new CI job runs. However, we don't have a way to do that.

One possible workaround would be having a prepopulated Conan directory and setting CONAN_USER_HOME to point there, as sketched below.
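
A minimal sketch of that workaround for a CI job; the persistent path and the package reference are assumptions:

```python
# Run a CI build against a persistent, prepopulated Conan home so that the
# package cache survives between jobs; the path below is an assumption.
import os
import subprocess

env = dict(os.environ, CONAN_USER_HOME="/ci/persistent/conan")
subprocess.run(["conan", "create", ".", "demo/testing"], env=env, check=True)
```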

As you pointed out, the choice of when to use or when to clean the cache is difficult, and it would require some kind of persistence in the CI machines, as it is not something that can be done with Conan alone.

The main use case of this caching is creating conan packages for third party projects (i.e. packaging tarballs) on CI, such as conan-center and bincrafters.
Sometimes these archives can be quite big, and they get downloaded multiple times across CI jobs.
Downloading these sources again and again puts a big load on the download servers, which are sometimes rate limited.
Plus, having these archives cached also allows quicker development of the source() function of a conanfile when it has to manipulate some files.

I already do something similar in my recipes, but this advantage is lost when using docker.
See
https://github.com/madebr/conan-bullet/blob/f076bc2678b57e7757161eef7807ed3d9e8c33c7/conanfile.py#L54-L69

The problem is that we use conan-package-tools to build. Each docker container will download the package at least once. If you have some limitation with SourceForge, we can upload a copy to a generic Bintray repo and download from there.

@madebr I still don't get the reason for your issue...

Sources of a package are only downloaded the first time you run conan create; subsequent calls to conan create won't download the sources again unless the conanfile.py is modified (not your case).

The only time the sources are downloaded again is when you build packages on a different OS.

As @uilianries pointed out, that is the case with conan-package-tools, as the builds for new configurations are launched inside docker containers, so no sharing mechanism inside Conan would work. Only using directory mapping in docker and setting CONAN_USER_HOME could eventually work.

Note that you also have a flag --keep-source in the create command to prevent Conan from deleting the sources even if the recipe has changed.


Some clarifications from your last comments:

the source() function of a conanfile when it has to manipulate some files

Beware that those changes in the sources will be shared among all builds, so no settings-specific changes should be done in source(), only in build().

See
https://github.com/madebr/conan-bullet/blob/f076bc2678b57e7757161eef7807ed3d9e8c33c7/conanfile.py#L54-L69

I can't see where you are really using the temp source folder for the build. How do you point CMake to where your sources are? Did I overlook something?

Thanks 😄

This feature makes sense. Caching potentially big files from the internet at the "conan application" level (not at the package cache level) would make sense. CI servers that retrieve the same sources.zip hundreds of times from any server, including their own Artifactory generic repos, would benefit from caching it locally. Docker builds could benefit from sharing that cache too.

The main problems I see with this feature are:

  • It is an optimization: As such, it gets lower priority than other functional features, bugs, etc.
  • It is not that easy to implement in a general way. Once you are sharing that state, and there is concurrency, you need to handle it, and concurrency is never easy. With the current approach this is minimized, because downloads are done in different places. In your example there is a very high likelihood of collisions and weird errors when building in parallel in CI with one conan cache per CI job (a sketch of one mitigation follows this list). It will also surely require ongoing maintenance.
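
To illustrate the concurrency concern, here is a hedged sketch of one classic mitigation for the shared downloads only (not the whole package cache): download to a temporary file, then atomically rename it into a checksum-keyed slot, so concurrent jobs never observe partial files. All names here are illustrative, not Conan API:

```python
import os
import tempfile

def publish_to_cache(cache_dir, sha256, fetch_into):
    """Atomically place a downloaded file into a shared cache folder.

    fetch_into is a callable that streams the download into an open file.
    """
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, sha256)
    if os.path.exists(final_path):
        return final_path                   # another job already cached it
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            fetch_into(tmp)                 # write the download to a temp file
        os.replace(tmp_path, final_path)    # atomic publish on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)             # never leave partial files behind
        raise
    return final_path
```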

In summary, in my opinion, even though this issue could make sense, it is unlikely to get enough priority to be implemented any time soon.

Well, if you don't have storage limitations, you could download the zip, store it with the recipe, and finally use exports_sources instead of downloading from a remote.

Ok, so we will need a way to cache the sources of the packages so Conan does not download them every time.

@memsharded

It is an optimization

No, it is not. The lack of this feature makes it impossible to use foreign recipes in offline builds.

It is not that easy to implement

The simplest implementation is adding a new command-line option that forces the downloader in uploader_downloader.py not to delete existing files. I suspect it is only a few lines of code.

@uilianries

and finally use exports_sources

I would have to patch other people's packages for this to work. This is what I want to avoid.

In my opinion the best solution is:

  1. make the checksum a required parameter of the tools.download call
  2. allow tools.download to define an alternative source to download from (a local path)
  3. add command-line options which add another, predefined alternative to tools.download (or a prefix for it)

Unfortunately, point 1 will break backward compatibility. But I think it is very necessary for conan package creators to define exactly what they expect to be downloaded. For now you don't force them, so they don't care, and that is sad.

No, it is not. The lack of this feature makes it impossible to use foreign recipes in offline builds.

So far this feature was intended as an optimization, not as a different origin for sources. From your other comment in #5938, it makes sense to enable a mechanism that allows using this storage for that purpose too, because most of the infrastructure will basically be the same. Note that the above comment was written a few months ago.

I agree with the checksum, and I think we can require its definition to enable this feature (it is now a rule in the new conan-center-index recipes).
I disagree about letting the recipes define another origin (well, they can, we cannot disallow it); rather, Conan will automatically use files from the alternate origin if they are there and available. What conan can configure is that origin: its location, enabling/disabling it, etc.
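
For context, checksum verification is already available at the recipe level in Conan 1.x via tools.get, even if it is not mandatory; the URL and digest below are placeholders:

```python
from conans import ConanFile, tools

class BarConan(ConanFile):
    name = "bar"      # placeholder recipe
    version = "1.0"

    def source(self):
        # tools.get fails if the downloaded archive does not match the hash,
        # which is the guarantee the checksum requirement above enforces
        tools.get("https://example.com/bar-1.0.tar.gz",
                  sha256="0" * 64)  # placeholder digest
```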

https://github.com/conan-io/conan/pull/6287 implements a Download Cache. It is being released in Conan 1.22, hopefully today.

Considering this feature as closed, for the other related ideas and suggestions in this thread, please consider opening new specific issues for them. Thanks!
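
For reference, the feature shipped in 1.22 is activated through Conan's configuration; a quick sketch, where the cache path is an assumption:

```python
# Enable the Conan >= 1.22 download cache by setting storage.download_cache;
# the folder can then be shared, e.g. mapped into docker containers.
import subprocess

subprocess.run(["conan", "config", "set",
                "storage.download_cache=/shared/conan_download_cache"],
               check=True)
```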

Thanks for the work @memsharded!

The conan-package-tools use case is not really relevant for me anymore,
but it might be useful to have some communication between the host and guest.
For example, a REST service on the host could provide a download cache.
Such a service would also allow copying the created packages to the host without uploading them to an external service.

Thanks for the work @memsharded!

No problem, happy to be able to make progress here. Traditionally we have not focused much on optimizations, but on being "functional" first. As many Conan users are already managing huge projects, this is starting to get higher priority, so we recently added --parallel to uploads, parallel downloads are also ongoing, etc.

but it might be useful to have some communication between the host and guest.
For example, a REST service on the host could provide a download cache.

If you are thinking about a REST service that can host packages, how is that different to a local conan_server or an Artifactory running locally on the host?

In any case, this sounds like a wildly different and architecturally complex thing, which we surely don't have the resources to implement.

If you are thinking about a REST service that can host packages, how is that different to a local conan_server or an Artifactory running locally on the host?

You're totally right about using a local Artifactory instance to avoid uploading the created packages.

My question is about conan-package-tools + docker + conan.
The conan process, running inside the docker container, always starts with an empty cache.
So, unless Artifactory provides this (correct me if I'm wrong), it would be useful to cache the downloads on the host and provide them to the docker instances through some interface.

So, unless Artifactory provides this (correct me if I'm wrong), it would be useful to cache the downloads on the host and provide them to the docker instances through some interface.

Sharing a Conan package cache among different, potentially concurrent processes, no matter if they are CI jobs on the native machine or in docker, is not possible. The Conan package cache is not designed for that level of concurrency, which is a very challenging thing to achieve, so I don't think it will be implemented anytime soon. Adding an extra REST layer to access it adds more complexity, but doesn't remove the inherent concurrency complexity.

We hope that with the ongoing efforts:

  • The file download cache, which can actually be shared among docker instances by mapping it to a host folder (not tested, but I see no reason why this wouldn't work).
  • The install/download in parallel (ongoing efforts).

With these two things, an empty cache would be populated really fast, without needing a shared package cache. Let's see how these efforts go.
