Type: Feature
Task Name: Cache (https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/utility/cache?view=azure-devops / https://github.com/microsoft/azure-pipelines-tasks/tree/master/Tasks/CacheV2)
I have around 50 pipelines that run on the same hosted agent, all for the same repository. All the pipelines use the same templates, with lots of pipeline-specific variables. I use the following cache task in a build template:
```yaml
- task: Cache@2
  condition: and(succeeded(), eq('${{ parameters.enableNugetCaching }}', true))
  displayName: NuGet Cache
  inputs:
    key: 'nuget | "${{ parameters.solution }}" | "$(Agent.OS)" | ${{ parameters.nugetConfig }},**/packages.config,!**/bin/**,!**/obj/**'
    restoreKeys: |
      nuget | "${{ parameters.solution }}" | "$(Agent.OS)"
      nuget | "${{ parameters.solution }}"
    path: '${{ parameters.nugetPackagesDirectorySource }}'
```
Caching works perfectly when I run a single pipeline twice - it restores the expected cache, and creates a new cache if one of the matched files changes, as you would expect.
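As an aside, for anyone reproducing this: you can confirm whether the restore was an exact hit by using the task's `cacheHitVar` input and skipping the restore step on a hit. A minimal sketch (the variable name, key, path and solution file below are illustrative, not my actual template):

```yaml
- task: Cache@2
  displayName: NuGet Cache
  inputs:
    key: 'nuget | "$(Agent.OS)" | **/packages.config'
    path: '$(Pipeline.Workspace)/.nuget/packages'
    cacheHitVar: 'CACHE_RESTORED'   # set to 'true' on an exact key match

# Only pay for the restore when the cache missed
- script: nuget restore MySolution.sln
  displayName: NuGet Restore
  condition: ne(variables.CACHE_RESTORED, 'true')
```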
However, when I run another pipeline on the same repo / set of code, the cache task generates exactly the same key, but does not "match" against the existing cache - it creates a new one. As this cache is around 600 MB in size, the "pipeline cache" for the Azure DevOps organisation is now around 30 GB when it should be 600 MB. I would expect extra caches to be generated only when the source input files change, and the old caches to expire after 30 days (it would be handy for this to be customisable too, but that's not as important). It also means that all 50 pipelines take around 2 minutes extra each, eating into the pipeline minutes.
I've attached a screenshot of a comparison between the cache job runs in two separate pipelines. Everything except the X-TFS-Session identifier is exactly the same. Ideally, the cache key should be shared between pipelines that run on the same repo.
In addition, if I were able to share this cache between pipelines, I could reduce the build time by another 2 or 3 minutes by caching the main solution binaries too - they don't change that often. It's an odd scenario, but one that suits this particular client's build requirements perfectly.
I've also posted here:
https://developercommunity.visualstudio.com/idea/1030422/share-cache-across-pipelines.html
I am considering taking a look into producing a PR for this - let me know if it's something you're interested in.
Hi @Bidthedog - Thanks for offering to help! Unfortunately, the code changes required here are server-side. This is trickier than it might seem at first because this is an insidious attack vector.
Even if I don't have write access to a repo or its CI build, I could create a new pipeline that reads from that repo but puts something "evil" in the cache. The CI build (which I don't have access to) would then read that "evil" cache entry, and the build would carry my injected "evil" bits forward.
Knowing the above, one way to share artifacts between pipelines/projects is through Packages. In fact, there is a task (not an official Azure DevOps task, but written by MSFT employees) that acts similarly to Pipeline Caching but uses Universal Packages: https://github.com/Microsoft/azure-pipelines-artifact-caching-tasks
If you use it, you'll just have to be very careful about the permissions you have set.
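For illustration, the same idea can be sketched with the built-in `UniversalPackages@0` task - publish the package directory from one pipeline and download it in the others. The feed and package names below are hypothetical, and in practice you'd also need logic to choose versions and handle the first-run case:

```yaml
# Publish the restored packages as a Universal Package (e.g. from the CI build)
- task: UniversalPackages@0
  displayName: Publish NuGet package cache
  inputs:
    command: publish
    publishDirectory: '$(Pipeline.Workspace)/.nuget/packages'  # directory to share
    feedsToUsePublish: internal
    vstsFeedPublish: 'MyProject/MyFeed'          # hypothetical feed
    vstsFeedPackagePublish: 'nuget-cache'        # hypothetical package name
    versionOption: patch

# Download it in another pipeline
- task: UniversalPackages@0
  displayName: Download NuGet package cache
  inputs:
    command: download
    vstsFeed: 'MyProject/MyFeed'                 # hypothetical feed
    vstsFeedPackage: 'nuget-cache'
    vstsPackageVersion: '*'                      # latest published version
    downloadDirectory: '$(Pipeline.Workspace)/.nuget/packages'
```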
OK, thank you for your response. It's not the end of the world; it just means that MUCH more cache space is taken up, and less-frequently-run builds do not take advantage of the cache. Some of the pipelines - as you might imagine - are not executed regularly, so it would be handy if they used the cache when they do run. Others run multiple times per day.
Tbh, I'd much rather do this with a single pipeline, but my client's software architecture just doesn't make it feasible at present.
This pipeline/branch scoping makes the cache task extremely inefficient. In my environment, CI builds are unable to use the cache produced by PR builds (different pipelines). Effectively, pipelines produce and store all this cached data for nothing.
I have to agree. I've actually turned package caching off now that I've moved to a self-hosted agent, because it's quicker to use the local server's cache than to manage a cache per pipeline. One of my clients has 24 main pipelines (and counting) that should mostly use the same cache.
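For anyone taking the same route, a minimal sketch of that approach: on a self-hosted agent you can point NuGet's global packages folder (via the standard `NUGET_PACKAGES` environment variable) at a fixed location on the machine, so every pipeline that runs on that agent shares one cache. The path and solution name below are assumptions - use whatever suits your agent:

```yaml
variables:
  # NUGET_PACKAGES is honoured by NuGet/dotnet as the global packages folder
  NUGET_PACKAGES: 'C:\agent-cache\nuget-packages'   # hypothetical shared path

steps:
- script: dotnet restore MySolution.sln
  displayName: Restore (hits the agent-local shared cache)
```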
Another scenario (although somewhat similar to @gaikovoi's) in which this feature would come in handy:
We create a Python conda environment as part of our PR and CI builds. The operation takes around 5 minutes, but it rarely needs to be redone, as the environment stays unchanged for a long time.
We can use the current cache mechanism efficiently for CI builds, but not for PR builds: since the cache is scoped per pipeline and per branch, and PR branches are short-lived, we end up building the environment most of the time instead of using the cached value. Ideally, we would like to reuse the environment created by CI builds in our PR builds.
Also, we can't really use https://github.com/Microsoft/azure-pipelines-artifact-caching-tasks, as its documentation advises against using it for artifacts produced outside of the repo's directory (which is the case for conda environments).
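For reference, this is roughly the per-pipeline caching we use today - a sketch along the lines of the documented Anaconda caching example (the environment name, file and path are illustrative). The problem described above is that this cache isn't shared across pipelines, not that it doesn't work within one:

```yaml
- task: Cache@2
  displayName: Cache conda environment
  inputs:
    key: 'conda | "$(Agent.OS)" | environment.yml'
    path: '$(CONDA)/envs/myenv'                 # hypothetical env name/location
    cacheHitVar: 'CONDA_CACHE_RESTORED'

# Rebuild the environment only when the cache missed (~5 minutes)
- script: conda env create --file environment.yml --name myenv
  displayName: Create conda environment
  condition: ne(variables.CONDA_CACHE_RESTORED, 'true')
```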