Type: Feature
Task Name: Cache (https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/utility/cache?view=azure-devops / https://github.com/microsoft/azure-pipelines-tasks/tree/master/Tasks/CacheV2)
I have around 50 pipelines that run on the same hosted agent, all for the same repository. All the pipelines use the same templates, with lots of pipeline-specific variables. I use the following cache task in a build template:
```yaml
- task: Cache@2
  condition: and(succeeded(), eq('${{ parameters.enableNugetCaching }}', true))
  displayName: NuGet Cache
  inputs:
    key: 'nuget | "${{ parameters.solution }}" | "$(Agent.OS)" | ${{ parameters.nugetConfig }},**/packages.config,!**/bin/**,!**/obj/**'
    restoreKeys: |
      nuget | "${{ parameters.solution }}" | "$(Agent.OS)"
      nuget | "${{ parameters.solution }}"
    path: '${{ parameters.nugetPackagesDirectorySource }}'
```
Caching works perfectly when I run a single pipeline twice - it restores the expected cache, and creates a new cache if one of the matched files changes, as you would expect.
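As an aside, for anyone reproducing this: you can confirm whether the restore was an exact hit by using the task's `cacheHitVar` input and skipping the restore step on a hit. A minimal sketch (the variable name, key, path and solution file below are illustrative, not my actual template):

```yaml
- task: Cache@2
  displayName: NuGet Cache
  inputs:
    key: 'nuget | "$(Agent.OS)" | **/packages.config'
    path: '$(Pipeline.Workspace)/.nuget/packages'
    cacheHitVar: 'CACHE_RESTORED'   # set to 'true' on an exact key match

# Only pay for the restore when the cache missed
- script: nuget restore MySolution.sln
  displayName: NuGet Restore
  condition: ne(variables.CACHE_RESTORED, 'true')
```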
However, when I run another pipeline on the same repo / set of code, the cache task generates exactly the same key, but does not "match" against the existing cache - it creates a new one. As this cache is around 600 MB in size, the "pipeline cache" for the Azure DevOps organisation is now around 30 GB when it should be 600 MB. I would expect extra caches to be generated only when the source input files change, and the old caches to expire after 30 days (it would be handy for this to be customisable too, but that's not as important). It also means that all 50 pipelines take around 2 minutes extra each, eating into the pipeline minutes.
I've attached a screenshot of a comparison between the cache job runs in two separate pipelines. Everything except the X-TFS-Session identifier is exactly the same. Ideally, the cache key should be shared between pipelines that run on the same repo.
In addition, if I were able to share this cache between pipelines, I could reduce the build time by another 2 or 3 minutes by caching the main solution binaries too - they don't change that often. It's an odd scenario, but one that suits this particular client's build requirements perfectly.
I've also posted here:
https://developercommunity.visualstudio.com/idea/1030422/share-cache-across-pipelines.html
I am considering taking a look into producing a PR for this - let me know if it's something you're interested in.
Hi @Bidthedog - Thanks for offering to help! Unfortunately, the code changes required here are server-side. This is trickier than it might seem at first because this is an insidious attack vector.
Even if I don't have write access to a repo or its CI build, I could create a new pipeline that reads from that repo but puts something "evil" in the cache. The CI build (which I don't have access to) would then read that "evil" cache entry, and the build would carry my injected "evil" bits forward.
Knowing the above, one way to share artifacts between pipelines/projects is through Packages. In fact, there is a task (not an official Azure DevOps task, but written by MSFT employees) that acts similarly to Pipeline Caching but uses Universal Packages: https://github.com/Microsoft/azure-pipelines-artifact-caching-tasks
If you use it, you'll just have to be very careful about the permissions you have set.
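For illustration, the same idea can be sketched with the built-in `UniversalPackages@0` task - publish the package directory from one pipeline and download it in the others. The feed and package names below are hypothetical, and in practice you'd also need logic to choose versions and handle the first-run case:

```yaml
# Publish the restored packages as a Universal Package (e.g. from the CI build)
- task: UniversalPackages@0
  displayName: Publish NuGet package cache
  inputs:
    command: publish
    publishDirectory: '$(Pipeline.Workspace)/.nuget/packages'  # directory to share
    feedsToUsePublish: internal
    vstsFeedPublish: 'MyProject/MyFeed'          # hypothetical feed
    vstsFeedPackagePublish: 'nuget-cache'        # hypothetical package name
    versionOption: patch

# Download it in another pipeline
- task: UniversalPackages@0
  displayName: Download NuGet package cache
  inputs:
    command: download
    vstsFeed: 'MyProject/MyFeed'                 # hypothetical feed
    vstsFeedPackage: 'nuget-cache'
    vstsPackageVersion: '*'                      # latest published version
    downloadDirectory: '$(Pipeline.Workspace)/.nuget/packages'
```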
OK, thank you for your response. It's not the end of the world; it just means that MUCH more cache space is taken up, and less-frequently-run builds do not take advantage of the cache. Some of the pipelines - as you might imagine - are not executed regularly, so it would be handy if they used the cache when they do run. Others run multiple times per day.
Tbh, I'd much rather do this with a single pipeline, but my client's software architecture just doesn't make it feasible at present.
This pipeline/branch scoping makes the cache task extremely inefficient. In my environment, CI builds are unable to use the cache produced by PR builds (different pipelines). Effectively, pipelines produce and store all this cached data for nothing.
I have to agree. I've actually turned package caching off now that I've moved to a self-hosted agent, because it's quicker to use the local server's cache than to manage a cache per pipeline. One of my clients has 24 main pipelines (and counting) that should mostly use the same cache.
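For anyone taking the same route, a minimal sketch of that approach: on a self-hosted agent you can point NuGet's global packages folder (via the standard `NUGET_PACKAGES` environment variable) at a fixed location on the machine, so every pipeline that runs on that agent shares one cache. The path and solution name below are assumptions - use whatever suits your agent:

```yaml
variables:
  # NUGET_PACKAGES is honoured by NuGet/dotnet as the global packages folder
  NUGET_PACKAGES: 'C:\agent-cache\nuget-packages'   # hypothetical shared path

steps:
- script: dotnet restore MySolution.sln
  displayName: Restore (hits the agent-local shared cache)
```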
Another scenario (although somewhat similar to @gaikovoi's) in which this feature would come in handy:
We create a Python conda environment as part of our PR and CI builds. The operation takes around 5 minutes, but it rarely needs to be redone, as the environment stays unchanged for a long time.
We can use the current cache mechanism efficiently for CI builds, but not for PR builds: since the cache is scoped per pipeline and per branch, and PR branches are short-lived, we end up building the environment most of the time instead of using the cached value. Ideally, we would like to reuse the environment created by CI builds in our PR builds.
Also, we can't really use https://github.com/Microsoft/azure-pipelines-artifact-caching-tasks, as its documentation advises against using it for artifacts produced outside of the repo's directory (which is the case for conda environments).
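For reference, this is roughly the per-pipeline caching we use today - a sketch along the lines of the documented Anaconda caching example (the environment name, file and path are illustrative). The problem described above is that this cache isn't shared across pipelines, not that it doesn't work within one:

```yaml
- task: Cache@2
  displayName: Cache conda environment
  inputs:
    key: 'conda | "$(Agent.OS)" | environment.yml'
    path: '$(CONDA)/envs/myenv'                 # hypothetical env name/location
    cacheHitVar: 'CONDA_CACHE_RESTORED'

# Rebuild the environment only when the cache missed (~5 minutes)
- script: conda env create --file environment.yml --name myenv
  displayName: Create conda environment
  condition: ne(variables.CONDA_CACHE_RESTORED, 'true')
```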