Conan: Implement something like git gc to clean out older versions of dependencies.

Created on 6 Jul 2019 · 13 comments · Source: conan-io/conan

On my Windows development machine I just realized I had over 30GB of dependencies stored in the conan/data directory. This is something our developers have commented on as well. One reason our data dirs balloon that much is that we have a revision system that changes the version of the packages per change - but the same will be true now that official binary revisions are implemented in Conan - a lot of stuff will be stored and will build up over time.

I suggest we start storing the last used date in the local package directory - so when a package is used (i.e. installed from a conanfile) a timestamp is written to the package dir somewhere. We can then scan through all versions of each package and remove the ones that haven't been used in a while.
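A minimal sketch of that idea in shell. The `<name>/<version>` cache layout and the per-package `.last_used` marker file are assumptions for illustration, not existing Conan behavior; a real implementation would live inside Conan and remove the directories instead of just printing them.

```shell
#!/bin/bash
# Sketch: record a "last used" timestamp per package dir, then list stale ones.

mark_used() {
    # Call after a successful install of the package dir in $1.
    touch "$1/.last_used"
}

prune_unused() {
    # Print package dirs under root $1 whose marker is missing or older
    # than $2 days (assumed layout: <root>/<name>/<version>).
    local root="$1" days="$2"
    find "$root" -mindepth 2 -maxdepth 2 -type d | while read -r pkg; do
        if [ ! -f "$pkg/.last_used" ] || [ -n "$(find "$pkg/.last_used" -mtime +"$days")" ]; then
            echo "would remove: $pkg"
        fi
    done
}
```

Writing the marker on every install keeps the decision local to each package, with no need to track which consumers depend on it.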

Another option is to just remove everything but the latest version. But it might be hard to know what the latest version is if, for example, some packages are not using semver versioning.

This should most likely be added as an option to the remove command, and then maybe also as an option to run automatically on some kind of schedule.

Labels: triaging, whiteboard

All 13 comments

The problem is that Conan doesn't track consumers, which versions they use, and so on. In theory, there could even be situations where two consumers use different versions of A and B, one using the latest A and another using the latest B.
My suggestion is to use conan remove "*" --packages to mitigate the binary explosion. You'll keep recipes and the binary download shouldn't be slow if you are behind NAT.

Well, that's why I suggested that instead of knowing what the consumer wants, we can just track WHEN packages are used - that way only recently used packages are kept around. I think this is a pretty clean solution compared to tracking the consumers and the exact packages they use.

Yes, I can remove all packages. But for our project that's actually quite a big download, and it would definitely be annoying to remove them on a schedule without any consideration of whether they were recently used or not.

I think the gc command should also only remove the binary packages and keep the conanfiles around.

Not sure what NAT has to do with anything?

I guess projects could have a timestamp on each generator usage in their metadata file and conan remove could be parametrised on it.

Environments (both binary server and clients) behind NAT are usually not that slow to worry much about downloads. 🙂

Hi! Why is it not valid to just wipe the entire cache and retrieve from a reference server? I'm sure there are good reasons, so I want to know them.

We have a fully remote team on many different continents with different bandwidth access. Our Artifactory server is on the west coast of the US. Sometimes we see downloads that take a very long time just because of poor routing and bad internet weather. Plus the more offline/online nature of some people's setups.

And we have lots of large packages (mainly toolchains)

For reference, this issue is almost the same: https://github.com/conan-io/conan/issues/3587
There are some still valid concerns about this feature.

I think the concerns are a bit overblown, to be honest. Conan could store the last access time in a file (similarly to lockfiles) and decide based on that. If a larger-than-necessary set of packages is removed from the cache, that's not really a big deal, is it? It's still better than having to blow away and re-download everything again.

Have you considered using the access time from the operating system?

Conanfiles not accessed in more than 30 days:

find ~/.conan/data -type f -name "conanfile.py" -atime +30 -exec ls {} \;

From there we could get the recipe directories and clean them. We might be able to do the same for packages by searching for conaninfo.txt files.

It should be some similar trick for Windows I suppose.
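Under the same assumptions as the recipe example above (classic `~/.conan/data` layout, access times enabled on the filesystem), the equivalent lookup for binary package directories could be a small function parametrised on the cache root and the day threshold:

```shell
# Access-time trick applied to binary package dirs, which in the classic
# cache layout each contain a conaninfo.txt file. Prints the directory of
# every package not accessed in more than the given number of days.
stale_pkg_dirs() {
    local root="$1" days="$2"
    find "$root" -type f -name "conaninfo.txt" -atime +"$days" -exec dirname {} \;
}

# e.g.: stale_pkg_dirs ~/.conan/data 30
```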

The access time on Windows is really unreliable, and a lot of people disable access-time updates on Linux. It's probably fine if you write a script for yourself to clean up the Conan cache, but Conan should just write the access time into a file every time you run conan install, or something like that.
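As a rough sketch of that last idea, a shell wrapper could append a wall-clock timestamp to a plain log file on every install, independent of filesystem atime. The log path `~/.conan/usage.log` and the `CONAN_USAGE_LOG` override are hypothetical names for illustration:

```shell
# Sketch: record install times in a plain log file instead of relying on
# filesystem access times (often disabled). The log path is an assumption.
log_conan_usage() {
    local log="${CONAN_USAGE_LOG:-$HOME/.conan/usage.log}"
    mkdir -p "$(dirname "$log")"
    date +%s >> "$log"
}

# Wrap the real client so every install is recorded:
#   conan() { [ "$1" = install ] && log_conan_usage; command conan "$@"; }
```

A per-package variant (logging to a file inside each package directory) would be closer to what the original proposal describes.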

Maybe it can be useful to others.
I created a script to cleanup old Conan packages, assuming the Conan storage folder is /opt/conan:

#!/bin/bash

#
# Script that cleans up the local cache of Conan packages
# The logic is to cleanup packages to make sure the Conan cache does not take more than
# a fixed amount of disk space
#

# Constants

MAX_AGE_THRESHOLD_SEC=1209600   # 2 weeks
MIN_AGE_THRESHOLD_SEC=432000    # 5 days
MAX_CONAN_CACHE_SIZE_MB=5000
MAX_NUM_ITERATIONS=5
NUMBER_REGEXP='^-?[0-9]+$'
CONAN_STORAGE_DIR="/opt/conan"


function assert_is_number()
{
    local NUMBER_STRING="$1"
    if ! [[ $NUMBER_STRING =~ $NUMBER_REGEXP ]]; then
        echo "It was expected to have a number instead [$NUMBER_STRING] was found"
        exit 2
    fi
}

function compute_conan_cache_size()
{
    CURRENT_SIZE_MB="$(du --summarize --block-size=1M "$CONAN_STORAGE_DIR" | cut -f1)"
    assert_is_number "$CURRENT_SIZE_MB"
}

function remove_conan_packages_older_than()
{
    local AGE_THRESHOLD_SEC="$1"

    for conan_pkg_dir in "$CONAN_STORAGE_DIR"/*/* ; do
        if [ -d "$conan_pkg_dir" ]; then
            TS_CONAN_PKG=$(stat --format=%Y "$conan_pkg_dir")
            if (( (TS_NOW - TS_CONAN_PKG) > AGE_THRESHOLD_SEC )); then
                echo "  Removing Conan package $conan_pkg_dir which is more than ${AGE_THRESHOLD_SEC}sec old."
                rm -rf "$conan_pkg_dir"
                (( NREMOVED++ ))
            else
                echo "  Skipping removal of Conan package $conan_pkg_dir which was last modified $(stat --format=%y "$conan_pkg_dir"). It was modified less than ${AGE_THRESHOLD_SEC}sec ago."
            fi
            fi
        fi
    done
}


TS_NOW=$(date +%s)
NREMOVED=0
NUM_IT=0
CURR_AGE_THRESHOLD_SEC=$MAX_AGE_THRESHOLD_SEC
AGE_DECREASE_STEP_SEC=172800  # 2days

echo "------------------------------------------------------------------------------------"
echo "Cleaning local Conan cache on this machine"
echo "------------------------------------------------------------------------------------"

compute_conan_cache_size
echo "Before cleanup the $CONAN_STORAGE_DIR takes ${CURRENT_SIZE_MB}MB"
while (( CURRENT_SIZE_MB >= MAX_CONAN_CACHE_SIZE_MB )); do
    echo "Current size=${CURRENT_SIZE_MB}MB, target size=${MAX_CONAN_CACHE_SIZE_MB}MB. Starting removal of packages older than ${CURR_AGE_THRESHOLD_SEC}sec. Iteration ${NUM_IT}"
    remove_conan_packages_older_than "$CURR_AGE_THRESHOLD_SEC"
    compute_conan_cache_size  # for next iteration

    CURR_AGE_THRESHOLD_SEC=$(( CURR_AGE_THRESHOLD_SEC - AGE_DECREASE_STEP_SEC ))
    if (( CURR_AGE_THRESHOLD_SEC < MIN_AGE_THRESHOLD_SEC )); then
        echo "Bailing out after reaching an age threshold equal to ${CURR_AGE_THRESHOLD_SEC}sec"
        exit 1
    fi

    (( NUM_IT++ ))
    if (( NUM_IT > MAX_NUM_ITERATIONS )); then
        echo "Bailing out after ${NUM_IT} iterations without meeting the stop criteria"
        exit 1
    fi
done

echo "Removed a total of $NREMOVED packages. After cleanup the $CONAN_STORAGE_DIR takes ${CURRENT_SIZE_MB}MB"


Thanks @f18m for sharing!

FYI, this contribution and the effort put in this have pushed me to start doing something: https://github.com/conan-io/conan/pull/7980

It is still a proof of concept, but seems doable.
