A Kaniko user asked via email:
Any suggestions on maintaining the cache in GCR? Storage is cheap, sure, but we make thousands of images a month for our CI/CD pipeline and that will add up quickly. Are there plans to include some sort of cleanup routines in the Kaniko image to manage the cache?
I _think_ the following is true:
Every tagged image references the layers/images in the cache in its manifest (i.e. the images we want to keep around). The repository prevents you from deleting those layers because they are "in use".
Ignoring HOW tagged images might be cleaned up (next step), I think it is safe to assume that we could enumerate all the images in the cache that were created more than some time period in the past and delete them, ignoring any error response indicating that an image could not be deleted because it is "in use". This would remove any dangling cache images (i.e. from incomplete builds) as well as cache images that are no longer referenced by a tagged image.
I meant to add: if Kaniko could do this piece, then we would not have to copy the cache configuration to other tools or keep the two in sync.
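The idea above (select cache entries older than a cutoff, attempt to delete each one, and carry on when a deletion is refused) could be sketched roughly as follows. This is only an illustration: `selectExpired` and `deleteEach` are hypothetical names, the input is assumed to be "<digest> <timestamp>" lines with ISO-8601 UTC timestamps in one consistent format, and whether the registry really refuses to delete "in use" images is the unverified assumption stated above.

```shell
#!/usr/bin/env bash
# Sketch only: prune cache entries older than a cutoff, tolerating refusals.

# selectExpired <cutoff>   (hypothetical helper)
# Reads "<digest> <timestamp>" lines from stdin and prints the digests whose
# timestamp sorts before <cutoff>. Assumes both sides use the same ISO-8601
# UTC format (e.g. 2020-01-31T12:00:00Z), so string order matches time order.
selectExpired() {
  local cutoff=$1 digest timestamp
  while read -r digest timestamp; do
    if [ "$timestamp" \< "$cutoff" ]; then
      echo "$digest"
    fi
  done
}

# deleteEach <repo> <delete-command...>   (hypothetical helper)
# Runs the given delete command once per digest read from stdin. A failed
# deletion (for example, a refusal from the registry) is logged to stderr
# and skipped, so one refusal does not stop the rest of the cleanup.
deleteEach() {
  local repo=$1 digest
  shift
  while read -r digest; do
    # </dev/null keeps the delete command from consuming the digest list.
    "$@" "${repo}@sha256:${digest}" < /dev/null \
      || echo "skipped: ${digest}" >&2
  done
}

# Possible usage (not executed here; the repository path is a placeholder):
#   <list of "digest timestamp" lines> \
#     | selectExpired "2020-01-01T00:00:00Z" \
#     | deleteEach "gcr.io/my-project/app/cache" gcloud container images delete --quiet
```

Passing the delete command as arguments keeps the selection logic independent of any one registry tool, which also makes it easy to dry-run with `echo` in place of the real command.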
This was also brought up in https://github.com/GoogleContainerTools/skaffold/issues/3487
@raijinsetsu do you have a script or a program that does that?
I have something... but there's a bug in the tagged-image deletion logic. So, just ignore that piece. I cannot send the entire script due to proprietary code, but here is the part of it that handles cache maintenance.
# transformListGcrImageTags
# Reads "<digest> <tag/timestamp> [timestamp]" from stdin
# and outputs "<digest> <timestamp> [tag]"
function transformListGcrImageTags() {
  local digest tag timestamp
  while read -r digest tag timestamp ; do
    if [[ "$digest" == "DIGEST" ]]; then
      # ignore the header line
      continue
    fi
    if [[ -z "$timestamp" ]]; then
      # untagged image: the second field is actually the timestamp
      echo "$digest" "${tag}Z"
    else
      echo "$digest" "${timestamp}Z" "$tag"
    fi
  done
}
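# listGcrImageTags <image> [filter]
# Lists the tags of <image>, optionally restricted by a gcloud filter
# expression, normalized to "<digest> <timestamp> [tag]"
function listGcrImageTags() {
  if [[ -n "${2-}" ]]; then
    gcloud container images list-tags --filter "$2" "$1" | transformListGcrImageTags
  else
    gcloud container images list-tags "$1" | transformListGcrImageTags
  fi
}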
# filterBranchImages <image> <git origin>
# Reads "<digest> <timestamp> [tag]" from STDIN and prints the image
# references to delete: tagged images whose remote branch no longer exists,
# plus all untagged images.
# remoteBranchExists is defined elsewhere in the full script.
# example:
# listGcrImageTags ${gcr_root}/rest-server/dev | filterBranchImages ${gcr_root}/rest-server/dev ${remote_origin}
function filterBranchImages() {
  local digest tag timestamp
  # read one line into fields
  while read -r digest timestamp tag ; do
    if [[ -n "$tag" ]]; then
      # tagged image
      # strip off the trailing commit hash from the tag to get the branch
      local b=${tag%-*}
      if ! (remoteBranchExists "branch" "$b" "$2" || remoteBranchExists "branch" "feat/$b" "$2" || remoteBranchExists "branch" "fix/$b" "$2" || remoteBranchExists "branch" "chore/$b" "$2" ) ; then
        # the remote branch does not exist
        echo "${1}:${tag}"
      fi
    else
      # untagged image - cannot determine remote branch
      echo "${1}@sha256:${digest}"
    fi
  done
}
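# filterCacheImages <image>
# Reads "<digest> <timestamp> [tag]" from STDIN and prints a deletable
# reference for every line: by tag if tagged, by digest otherwise.
function filterCacheImages() {
  local digest tag timestamp
  while read -r digest timestamp tag ; do
    if [[ -n "$tag" ]]; then
      echo "${1}:${tag}"
    else
      echo "${1}@sha256:${digest}"
    fi
  done
}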
function deleteTags() {
  # use xargs to batch up multiple deletions
  # also suppress stdout
  xargs -r gcloud container images delete > /dev/null
}
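# cleanupGcrImages
# Relies on variables set elsewhere in the full script:
# TEST, gcr_root, remote_origin and (optionally) now.
function cleanupGcrImages() {
  # Cache images expire after 14 days (1209600 seconds).
  # Fall back to the current epoch time if `now` is not set.
  local expireTS=$(( ${now:-$(date +%s)} - 1209600 ))
  local expire
  expire=$( date -Iseconds -d "@${expireTS}" )
  echo "Cache expiration: $expire"
  if [[ ${TEST:-0} -eq 0 ]]; then
    # real run: echo each reference to stderr, then delete it
    listGcrImageTags "${gcr_root}/rest-server/dev" | filterBranchImages "${gcr_root}/rest-server/dev" "${remote_origin}" | tee >(cat >&2) | deleteTags
    listGcrImageTags "${gcr_root}/rest-server/dev/cache" "timestamp.datetime < $expire" | filterCacheImages "${gcr_root}/rest-server/dev/cache" "${remote_origin}" | tee >(cat >&2) | deleteTags
  else
    # test run: only print what would be deleted
    listGcrImageTags "${gcr_root}/rest-server/dev" | filterBranchImages "${gcr_root}/rest-server/dev" "${remote_origin}"
    listGcrImageTags "${gcr_root}/rest-server/dev/cache" "timestamp.datetime < $expire" | filterCacheImages "${gcr_root}/rest-server/dev/cache" "${remote_origin}"
  fi
}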
It's supposed to conditionally delete the tagged image based on the presence of the corresponding branch in Git, but we ran into an edge case where that is not true.
I think it would be great if the cache cleanup piece were part of Kaniko, but I understand if this is TOO repository specific.
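The excerpt calls `remoteBranchExists`, which is not included. A hypothetical stand-in is sketched below so the script can be exercised. The argument order (kind, name, remote) is inferred from the call sites, and the use of `git ls-remote` is an assumption rather than the original implementation.

```shell
#!/usr/bin/env bash
# remoteBranchExists <kind> <name> <remote>   (hypothetical stand-in)
# Succeeds if the named branch (or tag) exists on the given remote.
remoteBranchExists() {
  local kind=$1 name=$2 remote=$3 prefix
  case "$kind" in
    branch) prefix=refs/heads ;;
    tag)    prefix=refs/tags ;;
    *)      return 2 ;;  # unknown kind
  esac
  # --exit-code makes git return non-zero when no matching ref is found.
  git ls-remote --exit-code "$remote" "${prefix}/${name}" > /dev/null 2>&1
}
```

Each call contacts the remote, so on a repository with many images it may be worth listing the remote refs once and checking against that list instead.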