Similar to #549, I am trying to use the cache instruction to preserve the nix store between builds for Podman.
With the following configuration, I expect that once the cache has been created successfully by the first build, the next trigger should be able to fetch the cache and extract it correctly (see https://github.com/containers/podman/pull/7076):
# Build the static binary
static_build_task:
  depends_on:
    - "gating"
  gce_instance:
    image_name: "${FEDORA_CACHE_IMAGE_NAME}"
    cpu: 8
    memory: 12
    disk: 200
  init_script: |
    set -ex
    setenforce 0
    growpart /dev/sda 1 || true
    resize2fs /dev/sda1 || true
    yum -y install podman
  nix_cache:
    folder: '.cache'
    reupload_on_changes: true
    fingerprint_script: |
      echo "nix-v1-$(sha1sum nix/nixpkgs.json | head -c 40)"
    populate_script: |
      mkdir -p .cache
  build_script: |
    set -ex
    mkdir -p /nix
    if [[ ! -z $(ls -A .cache) ]]; then rsync -aq --delete .cache/ /nix; else podman run --rm --privileged -ti -v /:/mnt nixos/nix cp -rfT /nix /mnt/nix; fi
    podman run --rm --privileged -ti -v /nix:/nix -v ${PWD}:${PWD} -w ${PWD} nixos/nix nix --print-build-logs --option cores 8 --option max-jobs 8 build --file nix/
    mkdir -p .cache
    rsync -aq --delete /nix/ .cache
    chown -Rf $(whoami) .cache
  binaries_artifacts:
    path: "result/bin/podman"
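The one-line `if` in `build_script` decides between restoring `/nix` from the cache folder and seeding it from the `nixos/nix` image. As a minimal sketch of just that decision (the function names `cache_has_content` and `nix_store_action` are hypothetical and not part of the configuration above), it could be factored out like this:

```shell
# Hypothetical helper: succeeds only if the directory exists and holds at
# least one entry (dotfiles included), mirroring the `ls -A .cache` check.
cache_has_content() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical helper: prints "restore" when the cache folder is populated
# (the rsync branch above) and "seed" otherwise (the podman cp branch).
nix_store_action() {
  if cache_has_content "$1"; then
    echo restore
  else
    echo seed
  fi
}
```

Calling `nix_store_action .cache` at the top of the build script and logging the result would make it visible in the task output which branch a given build took.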
For the 2nd build (https://cirrus-ci.com/task/6256212465418240, triggered by git commit --amend -s && git push --force), the "Populate nix cache" phase ends up with "Cache miss" as below:
echo "nix-v1-$(sha1sum nix/nixpkgs.json | head -c 40)"
nix-v1-d353b055fbc31916f4ce6cc1b30632eb5041968d
Cache miss for nix-e97872d7e9934fd29c03ca790fdbc84d4552249a06951a4ba1fedebc57511f72! Populating...
mkdir -p .cache
BTW, the "Upload 'nix' cache" phase, after a successful build without previously cached data, ends up with "Some other task has already uploaded cache entry" as below:
SHA for /var/tmp/go/src/github.com/containers/libpod/.cache is '7583373c4d6fa8c8b0231cf53ada0c7d34b22677cfaf3749f3ef732eb56b7630'
nix cache size is 894Mb.
Some other task has already uploaded cache entry nix-e97872d7e9934fd29c03ca790fdbc84d4552249a06951a4ba1fedebc57511f72! Skipping upload...
@hswong3i it seems the agent was hitting the 5-minute limit for downloading the cache and unfortunately failed silently with the weird output you saw. I've created https://github.com/cirruslabs/cirrus-ci-agent/pull/24 to fix it.
Another question is why it took so long to download your cache of almost 900Mb. Is it reproducible on your end or was it a one-time thing? Maybe there was some network issue.
The 900 MB size is reproducible; the nix build always produces a store that large 😅
But downloading a 900 MB cache shouldn't take that long (i.e. more than 5 minutes): the build phase may also download around 1 GB from the network, and that usually takes only around 3~5 minutes on Cirrus CI 🤔
By reproducible I meant reproducibly slow downloads. Downloading a 900Mb archive shouldn't take that long within the sane datacenter. I wonder if you could try to re-run the task a few times to see the distribution of download times.
I have tried more than 10 times with 'git commit --amend -s && git push --force' and can confirm that the error always happens :-(
I wonder whether the cache download itself is fast enough and it is the extract step that times out, because the nix store contains a large number of small files 🤔
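One way to check the "many small files" hypothesis locally is to count the files in the store and time the extraction separately from the download. The following is only a sketch: the helper names are hypothetical, and it assumes the cache is handled as a gzip-compressed tarball, which the thread does not confirm.

```shell
# Hypothetical helper: number of regular files under a directory.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: extract a .tar.gz into a directory and print the
# elapsed wall-clock time in whole seconds.
time_extract() {
  archive="$1"
  dest="$2"
  mkdir -p "$dest"
  start=$(date +%s)
  tar -xzf "$archive" -C "$dest"
  end=$(date +%s)
  echo $((end - start))
}
```

For example, `count_files .cache` after a build, followed by `time_extract` on an archive of the same folder, would show whether unpacking alone comes anywhere near the 5-minute limit.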
@fkorotkov it is now rerunning with https://cirrus-ci.com/task/4585897469411328, let's have a look again ;-)
Are these builds relevant to this issue?
git commit --amend: https://cirrus-ci.com/task/6390248798158848
I see:
Cache miss for bundle-7333244c1c323a7abd0cbfcfcedff371ee7c211eb3a140ec208874a73d18308c! No script to populate with.
Then installation and then:
SHA for /usr/local/bundle is '1d3c5ff318008131770481c4e51034833d9e69340260b80d3f97831c98a516a1'
bundle cache size is 4Mb.
Some other task has already uploaded cache entry bundle-7333244c1c323a7abd0cbfcfcedff371ee7c211eb3a140ec208874a73d18308c! Skipping upload...
@AlexWayfer it doesn't look like the timeout issue, but it might be related to some other HTTP issue, which is covered by https://github.com/cirruslabs/cirrus-ci-agent/pull/24
@AlexWayfer in your case it seems this task uploaded the cache entry.
"Cache miss" is still happening after https://github.com/cirruslabs/cirrus-ci-agent/pull/24
For me the cache looks fixed: https://cirrus-ci.com/task/5153229782646784?command=bundle#L47
Thank you.
@hswong3i the agent just got deployed this morning around 11am EST which included the changes. Let's see if newer tasks will have the issue.
@fkorotkov at least this build still failed as before (maybe it ran before the agent changes were deployed):
Working on another git push --force, let's double check the fix ;-)
@fkorotkov well, both tests still failed with "cache miss"...:
Oh, you are using a GCE instance. Those are not yet migrated to the newly released agent. Right now we are in the process of migrating not only to the version of the agent published on GitHub but also to a new gRPC-backed server, so we are taking things slow. Community Linux containers were first today and everything went smoothly. I'm planning to make the full switch tomorrow morning EST. I'll ping you here once it is available. Sorry for the confusion!
@fkorotkov thank you very much for clarifying, let's double check again once things are ready ;-)
@hswong3i just switched everything remaining to the public version of the agent. The new backend also has some things that theoretically should increase download speed for large caches.
Retesting with:
It doesn't look too good: the download size is only around 80~90 MB (it should be around 1 GB), so the extract failed with EOF...
Crap, will investigate it. One last resort might also be to change zone to us-central1-a to be as close as possible to Cirrus.
There has been an ongoing network incident all day in us-central1, which seems to have been affecting some Cirrus CI tasks, including causing chopped caches.
Wow finally cache hit in one shot (https://cirrus-ci.com/task/6631072614055936):
echo "nix-v1-$(sha1sum nix/nixpkgs.json | head -c 40)"
nix-v1-d353b055fbc31916f4ce6cc1b30632eb5041968d
Downloaded 890Mb.
Cache hit for nix-e97872d7e9934fd29c03ca790fdbc84d4552249a06951a4ba1fedebc57511f72!
Another build failed to extract with EOF on the 1st try, but succeeded on the 2nd try (https://cirrus-ci.com/task/5522320095707136):
echo "nix-v1-$(sha1sum nix/nixpkgs.json | head -c 40)"
nix-v1-d353b055fbc31916f4ce6cc1b30632eb5041968d
Downloaded 176Mb.
Cache hit for nix-e97872d7e9934fd29c03ca790fdbc84d4552249a06951a4ba1fedebc57511f72!
Failed to unarchive nix cache because of /var/tmp/go/src/github.com/containers/podman/.cache/store/22413xlx43dm3vmzzd0l4zfcklgh1zxq-go-1.14.6/share/go/pkg/tool/linux_amd64/link: writing file after 5414309 bytes (expected 6715196): unexpected EOF! Retrying...
Downloaded 894Mb.
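The log above shows a truncated download (176Mb instead of 894Mb) only being detected partway through unarchiving. As an illustration of checking an archive before unpacking it, here is a minimal sketch; it is not the agent's actual implementation, the helper names are hypothetical, and it assumes a gzip-compressed tarball.

```shell
# Hypothetical helper: succeeds only if the gzip stream is intact and the
# tar listing can be read to the end.
archive_is_complete() {
  gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

# Hypothetical helper: extract only after the integrity check passes, so a
# chopped download never leaves a half-written destination folder.
safe_extract() {
  archive="$1"
  dest="$2"
  if archive_is_complete "$archive"; then
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
  else
    echo "archive $archive looks truncated, skipping extraction" >&2
    return 1
  fi
}
```

A caller could wrap `safe_extract` in a small retry loop that downloads again whenever it returns non-zero, which matches the "Retrying..." behaviour visible in the log.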
The build phase of both builds has now sped up from ~40 minutes to ~4 minutes (Wooo!!), since it only rebuilds the target source code and reuses the previously cached dependency results.
Thank you very very much ~~~
I think we can close the issue. @hswong3i if you see issues with your huge cache, please let me know.