Vector: Nightly builds for Vector Docker images on GitHub Actions

Created on 8 Feb 2020 · 9 Comments · Source: timberio/vector

I noticed that builds of our builder images are failing on Docker Hub: https://hub.docker.com/repository/docker/timberiodev/vector-builder-x86_64-unknown-linux-musl

I want to use pre-built images to build Vector for test harness invocations, but those images don't seem to be in working order.

What if we implement a GitHub Action to rebuild and push them every night?
It should work better than building them on Docker Hub - an action already builds the image on the fly when I package a .deb: https://github.com/timberio/vector-test-harness-github-actions-test-repo/runs/432519284?check_suite_focus=true.
GitHub Actions can also run on a schedule, so it's a self-contained solution.
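
As a rough illustration of what such a nightly job could run, here is a minimal sketch. It assumes the image name follows the timberiodev/vector-<name> pattern of the Docker Hub link above and that the build context lives under scripts/ci-docker-images/<name>. The `:latest` tag, the `rebuild_and_push` function and the `DOCKER` override variable are hypothetical, and the schedule itself would be declared in the workflow file, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: rebuild one builder image and push it to a registry.
# DOCKER can be overridden (e.g. DOCKER=echo) to dry-run without a Docker daemon.
DOCKER="${DOCKER:-docker}"

rebuild_and_push() {
  name="$1"                                   # e.g. builder-x86_64-unknown-linux-musl
  image="timberiodev/vector-${name}:latest"   # assumed tag scheme
  context="scripts/ci-docker-images/${name}"  # assumed build context directory

  # --pull refreshes the base image so the nightly build picks up upstream updates.
  "$DOCKER" build --pull -t "$image" "$context" &&
    "$DOCKER" push "$image"
}
```

A scheduled job would then call `rebuild_and_push builder-x86_64-unknown-linux-musl` once per image after logging in to the registry. Running with `DOCKER=echo` prints the commands instead of executing them.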


All 9 comments

> I noticed that builds of our builder images are failing on Docker Hub: https://hub.docker.com/repository/docker/timberiodev/vector-builder-x86_64-unknown-linux-musl

Those builds on Docker Hub are not used; the nightly builds are produced using CircleCI (see the release-docker job in .circleci/config).

> What if we implement a GitHub Action to rebuild and push them every night?

We should already be doing it with CircleCI. However, it seems the images have not been published from CircleCI for some time because of the failing verify-nixos check, which seems to be failing because of some changes in the upstream nixpkgs repository.

I'm not against moving it from CircleCI to GitHub Actions, but I think it should be done as part of migrating the entire release workflow rather than in isolation.

> I want to use pre-built images to build Vector for test harness invocations, but those images don't seem to be in working order.

If we are going to merge the test harness into the main Vector repository, I'd prefer to always build Vector from source, thus avoiding the need to have Docker Hub in the loop. For example, it would make it easy to run the test harness locally using a modified and not yet published Vector version to assess performance improvements. I think it can be done using tests/Makefile and docker-compose.yml.

Sorry, I think there's a misunderstanding. The title might've been confusing.

I'll try to restate the problem more clearly.

In the test harness we build Vector from source for whatever PR the test harness is invoked on. We don't just build Vector, but also package it as a .deb. To do so, we use the way recommended in the docs: PASS_FEATURES=default-musl ./scripts/docker-run.sh builder-x86_64-unknown-linux-musl make build-archive package-deb.

Now, the problem is that in the GitHub Actions environment the Docker cache can't be leveraged, so the Docker image is rebuilt every time. This takes painfully long - longer than compiling and building Vector afterward - and I wanted to cut that time by pulling the builder image from Docker Hub instead of building it.

It's worth saying that I don't really care where the image comes from - if you don't like using Docker Hub, I'm OK with using the GitHub Docker Registry. It'd just be good to have up-to-date builder images available, like timberiodev/vector-builder-x86_64-unknown-linux-musl. This will save a lot of time and computing power for our CI infrastructure.

Whether we build those images via CircleCI or GitHub Actions really doesn't matter. I just wanted to point out that GitHub Actions can do the job as they are (we don't really need custom runners for that), and they can schedule nightly executions automatically too.

Regarding merging the test harness into Vector, I think it should be kept in its own separate repo. What we'll be merging into the Vector repo is just the GitHub Action that does the magic of invoking the test harness on a PR comment - a WIP version can be found here: https://github.com/timberio/vector-test-harness-github-actions-test-repo.

Ah, sorry, I did misunderstand you.

So it is basically https://github.com/timberio/vector/issues/946. We wanted to do it when building the builder image involved compiling LLVM and Clang, which took a really long time (3-6 times longer than building Vector itself). Then https://github.com/timberio/vector/issues/1320 was done. At that point building the builder image on the fly became pretty fast (at least with a high-speed internet connection): it took around a minute in CircleCI, which was much faster than building Vector itself (around 20 minutes in CircleCI). So it turned out that there was no benefit in pushing the image instead of just building it on the fly.

However, it seems to me that one particular step, namely fetching the LLVM archive in
https://github.com/timberio/vector/blob/d128ab8eb163f1aca92db3093f86fd1a447ab64c/scripts/ci-docker-images/builder-x86_64-unknown-linux-musl/Dockerfile#L46-L48
recently became painfully slow, because it seems GitHub stopped caching the archive (or started to throttle it; the actual reason doesn't matter for our discussion).

So I agree that one approach to speed this up is your proposal (or #946), which should work fine for cases when the builder is used in CI (but not necessarily locally, see the remark below). Another, maybe quicker, solution could be to replace https://github.com/llvm/llvm-project/archive/llvmorg-9.0.0.tar.gz in the Dockerfile with another archive URL that can be downloaded faster (at the end of the day, pulling images from Docker Hub or another container registry is fast not because the layer archives it serves are small, but because it has high output bandwidth). But I'm fine with exposing the builder images too.

> It's worth saying that I don't really care where the image comes from - if you don't like using Docker Hub, I'm OK with using the GitHub Docker Registry. It'd just be good to have up-to-date builder images available, like timberiodev/vector-builder-x86_64-unknown-linux-musl. This will save a lot of time and computing power for our CI infrastructure.

I meant that I didn't want to make it a requirement to push the image to an external registry in order to use the test harness with a modified builder. For example, one could enable LTO support in the builder Docker images and try to measure the resulting performance improvements using the test harness. Ideally it should be possible to do so without having push access to any registry service.

> I meant that I didn't want to make it a requirement to push the image to an external registry in order to use the test harness with a modified builder. For example, one could enable LTO support in the builder Docker images and try to measure the resulting performance improvements using the test harness. Ideally it should be possible to do so without having push access to any registry service.

Oh yes, that's a good point, and I'm definitely interested in making the test harness invocation as flexible as possible. When invoking the test harness via the GitHub Action, it'd use pre-built builder images by default, with the option to build them from source if the user explicitly asks for that. It covers the benefits of both approaches - we get a quick turnaround by default, and we can use a custom builder image build if a nightly image is not sufficient.

The only change I need from the build system to support this is to make the build step here conditional:

https://github.com/timberio/vector/blob/d128ab8eb163f1aca92db3093f86fd1a447ab64c/scripts/docker-run.sh#L40-L42

Then it'll be possible to just pull the image prior to the execution. It may also be a good idea to make that script able to pull it. So, three options: build, pull, or do nothing (rely on whatever image is or isn't present in the local Docker image store already, at the user's discretion).
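
The three options could be sketched roughly as follows. This is not the actual content of scripts/docker-run.sh; the `prepare_image` function, the mode names `build`, `pull` and `none`, and the `DOCKER` override variable are all hypothetical, chosen only to illustrate the idea.

```shell
#!/bin/sh
# Hypothetical sketch of a three-way image preparation step.
# DOCKER can be overridden (e.g. DOCKER=echo) to dry-run without a Docker daemon.
DOCKER="${DOCKER:-docker}"

prepare_image() {
  mode="$1"      # build | pull | none
  image="$2"     # image tag to prepare
  context="$3"   # build context directory, only used by "build"

  case "$mode" in
    build) "$DOCKER" build -t "$image" "$context" ;;
    pull)  "$DOCKER" pull "$image" ;;
    none)  : ;;  # rely on whatever is (or is not) in the local image store
    *)     echo "unknown mode: $mode (expected build, pull or none)" >&2
           return 2 ;;
  esac
}
```

With something like this in place, CI could default to `pull` for a quick turnaround, while a local run with a modified builder would pass `build` and never need push access to a registry.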

> So it is basically #946. We wanted to do it when building the builder image involved compiling LLVM and Clang, which took a really long time (3-6 times longer than building Vector itself). Then #1320 was done. At that point building the builder image on the fly became pretty fast (at least with a high-speed internet connection): it took around a minute in CircleCI, which was much faster than building Vector itself (around 20 minutes in CircleCI). So it turned out that there was no benefit in pushing the image instead of just building it on the fly.
>
> However, it seems to me that one particular step, namely fetching the LLVM archive in
> https://github.com/timberio/vector/blob/d128ab8eb163f1aca92db3093f86fd1a447ab64c/scripts/ci-docker-images/builder-x86_64-unknown-linux-musl/Dockerfile#L46-L48
> recently became painfully slow, because it seems GitHub stopped caching the archive (or started to throttle it; the actual reason doesn't matter for our discussion).
>
> So I agree that one approach to speed this up is your proposal (or #946), which should work fine for cases when the builder is used in CI (but not necessarily locally, see the remark below). Another, maybe quicker, solution could be to replace https://github.com/llvm/llvm-project/archive/llvmorg-9.0.0.tar.gz in the Dockerfile with another archive URL that can be downloaded faster (at the end of the day, pulling images from Docker Hub or another container registry is fast not because the layer archives it serves are small, but because it has high output bandwidth). But I'm fine with exposing the builder images too.

From the execution log I linked above - https://github.com/timberio/vector-test-harness-github-actions-test-repo/runs/432519284?check_suite_focus=true - it seems like the image build takes a lot of time. It starts at Fri, 07 Feb 2020 19:08:16 GMT and ends at Fri, 07 Feb 2020 19:25:48 GMT, then the rest is done by Fri, 07 Feb 2020 19:46:08 GMT. The LLVM archive download that you are talking about only takes a minute - not that bad - so I can't confirm that it is the reason the build takes so long. I encourage you to take a look at the Make deb step log.
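
To put numbers on those timestamps, here is a small sketch that computes the two intervals. The helper names are hypothetical, and it assumes both timestamps fall on the same day, as they do in the log above.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp into seconds since midnight.
hms_to_seconds() {
  h="${1%%:*}"; rest="${1#*:}"; m="${rest%%:*}"; s="${rest#*:}"
  # Drop one leading zero so values like "08" are not parsed as octal.
  h="${h#0}"; m="${m#0}"; s="${s#0}"
  echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Print the seconds elapsed between two same-day timestamps.
elapsed_seconds() {
  echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}

elapsed_seconds 19:08:16 19:25:48   # image build: prints 1052 (17 min 32 s)
elapsed_seconds 19:25:48 19:46:08   # everything after: prints 1220 (20 min 20 s)
```

So in this particular run the image build accounts for roughly 17.5 of the 38 minutes between the first and last timestamps.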

For the reason above, I think exposing the pre-built images is the way to go. It'll easily shave off 20 minutes per execution, which is huge compared to the time it takes to actually run the tests in the test harness (even including the VM setup and such).

It looks like a good idea to maintain the toolchain (#946), but I'd be in favor of building Vector the same way we do for published builds. If that switches to using the toolchain - I'm all for it! How it's done shouldn't really be a concern for the test harness. Even currently it works as-is, but if we can also make it quick, I think we have to do it.

#2869 should do most of this.

It seems this is no longer an issue since we've switched to Nix - the environment image is built nightly, and we don't need builder images anymore, do we?

So we'll still need builders for the releases (basically all the builder-* images, and all the verifier-* images). The checkers etc. can probably be removed.

Just noting, we're getting rid of these images entirely.

Closing it as no longer relevant. We now have https://github.com/timberio/vector/blob/fe69741815f18ede2cb6c7d3c97949364d29aa37/.github/workflows/environment.yml#L1-L21 - which is effectively the equivalent of this issue for the new env image.
Great job @Hoverbear!
