As you may know, I have been building some bottles on my personal computer for a few months. This was still doable at the beginning, but I am hitting multiple problems:
circle.ci has served us well, but we are slowly reaching the limits of usability.
I want to collect some facts, benchmarks and ideas in this ticket, so that we can discuss the different options.
List of bottles that are built manually
llvm, llvm@5, rust, qt, pypy, pypy3
More may come, as we have not tried to bottle everything. We have a bunch of CIRequirements in the code that prevent some bottles from being built, some of them because of out-of-memory problems.
On mac (upstream)
Back then, Homebrew bought Mac minis. It probably made sense to own that kind of hardware at the time, as Mac hosting was almost non-existent (I may be wrong about why they bought the minis; this is just a hypothesis).
Currently the builds run on https://www.macstadium.com. Homebrew got free VM time from them, with 8 GB of RAM.
Just a side note: they seem to be able to build everything with that amount of RAM. I can't build llvm with 16 GB of memory and 2 GB of swap on my Ubuntu 18.04 machine, so this sounds odd, and we may need to take some measurements and investigate.
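One way to take such measurements is to run a single build under GNU time and read the peak resident set size from its report. The sketch below is only an illustration under assumptions: it assumes GNU time is installed at /usr/bin/time, and the helper names parse_max_rss_kb and kb_to_mib are made up for this example. The reported figure is the largest single process in the build, not the sum across parallel compiler processes, so it is a lower bound on what the machine needs.

```shell
#!/bin/sh
# Hypothetical helper: extract the peak RSS (in kB) from the report that
# `/usr/bin/time -v` writes to stderr. Reads that report on stdin.
parse_max_rss_kb() {
  sed -n 's/^[[:space:]]*Maximum resident set size (kbytes): \([0-9][0-9]*\)$/\1/p'
}

# Hypothetical helper: convert kB to whole MiB (rounded down).
kb_to_mib() {
  echo $(( $1 / 1024 ))
}
```

For example, `/usr/bin/time -v brew install --build-bottle llvm 2> time.log` followed by `kb_to_mib "$(parse_max_rss_kb < time.log)"` would print the peak in MiB for that build.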
circle.ci
We use the free circle.ci plan with 4 GB of RAM. We have a timeout of 5 hours (@sjackman asked them to increase it for us, and this has been really helpful).
It seems to be possible to get 8 GB or 16 GB systems:
https://circleci.com/docs/2.0/configuration-reference/#resource_class
This option can only be requested once you already pay for their first non-free plan ($150/month). We may ask them whether we could get it for free or at a discount, though. But before doing that we need to make sure that 8 or 16 GB is enough (or do we need more?).
Academic hosting
Ask some universities to host us.
Other Docker/cloud hosting
Price range is probably similar to what circle has to offer. But we may get a better CI for free somewhere else. We need to have a look at different services to see what is out there.
Dedicated servers
Our revenues
We get $7 per month through Patreon. This is used by @sjackman to pay for the brew.sh domain.
Todo
This is just a draft, which needs some improvements and input :)
I've thought about the possibility of setting up a Jenkins -> HPC CI system for brewsci. I believe NSF XSEDE might be a good target program to write a grant to request computational resources for CI infrastructure, but I haven't had a ton of time to really dig into it.
I am not in academia anymore, so I probably can't submit a request for that kind of grant.
Also, what are our chances of success, and what would be the timeframe? I am worried that a grant may only be a short-term solution (though I do not expect other solutions to be better).
Also, a few months ago I didn't want to give any money to our Patreon, because I felt I was already spending my free time on the project. But I may give some money to speed things up. I think it is quite realistic to reach $50 to $70 per month if we find some more patrons, which would be enough for some CI.
Can we re-purpose our "old" computers?
I might be able to help out with this. I have root access to an older (vintage 2012) 32-core AMD Opteron 6274 machine. It's not great computationally, but it has 128GiB of memory and plenty of disk space. Also, it is hosted in a pretty robust data center (at the university where I work). It is externally reachable on the internet (napoleon.ekstranet.diku.dk) and currently is only very lightly loaded.
Downsides: it runs openSUSE, and realistically will not be maintained particularly actively (I'm the administrator, and I'm not very good at it). As long as the CI system is relatively isolated and does not depend on particularly new versions of system packages, this should not be a problem. I can guarantee that the machine will run for as long as I'm around (my contract is for at least another two and a half years), but beyond that things are hazy.
I think it is definitely an improvement over @iMichka running everything on his home box.
Determine how much RAM we need
Just a thought: would it help to disable memory overcommit (overcommit_memory)? That is, refuse memory allocations when there is not enough memory, instead of having the process killed by the OOM killer?
More info can be found at https://unix.stackexchange.com/a/136294
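For reference, the Linux setting in question is vm.overcommit_memory, which takes the values 0, 1 or 2. The sketch below only reads and explains the current value; the function name describe_overcommit is made up for this example, and the block does nothing on systems without /proc/sys/vm/overcommit_memory.

```shell
#!/bin/sh
# Hypothetical helper: map a vm.overcommit_memory value to a description.
describe_overcommit() {
  case "$1" in
    0) echo "heuristic overcommit (kernel default)" ;;
    1) echo "always overcommit" ;;
    2) echo "strict accounting: allocations beyond the commit limit fail" ;;
    *) echo "unknown mode: $1" ;;
  esac
}

# Print the current setting when running on Linux; skipped elsewhere.
if [ -r /proc/sys/vm/overcommit_memory ]; then
  mode=$(cat /proc/sys/vm/overcommit_memory)
  echo "vm.overcommit_memory=$mode: $(describe_overcommit "$mode")"
fi
```

Switching to strict accounting requires root, for example `sudo sysctl vm.overcommit_memory=2`. It changes how a build fails when memory runs out, not how much memory the build needs.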
> As long as the CI system is relatively isolated and does not depend on particularly new versions of system packages, this should not be a problem.
@athas I'd suggest using something like LXC to run the CI system in a self-contained container, rather than installing everything directly on the server itself :) That's also useful because you could grant other people SSH access to just the container without granting them access to the entire machine.
> Just a thought: would it help to disable memory overcommit (overcommit_memory)? That is, refuse memory allocations when there is not enough memory, instead of having the process killed by the OOM killer?
It does not solve the problem that the bottle won't get built.
> I might be able to help out with this. I have root access to an older (vintage 2012) 32-core AMD Opteron 6274 machine.
> (my contract is for at least another two and a half years), but beyond that things are hazy.
Thanks @athas for the proposal. We may consider this but I would like to see if we can find a hosted solution somewhere, which would bring more stability in the long term.
> Can we re-purpose our "old" computers?
Bandwidth and power are still an issue (depending on where these computers are sitting). Also, my experience with old computers is that they are often slow, and reach deprecation age even faster.
I am open to any suggestions. I think we should wait a little before making a decision. We are in the middle of the summer holidays, and I'll be gone for 2 weeks soon. I think we should wait until September, and maybe try to gather a little more data (I am still wondering how the Mac people only need 8 GB of RAM to build everything). The goal would be to have a better CI for 2019 :)
First off, I just wanna say thanks for Linuxbrew. It made the transition from macOS to Linux quite a breeze, and I'm really happy with my new setup. :)
I stumbled upon this issue and thought that perhaps we (@UpCloudLtd) could help you guys out in some way, perhaps with some sponsored hosting? And I'm certain that @iler would be up for helping out setting up the CI together with you guys. Feel free to ping either of us if it sounds interesting to you!
What about using the recently-announced Azure Pipelines?
The upstream Homebrew/core (for macOS) is also discussing updating their CI, and have suggested Azure Pipelines. See https://github.com/Homebrew/homebrew-core/issues/31511
@jgabor Thanks for your offer to sponsor Linuxbrew. We're currently hosting our CI on CircleCI. I can see two places where cloud servers could help us, without entirely replacing our current CI arrangement.
We have a small web service that runs after a successful CI run. It downloads the bottles from CircleCI (stored on AWS S3) and uploads those bottles to Bintray, and it pushes a commit to GitHub. That service currently runs on AWS Lambda, whose resource limits are 500 MB of disk space (yes MB) and 5 minutes of run time. The 500 MB of disk space is occasionally problematic. 5 minutes is usually enough, though sometimes becomes a problem when brew update takes longer than usual to run.
CircleCI has a 4 GB memory limit and 5 hour run time limit. Some packages take more memory or more time. When that happens, I'd like to be able to trigger a run of brew test-bot that runs on other infrastructure that has more memory or time available.
I'm a product manager on Azure Pipelines. We have unlimited CI/CD minutes for a single pipeline. Each job can run up to 6 hours. You can use multiple jobs in a single pipeline and have 10 running at the same time.
I just found the doc that said 30 minutes. It's wrong. I'm fixing it now. It's 360 minutes for each job, or 6 hours.
@jeremyepling I tried running a Bash script on an Ubuntu 16.04 container that counted numbers upwards every second and it was stopped after 1 hour, so I couldn't replicate that. (However, this means it's actually not limited to 30 minutes at least, which was my initial assumption based on the documentation.)
Edit: It's possible to increase the time limit, see the comment below.
@Calinou as part of your job you can specify the timeoutInMinutes and set the value to 360.
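jobs:
- job: string
  timeoutInMinutes: number
  cancelTimeoutInMinutes: number
  strategy:
    parallel: number
    maxParallel: number
    matrix: { string: { string: string } }
  pool:
    name: string
    demands: string | [ string ]
  container: string
  steps:
  - script: echo Hello world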
@jeremyepling Amazing, Jeremy! Thanks for joining the conversation!
Our CI workflow builds artifacts (precompiled binary packages known as bottles) from fork pull requests, which are uploaded to Bintray, and an automatically generated commit is pushed to GitHub. After running the public job that builds the artifacts, is it possible to run a private job to deploy the bottles to Bintray and GitHub that does not expose the credentials of Bintray and GitHub to the author of the PR (the owner of the fork)? This task has proven to be difficult with TravisCI and CircleCI.
My current workflow uses a webhook from a successful CircleCI build job to trigger a web service running on AWS Lambda that downloads the bottles from CircleCI (stored on S3) and uploads them to Bintray.
@jeremyepling How much RAM and disk space, and how many CPU cores are available to each build?
Hi @sjackman I am on the product team as well and I can take your deeper questions.
You get 2 CPU cores and 7.5 GB of ram for each job to run in. However, we can discuss larger machines. We are currently working with Homebrew on their requirements and would be happy to do the same with you.
On the workflow question, do you ever merge the PR into the main repo? In our system you could have a CI trigger on that and it will get the BinTray secret where the PR build would not.
@chrisrpatterson Hi, Chris! Linuxbrew has a shared code base with Homebrew, so we have very similar requirements to Homebrew. 7.5 GB of RAM should be sufficient for most jobs. CircleCI has 4 GB of RAM, and that's sufficient for >99% of jobs, but the 1% are problematic. I'm glad to hear that it's possible to discuss larger machines.
Execution time is an issue as well. Most jobs finish quickly (99% well under 6 hours), but there are occasional jobs that can take much longer.
> On the workflow question, do you ever merge the PR into the main repo? In our system you could have a CI trigger on that and it will get the BinTray secret where the PR build would not.
Yes, we do merge the PR into the main repo. However, it's during the CI run of the fork PR that the artifacts are built. Are these artifacts stored in such a way that they can be accessed by a CI job that runs when the PR is pushed to master? CircleCI, for example, stores these artifacts on AWS S3 (owned by Circle's AWS account, not our AWS account). They are publicly accessible, and that's fine.
OK, I understand now, and I don't have an easy solution for that at the moment. The CI job that runs will not have an automatic reference to the artifacts stored by the PR job run, which would have made it easy, but I expect something could be scripted.
Could you point me at an example of a bottle PR? Maybe I could think through some options.
Here's a PR: https://github.com/Linuxbrew/homebrew-core/pull/9395
(note the change to the code in this case is a dummy to trigger a CI build)
The CI run and the artifacts built by it are here:
https://circleci.com/gh/Linuxbrew/homebrew-core/20289#artifacts/containers/0
These artifacts are stored by CircleCI on AWS S3.
The successful CI run of the fork PR triggers a web service that downloads these artifacts from S3 and uploads them to Bintray here:
https://bintray.com/linuxbrew/bottles/cmark-gfm#files
Here's an example of a workflow in CircleCI composed of three steps: build-linux, build-macos, and collect-bottles: https://circleci.com/workflow-run/c0af0bba-517d-46d7-a514-4db4e2e32d4d
I'd like that final step collect-bottles to run with additional credentials to deploy the artifacts, and in a way that does not expose those credentials to the submitter of the PR.
Here's the PR: https://github.com/brewsci/homebrew-bio/pull/427
You can't define it in the YAML file but you can setup a build completion trigger https://docs.microsoft.com/en-us/azure/devops/pipelines/build/triggers?view=vsts&tabs=designer#build-completion-triggers. The second pipeline would be independent and would only trigger on successful completion of the other pipeline. In that pipeline, you could run your script that downloads the artifacts from the build that just completed and upload them where you want.
Sounds like that would likely work for us! Are the artifacts of the first pipeline stored indefinitely? What is the limit on their size?
By default all builds are deleted after 30 days but you can mark builds to be retained much longer if you need to. There is no hard limit on the size of an individual artifact.
30 days works for our purposes. Large build artifacts are < 1 GB. Most are a few megs.
@sjackman Sounds like something we could help out with, but it sounds like you might have found a solution already! Drop me a message at [email protected] if you're still interested. :)
Looks like Azure Pipelines will likely serve our purpose. Thanks for the offer all the same, Jonathan!
@chrisrpatterson Rather than a vmImage: ubuntu-16.04, can I specify that I'd like to run the build in a particular Docker image?
CircleCI has this feature, and it's very useful. For example
https://github.com/Linuxbrew/brew/blob/7a85c5a212e5ac243a762020a38c907b21d2f55e/.circleci/config.yml#L4-L5
I think you're looking for container jobs.
Thanks, Jeremy!
git fetch --tags --prune --progress --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/pull/42/merge:refs/remotes/pull/42/merge
The Get sources step checks out the pull/42/merge commit. I would rather check out the pull/42/head branch. Is it possible to configure that? It's easy enough to do that myself in the YAML file.
What does Azure Pipelines do when there's a merge conflict, and pull/42/merge doesn't exist?
@sjackman we will not trigger until there is a merge commit. Why would you prefer to checkout head instead of merge?
> we will not trigger until there is a merge commit.
So if there's a merge conflict then there'd be no CI run? That's not a problem, just curious.
> Why would you prefer to checkout head instead of merge?
Our automated CI script (named brew test-bot) inspects what changed in this PR, and tests the components that were changed. It doesn't expect to see a merge commit in the PR. It could be fixed.
The workaround is simple though, so not a pressing concern. See https://github.com/Linuxbrew/homebrew-extra/pull/44/files
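The workaround itself lives in the linked pull request and is not reproduced here. A minimal sketch of one way to get the same effect, assuming the clone's origin remote points at GitHub (which publishes each pull request's tip as refs/pull/N/head), is to fetch that ref explicitly and check it out. The helper name pr_head_refspec is hypothetical, and 42 is simply the PR number from the log above.

```shell
#!/bin/sh
# Hypothetical helper: build the refspec that fetches the head commit
# (not the merge commit) of pull request number $1.
pr_head_refspec() {
  echo "+refs/pull/$1/head:refs/remotes/pull/$1/head"
}

# Usage inside a clone whose `origin` is the GitHub repository:
#   git fetch origin "$(pr_head_refspec 42)"
#   git checkout --detach refs/remotes/pull/42/head
```

This leaves the working tree at the contributor's own commit, which is what a script that inspects the PR's changes without expecting a merge commit would see.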
Could you please point me to an example azure-pipelines.yml of how to publish artifacts (arbitrary files) to some publicly accessible location on Azure?
On CircleCI for example it's…
- store_artifacts:
    path: /tmp/bottles
    destination: bottles
Here is an example for uploading an artifact in one job and downloading in another https://docs.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=vsts&tabs=yaml#artifact-download
- task: PublishBuildArtifacts@1
  inputs:
    pathtoPublish: "/tmp/bottles"
    artifactName: bottles
2018-09-19T20:21:37.0000804Z Uploading 3 files
2018-09-19T20:21:42.0060555Z Total file: 3 ---- Processed file: 0 (0%)
2018-09-19T20:21:53.5278590Z Fail to upload '/tmp/bottles/brew-test-bot.xml' due to 'TF400813: The user '…' is not authorized to access this resource.'.
2018-09-19T20:21:53.5343426Z Microsoft.VisualStudio.Services.Common.VssUnauthorizedException: TF400813: The user '…' is not authorized to access this resource.
@chrisrpatterson @jeremyepling I'm seeing this authorization error from a fork pull request. Any tips?
I need these files only temporarily (say, a week or so).
Yes, publishing artifacts from fork builds is currently blocked, as we did not have a way to give a sufficiently scoped token to the agent. This is something we are working on fixing, but it is going to be a few weeks. The CI will be able to publish and you can conditionalize the publish. Here is an example https://github.com/onovotny/SignService/blob/master/azure-pipelines.server.yml#L47
Okay. Thanks for the information. Our workflow requires publishing artifacts from fork pull requests. Please do let me know when the feature is ready for testing by users.
CircleCI stores the artifacts in an AWS S3 account that is owned by CircleCI, so no configuration of credentials is required at all.
See https://circleci.com/docs/2.0/artifacts/
Could that be a possible path forward for Azure Pipelines?
Just a little info: (I did not read every comment here just scanned for circleci)
Circleci has a special "offer" for open-source projects as this one here is. I found it here -> https://circleci.com/pricing/
> What if I am building open-source?
> We offer a total of four free linux containers ($2400 annual value) for open-source projects. Simply keeping your project public will enable this for you!
> We also offer the Seed plan free (at 1x concurrency) for OS X open source projects. Contact us at [email protected] for access. If you are building a bigger open-source project and need more resources, let us know how we can help you!
As I did not see this info here I wanted to share it with you. Hope it helps and is not duplicated here.
Since this discussion started I believe another strong contender is Azure Pipelines. Homebrew/brew has already moved our Linux CI to using that.
Thanks for the info, @MonkeyTonk. In fact, CircleCI is the CI platform that Linuxbrew is currently using. There's a few limitations with CircleCI, but a significant one is that the free containers have 4 GB of RAM. Compiling some C++ programs exceeds that memory requirement.
We hope to migrate to Azure Pipelines once it supports uploading artifacts from fork pull requests.
Hi @sjackman. Thanks for considering using Azure Pipelines. It supports uploading artifacts from fork builds now. Please let me know if you have any questions or if we can help get things going. I'm a program manager on the team and my email is dastahel at microsoft dot com.
Excellent! Thanks for the good news, David. New year's 🎁 for me. =)
cc @MikeMcQuaid
@DavidStaheli Great news for Homebrew too! Is there an RSS feed/mailing list/blog where we can keep up to date with these sorts of changes?
Hi Mike! Every 3 weeks (our sprint length), release notes are posted here. You can click Subscribe to updates for the RSS feed. Deployments were delayed around the holidays, so the latest release notes won't be out for 1-2 weeks. Here's the draft for this feature:
Until now, pull request validation builds for repository forks did not have permission to upload and download build artifacts nor change the build number. It would have been insecure to make the agent’s broader-scoped permissions available during a fork build triggered by an unknown user, so permissions were constrained. Now, agent permissions are finer-scoped so that your pipeline can perform these operations if you would like. Below is an example of YAML that archives build output into a tar.gz file in the artifact staging directory and publishes them to Azure Pipelines to be associated with the build. See also the Archive Files and Publish Build Artifacts task documentation.
- task: ArchiveFiles@2
  inputs:
    archiveType: 'tar'
    tarCompression: 'gz'
    includeRootFolder: false
    rootFolderOrFile: '$(build.sourcesDirectory)/target'
    archiveFile: '$(build.artifactStagingDirectory)/$(build.buildId).tar.gz'
- task: PublishBuildArtifacts@1
  inputs:
    pathtoPublish: '$(build.artifactStagingDirectory)'
@DavidStaheli Great, thanks!
I'm trying to set this up for linuxbrew-core. I am currently stuck with the setup of the docker container.
Nothing special here, just using the proposed syntax:
container:
  image: linuxbrew/brew:latest
Azure Pipelines tries to run a step that I have no control over (Initialize container).
Here is the error I am getting:
2019-01-09T21:19:59.4932907Z Status: Downloaded newer image for linuxbrew/brew:latest
2019-01-09T21:19:59.5020775Z ##[command]/usr/bin/docker network create --label 35f4b8 vsts_network_e06b31145d5242deb513f7a6077218bb
2019-01-09T21:19:59.6459802Z 24f99a683c4b952ece705e2d33d2a2fcbee564a9d90f08acbd5d20cd7a52c18c
2019-01-09T21:19:59.6529463Z ##[command]/usr/bin/docker inspect --format="{{index .Config.Labels \"com.azure.dev.pipelines.agent.handler.node.path\"}}" linuxbrew/brew:latest
2019-01-09T21:19:59.7052663Z ##[command]/usr/bin/docker create --name 87dd50c3a60b4535902d9776343dee79_linuxbrewbrewlatest_fc7bd3 --network=vsts_network_e06b31145d5242deb513f7a6077218bb --label 35f4b8 -v /var/run/docker.sock:/var/run/docker.sock -v "/home/vsts/work/1":"/__w/1" -v "/home/vsts/work/_temp":"/__w/_temp" -v "/opt/hostedtoolcache":"/__t" -v "/home/vsts/work/_tasks":"/__w/_tasks" -v "/home/vsts/agents/2.144.0/externals":"/__a/externals":ro -v "/home/vsts/work/.taskkey":"/__w/.taskkey" linuxbrew/brew:latest "/__a/externals/node/bin/node" -e "setInterval(function(){}, 24 * 60 * 60 * 1000);"
2019-01-09T21:20:16.4389391Z 61b1583eb3e758d179333a2ee20ce75ee899dd7c7c964a190ab8370a89acb642
2019-01-09T21:20:16.4459602Z ##[command]/usr/bin/docker start 61b1583eb3e758d179333a2ee20ce75ee899dd7c7c964a190ab8370a89acb642
2019-01-09T21:20:19.0614538Z 61b1583eb3e758d179333a2ee20ce75ee899dd7c7c964a190ab8370a89acb642
2019-01-09T21:20:19.0677518Z ##[command]/usr/bin/docker exec 61b1583eb3e758d179333a2ee20ce75ee899dd7c7c964a190ab8370a89acb642 sh -c "command -v bash"
2019-01-09T21:20:19.1881267Z /bin/bash
2019-01-09T21:20:19.2375857Z ##[command]whoami
2019-01-09T21:20:19.2812714Z vsts
2019-01-09T21:20:19.2816258Z ##[command]id -u vsts
2019-01-09T21:20:19.2867471Z 1001
2019-01-09T21:20:19.2870099Z Try create an user with UID '1001' inside the container.
2019-01-09T21:20:19.2900740Z ##[command]/usr/bin/docker exec 61b1583eb3e758d179333a2ee20ce75ee899dd7c7c964a190ab8370a89acb642 bash -c "grep 1001 /etc/passwd | cut -f1 -d:"
2019-01-09T21:20:19.4581391Z ##[command]/usr/bin/docker exec 61b1583eb3e758d179333a2ee20ce75ee899dd7c7c964a190ab8370a89acb642 useradd -m -u 1001 vsts_azpcontainer
2019-01-09T21:20:19.6054609Z useradd: Permission denied.
2019-01-09T21:20:19.6055730Z useradd: cannot lock /etc/passwd; try again later.
2019-01-09T21:20:19.6775738Z ##[error]Docker exec fail with exit code 1
I then tried to work around this problem by using:
container:
  image: linuxbrew/brew:latest
  options: -u 0
This logs in as root, and the "Initialize container" step passes. I was then hoping to add a sudo su linuxbrew step inside the first bash: block to switch back to the original user. The job then gets stuck at that point and never returns (it runs forever until I cancel it).
I can simulate these steps locally with success:
docker pull linuxbrew/brew:latest
docker run -it -u 0 --name=brew linuxbrew/brew
sudo su linuxbrew
Actually it would be good if azure pipelines would run sudo useradd -m -u 1001 vsts_azpcontainer instead of useradd -m -u 1001 vsts_azpcontainer, because that does not fail when logged in with our linuxbrew user. Not sure how I can influence these commands though (or disable them).
Hi @iMichka. Thanks for the great detail. I'll find out what I can and get back to you ASAP.
Most Docker containers have a default user of root. Our Linuxbrew/brew container is a bit unusual in that regard, that the default user is linuxbrew UID=1000. We do this because Homebrew refuses to run as root. We could modify the image to make the default user root and add a HOMEBREW_… environment variable to allow running as root (@MikeMcQuaid your opinion on this fix?).
@DavidStaheli It makes sense to me for Azure Pipelines to run sudo useradd -m -u 1001 vsts_azpcontainer if the current user is non-root and /usr/bin/sudo exists, and otherwise run useradd -m -u 1001 vsts_azpcontainer. (edited for multiple typos)
@iMichka sudo su linuxbrew would start a new interactive shell that waits for input from a nonexistent terminal. You can instead run…
sudo -u linuxbrew brew test-bot
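The "use sudo only when needed" suggestion for the agent's user-creation step can be sketched as a small shell check. This is an illustration of the proposed logic only, not how the Azure Pipelines agent is implemented; the function name useradd_command and its two parameters are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: decide how to invoke useradd inside the container.
# $1 = numeric UID of the current user
# $2 = path where sudo is expected (e.g. /usr/bin/sudo)
useradd_command() {
  uid=$1
  sudo_path=$2
  if [ "$uid" -ne 0 ] && [ -x "$sudo_path" ]; then
    # Non-root user and sudo is available: escalate.
    echo "sudo useradd"
  else
    # Already root, or no sudo in the image: call useradd directly.
    echo "useradd"
  fi
}

# Show the command that would be chosen for the current user.
echo "$(useradd_command "$(id -u)" /usr/bin/sudo) -m -u 1001 vsts_azpcontainer"
```

With this check, images whose default user is root and images that ship without sudo keep the current behaviour, while an image with a non-root default user and sudo installed gets the sudo variant.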
I found one workaround:
pool:
  vmImage: 'Ubuntu-16.04'
steps:
- script: printenv
- bash: |
    set -ex
    sudo docker run linuxbrew/brew:latest /bin/bash -c "set -ex; ......................;"
But this defeats the purpose of the container: step, and it would force me to put more than 20 commands into that one-line bash call. The idea comes from https://github.com/MicrosoftDocs/vsts-docs/issues/2939.
> It makes sense to me for Azure Pipelines to run sudo useradd -m -u 1001 vsts_azpcontainer if the current user is non-root and /usr/bin/sudo exists, and otherwise run useradd -m -u 1001 vsts_azpcontainer.
I have seen some bug reports here and there where people complain about sudo being called by azure-pipelines whereas some containers do not have sudo installed. I think that was the case in previous versions of azure-pipelines, so they probably removed these calls at one point. But I agree that the solution proposed by @sjackman would work for our use case.
@iMichka, Shaun suggested to use sudo only if it is installed and the current user is not root.
Shall we re-open this issue or create a new one?
> We do this because Homebrew refuses to run as root. We could modify the image to make the default user root and add a HOMEBREW_… environment variable to allow running as root (@MikeMcQuaid your opinion on this fix?).
I'd rather we didn't do this just for supporting a given CI provider. If we do end up going down this line of thought it definitely shouldn't be an environment variable that enables this behaviour but instead making it Linux-only, checking for the existence of a file that would require root and SIP disabled to create on macOS and ideally a check to make sure it's running inside a Docker container specifically.
I've started a discussion for my team over in https://github.com/Microsoft/azure-pipelines-agent/issues/2043. I'm not sure which way we'll go here -- the whole su/sudo/"what user am I?" between host and container is fraught with peril.
Drone (@droneio) has a free "Drone Cloud" service running at Packet (@packethost), my place of work. It might be a good place to build.
Thanks, Edward. Right now we're focusing our attention on Azure Pipelines. I appreciate the info all the same.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.