Conda-forge.github.io: Builds that fail on CIs

Created on 19 Oct 2019 · 21 Comments · Source: conda-forge/conda-forge.github.io

Certain expensive packages seem to fail even on Azure machines (6-hour builds). This top post summarizes the status as of its last update:

Possible reasons

  • Out of memory
  • Not enough build time

Builds that fail

All 21 comments

Not too sure what the solution is. I have a large machine that I could donate time to if somebody has instructions on how I could integrate it with CIs.

Picked this up in another thread, was asked to move the posts here by @ocefpaf.

Full comment history from #924 (until now):

@h-vetinari (OP): There are several packages in conda-forge that exceed the CI time limits (e.g. 6h for Azure Pipelines), and some of them are of central importance to the ecosystem (gcc, qt, etc.).

These timeouts make building the packages extremely cumbersome, but this is unfortunately common enough that a dedicated proposal is being worked on to at least have a process for these cases.

I wanted to open this issue to initiate a discussion with the azure people (where most of the conda-forge-CI is run anyway), to see if there's a way to increase these limits for selected packages, e.g. to 24h? It's not like repeatedly running failing 6h-jobs reduces the total load compared to much fewer successful runs that might take 7-10h or so. Not sure if this has been discussed elsewhere or privately already, but I couldn't find anything, and so having a public discussion can't hurt IMO.

I remember seeing @vtbassmatt helping a lot of (python) packages migrate their CI to azure pipelines, so maybe you can help there, at least by involving/CCing the right people on the azure side? Even if the affected packages would have to go through an individual approval process, removing this timeout on some key feedstocks would be a big win.

@vtbassmatt: 👋 Thanks for pinging me. Let me talk to some folks on my side. I think the answer is going to come in the form of a new feature we're launching soon - but I want to double check if I'm forgetting something that's already available.

Not sure if this helps but: the 6 hour limitation is per job, not per pipeline. Can you divide the work into multiple jobs that each stay under the limit? I know this comes with some extra complexity, like needing to upload/download artifacts in several places, so it's not a trivial fix.
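As a rough illustration of the "multiple jobs under the limit" idea, the sketch below greedily packs ordered build phases into consecutive jobs that each stay under the 6-hour per-job cap. The phase names and durations are hypothetical, and it assumes the build can actually be cut at phase boundaries (with artifacts handed from one job to the next), which may not hold for every feedstock.

```python
from typing import List, Tuple

JOB_LIMIT_HOURS = 6.0  # per-job cap on hosted agents, as stated in the thread


def split_into_jobs(
    phases: List[Tuple[str, float]], limit: float = JOB_LIMIT_HOURS
) -> List[List[str]]:
    """Greedily pack ordered build phases into jobs that stay strictly under `limit` hours."""
    jobs: List[List[str]] = []
    current: List[str] = []
    used = 0.0
    for name, hours in phases:
        # A single phase that reaches the cap cannot be rescued by splitting at this level.
        if hours >= limit:
            raise ValueError(f"phase {name!r} needs {hours}h and does not fit in one job")
        # Start a new job when adding this phase would reach or exceed the cap.
        if current and used + hours >= limit:
            jobs.append(current)
            current, used = [], 0.0
        current.append(name)
        used += hours
    if current:
        jobs.append(current)
    return jobs


# Hypothetical phases and estimated durations (hours), for illustration only.
phases = [("configure", 0.5), ("compile_core", 4.0), ("compile_extras", 3.0), ("test", 1.5)]
plan = split_into_jobs(phases)
print(plan)
```

Every boundary between two jobs in the resulting plan is a point where build outputs would have to be published by one job and downloaded by the next, which is the extra complexity mentioned above.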

@h-vetinari: Xref that I overlooked: #902

@ocefpaf:

Xref that I overlooked: #902

Do you mind copying your comments above to #902 and closing this one just for "housekeeping's" sake?

Hey all. Right now, your best bets for the long-running packages are:

  1. switch to a self-hosted agent (potentially running on a cloud VM, but "self-hosted" from Azure Pipelines's perspective)
  2. split up the long-running work into multiple jobs (downside: you have to deal with publishing/consuming artifacts between jobs)

Stay tuned, as we have a new feature coming soon which will help address this. It's called "elastic private agents" (working title) and gives you the benefits of private agents - you choose the hardware, you get unlimited runtime - with the convenience of hosted.

We don't currently have a knob for making a single customer's jobs able to run longer than 6 hours on hosted.

@vtbassmatt
Thanks for the info! It seems that even with the upcoming feature, there will have to be self-hosted agents, correct?

Do you think it would be feasible for azure to sponsor a small handful of dedicated (possibly only burstable) VMs to conda-forge for this purpose? I cannot speak for conda-forge/core, but - fundamentally - it seems to be a question similar to increasing the number of hosted agents (conda-forge already has many more agents than other open source projects on azure, and I imagine this underwent a similar approval process).

If there were (at the very least) one sponsored agent each for Linux/macOS/Windows, those could have a separate queue that a conda-forge feedstock could opt into (pending some work on the conda-forge side to keep rerendering feedstocks consistently, but that would very likely not be too much work) - which would obviously be restricted to those that absolutely need it.

I will look into this. We have a well-defined path for adding additional hosted capacity for an open source project. There's no such path for adding a self-hosted agent. Which is not to say it's not possible, just that there's no button I can press to make it happen immediately :grin:

@vtbassmatt
Thanks a lot for digging into this. Would be awesome even if the lead time is a bit longer! :)

@vtbassmatt
Just pinging gently if there's been any news.

BTW: This would make an awesome Christmas present for the conda-forge / python communities. ;-)

I've not been able to find anyone who can make this happen 😞 Haven't given up but I don't think there'll be any Christmas miracles right now.

No worries. Thanks for trying!

Hey @vtbassmatt, happy belated new decade! :)
Just gently pinging for news on this.

Hey, thanks for the ping! We're soon going to start private preview of elastic self-hosted pools. Basically, we'll do the elastic management side for you; you run it in your Azure subscription so you can have whatever beefy machine you want.

Tagging @vijayma to get you in the preview program when it starts up in a few weeks.

That sounds great. So essentially these will run the same build host configurations as stock ones, just with different hosting? Or do we have to build the machine images ourselves?

You'll have to provide the machine image, we're unfortunately not able to distribute ours for legal/compliance reasons.

@vtbassmatt: You'll have to provide the machine image, we're unfortunately not able to distribute ours for legal/compliance reasons.

That's a pity, as the whole point of this exercise would be to have the same infrastructure with which packages are built (to avoid subtle incompatibilities, e.g. through different compiler versions). I'm guessing you wouldn't be able to provide a recipe for the machine images (like a dockerfile) either?

That leads me back to my question further up: could we re-explore the possibility of selectively bumping that 6h cap with a certain approval process, e.g. per repository (= feedstock in conda-speak)? The idea is that very few packages (e.g. compilers, Qt, some GPU packages, ...) would need this.

Plus, if I may add, conda-forge is solving some of the thorniest (cross-platform!) packaging problems in the whole python ecosystem (and beyond), so it's not just MyLittleLibrary that's asking. ;-)

Lots of our limits are easily mutable. Unfortunately the 6 hour limit is pretty deep and would require a lot of work to change.

Fortunately the image generation scripts are available! https://github.com/actions/virtual-environments

Thanks for all your help with this!

@vtbassmatt
Where do we stand on this? Is the feature still in preview? Do you know if we could package the necessary images as one of the "anvils" of conda-forge, so it doesn't have to be rebuilt every time for CI?

@mariusvniekerk
Do you think this would be feasible to set up (i.e. is there an azure-subscription of conda-forge, or is it "just" on a beefed-up community programme)?

The feature is now in public preview as scale set agents.

Thanks @vtbassmatt! Any comment about us keeping rebuilds of the images somewhere?

Sorry, missed that! Scale set agents aren't Docker images so I don't think you can use anvils straight up. But if you set your scale set agents to be reused, then most of the time, restoring the Docker image would be a no-op or trivial.

(Edited for clarity)
