Conda-forge.github.io: Builds that fail on CIs

Created on 19 Oct 2019 · 21 Comments · Source: conda-forge/conda-forge.github.io

Certain expensive packages seem to fail even on Azure machines (6-hour build limit). This top post summarizes the current status as of its last update:

Possible reasons

Out of memory
Not enough build time

Builds that fail

pytorch-cpu on Windows -- Not enough RAM https://github.com/conda-forge/pytorch-cpu-feedstock/pull/8
qt

Source

hmaarrfk

Most helpful comment

I will look into this. We have a well-defined path for adding additional hosted capacity for an open source project. There's no such path for adding a self-hosted agent. Which is not to say it's not possible, just that there's no button I can press to make it happen immediately :grin:

vtbassmatt on 22 Nov 2019

🚀2 👍1

All 21 comments

Not too sure what the solution is. I have a large machine that I could donate time on if somebody has instructions for integrating it with the CIs.

hmaarrfk on 19 Oct 2019

See https://github.com/conda-forge/conda-forge-enhancement-proposals/pull/5

ocefpaf on 19 Oct 2019

👍2

Picked this up in another thread, was asked to move the posts here by @ocefpaf.

Full comment history from #924 (until now):

@h-vetinari (OP): There are several packages in conda-forge that exceed the CI time limits (e.g. 6h for Azure Pipelines), and some of them are of central importance to the ecosystem (gcc, qt, etc.).

These timeouts make building the packages extremely cumbersome, but this is unfortunately common enough that a dedicated proposal is being worked on to at least have a process for these cases.

I wanted to open this issue to initiate a discussion with the azure people (where most of the conda-forge-CI is run anyway), to see if there's a way to increase these limits for selected packages, e.g. to 24h? It's not like repeatedly running failing 6h-jobs reduces the total load compared to much fewer successful runs that might take 7-10h or so. Not sure if this has been discussed elsewhere or privately already, but I couldn't find anything, and so having a public discussion can't hurt IMO.

I remember seeing @vtbassmatt helping a lot of (python) packages migrate their CI to azure pipelines, so maybe you can help there, at least by involving/CCing the right people on the azure side? Even if the affected packages would have to go through an individual approval process, removing this timeout on some key feedstocks would be a big win.

@vtbassmatt: 👋 Thanks for pinging me. Let me talk to some folks on my side. I think the answer is going to come in the form of a new feature we're launching soon - but I want to double check if I'm forgetting something that's already available.

Not sure if this helps but: the 6 hour limitation is per job, not per pipeline. Can you divide the work into multiple jobs that each stay under the limit? I know this comes with some extra complexity, like needing to upload/download artifacts in several places, so it's not a trivial fix.

@h-vetinari: Xref that I overlooked: #902

@ocefpaf: (quoting "Xref that I overlooked: #902") Do you mind copying your comments above to #902 and closing this one just for "housekeeping's" sake?

h-vetinari on 18 Nov 2019

👍1

Hey all. Right now, your best bets for the long-running packages are:

switch to a self-hosted agent (potentially running on a cloud VM, but "self-hosted" from Azure Pipelines's perspective)
split up the long-running work into multiple jobs (downside: you have to deal with publishing/consuming artifacts between jobs)

Stay tuned, as we have a new feature coming soon which will help address this. It's called "elastic private agents" (working title) and gives you the benefits of private agents - you choose the hardware, you get unlimited runtime - with the convenience of hosted.

We don't currently have a knob for making a single customer's jobs able to run longer than 6 hours on hosted.

vtbassmatt on 19 Nov 2019
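
The second option above (splitting long-running work into multiple jobs) can be sketched as a simple planning step. The following is a minimal illustration only, assuming a build can be broken into ordered steps with roughly known durations; the step names and minute values are hypothetical, and the sketch ignores the time spent publishing and consuming artifacts between jobs.

```python
from typing import List, Tuple

HOSTED_JOB_LIMIT_MIN = 360  # the 6-hour per-job cap on hosted agents discussed above


def split_into_jobs(
    steps: List[Tuple[str, int]], limit_min: int = HOSTED_JOB_LIMIT_MIN
) -> List[List[str]]:
    """Pack ordered build steps into consecutive jobs that each fit the limit.

    Steps stay in their original order because each job would consume the
    artifacts published by the previous one. Raises ValueError if a single
    step cannot fit into one job on its own.
    """
    jobs: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, minutes in steps:
        if minutes > limit_min:
            raise ValueError(f"step {name!r} alone exceeds the {limit_min} min limit")
        # Start a new job when adding this step would exceed the limit.
        if current and used + minutes > limit_min:
            jobs.append(current)
            current, used = [], 0
        current.append(name)
        used += minutes
    if current:
        jobs.append(current)
    return jobs


# Hypothetical step durations in minutes, for illustration only.
steps = [("configure", 20), ("compile-core", 300), ("compile-extras", 200), ("package", 30)]
plan = split_into_jobs(steps)
print(plan)
```

A step that exceeds the limit by itself cannot be rescued this way, which is the case where a self-hosted agent would be needed instead.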

@vtbassmatt
Thanks for the info! It seems that even with the upcoming feature, there will still have to be self-hosted agents, correct?

Do you think it would be feasible for azure to sponsor a small handful of dedicated (possibly only burstable) VMs to conda-forge for this purpose? I cannot speak for conda-forge/core, but - fundamentally - it seems to be a question similar to increasing the number of hosted agents (conda-forge already has many more agents than other open source projects on azure, and I imagine this underwent a similar approval process).

If there were (at the very least) one sponsored agent each for Linux/macOS/Windows, those could have a separate queue that a conda-forge feedstock could opt into (pending some work on the conda-forge side to be able to keep rerendering feedstocks consistently, but that would very likely not be too much work) - which would obviously be restricted to those that absolutely need it.

h-vetinari on 19 Nov 2019

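I will look into this. We have a well-defined path for adding additional hosted capacity for an open source project. There's no such path for adding a self-hosted agent. Which is not to say it's not possible, just that there's no button I can press to make it happen immediately :grin:

vtbassmatt on 22 Nov 2019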

🚀2 👍1

@vtbassmatt
Thanks a lot for digging into this. Would be awesome even if the lead time is a bit longer! :)

h-vetinari on 25 Nov 2019

@vtbassmatt
Just pinging gently if there's been any news.

BTW: This would make an awesome Christmas present for the conda-forge / python communities. ;-)

h-vetinari on 2 Dec 2019

👍1

I've not been able to find anyone who can make this happen 😞 Haven't given up but I don't think there'll be any Christmas miracles right now.

vtbassmatt on 3 Dec 2019

No worries. Thanks for trying!

h-vetinari on 3 Dec 2019

Hey @vtbassmatt, happy belated new decade! :)
Just gently pinging for news on this.

h-vetinari on 3 Feb 2020

Hey, thanks for the ping! We're soon going to start a private preview of elastic self-hosted pools. Basically, we'll do the elastic management side for you; you run it in your Azure subscription so you can have whatever beefy machine you want.

Tagging @vijayma to get you in the preview program when it starts up in a few weeks.

vtbassmatt on 3 Feb 2020

That sounds great. So essentially these will run the same build host configurations as stock ones, just with different hosting? Or do we have to build the machine images ourselves?

mariusvniekerk on 10 Feb 2020

You'll have to provide the machine image, we're unfortunately not able to distribute ours for legal/compliance reasons.

vtbassmatt on 10 Feb 2020

@vtbassmatt: You'll have to provide the machine image, we're unfortunately not able to distribute ours for legal/compliance reasons.

That's a pity, as the whole point of this exercise would be to have the same infrastructure with which packages are built (to avoid subtle incompatibilities, e.g. through different compiler versions). I'm guessing you wouldn't be able to provide a recipe for the machine images (like a dockerfile) either?

That leads me back to my question further up, if we could re-explore the possibility of selectively bumping that 6h cap with a certain approval process, e.g. per repository (= feedstock in conda-speak). The idea being that there are very few packages (e.g. compilers, Qt, some GPU packages, ...) that would need this.

Plus, if I may add, conda-forge is solving some of the thorniest (cross-platform!) packaging problems in the whole python ecosystem (and beyond), so it's not just MyLittleLibrary that's asking. ;-)

h-vetinari on 6 Mar 2020

Lots of our limits are easily mutable. Unfortunately the 6 hour limit is pretty deep and would require a lot of work to change.

Fortunately the image generation scripts are available! https://github.com/actions/virtual-environments

vtbassmatt on 6 Mar 2020

👍1

Thanks for all your help with this!

h-vetinari on 6 Mar 2020

@vtbassmatt
Where do we stand on this? Is the feature still in preview? Do you know if we could package the necessary images as one of the "anvils" of conda-forge, so it doesn't have to be rebuilt every time for CI?

@mariusvniekerk
Do you think this would be feasible to set up (i.e. is there an azure-subscription of conda-forge, or is it "just" on a beefed-up community programme)?

h-vetinari on 16 May 2020

The feature is now in public preview as scale set agents.

vtbassmatt on 16 May 2020

Thanks @vtbassmatt! Any comment about us keeping rebuilds of the images somewhere?

h-vetinari on 16 May 2020

Sorry, missed that! Scale set agents aren't Docker images so I don't think you can use anvils straight up. But if you set your scale set agents to be reused, then most of the time, restoring the Docker image would be a no-op or trivial.

(Edited for clarity)

vtbassmatt on 16 May 2020
