Yay: [Suggestion] Parallelize downloading and building of multiple packages

Created on 20 Mar 2019 · 15 Comments · Source: Jguer/yay

Would it be possible to add support for downloading and building in parallel? I had in mind that when installing multiple packages, yay could already start building the first package while the second package is downloading. Then the second package is built while the third is downloaded and so on.

I think that would be a nice feature to have because downloading and building can be parallelized so well. It would make updating the system with yay -Syu up to twice as fast.

kangalioo

👍12

Most helpful comment

I think there's a lot of misunderstanding here about my original proposal. First of all, what was bugging me and led me to make this issue is that yay is only ever using CPU _or_ bandwidth at a time. When yay is downloading a package, it's wasting time by letting the CPU idle and vice versa.

I did not propose running multiple builds in parallel, nor running two downloads in parallel. Those things are not necessarily beneficial.

The goal was to have a pipeline with two steps: a downloading step and a build step. When a package has finished downloading, it'll be moved into the build step, and the next package can already start downloading. This ensures that CPU and bandwidth are utilized continuously.

As far as I can tell, the only counterargument that applies to the _actual_ feature I'm proposing is the terminal output issue: when a download runs alongside a build, their log messages will mix. YodaEmbedding experimented with three possible ways to handle this in their comment. In my opinion, those implementations are a bit over-engineered, and a simpler solution would be more appropriate for a command-line tool like yay.

Additionally, I think it would be good to have the pipeline feature opt-in. That way, users are not exposed to the potentially confusing behavior if they don't explicitly request it.

Are there any complaints about this way of handling terminal output?

kangalioo on 14 Nov 2020

👍5

All 15 comments

Parallel building is not a great advantage because then you have competition for CPU cycles (and packages with dependencies would need special handling). Concurrent download is something that has been thought about and experimented with (it messes a bit with the output, though).

Jguer on 24 Mar 2019

Sorry, I should have worded my suggestion better. I meant having the downloading process running in parallel to the building process. There would still be only one compilation process at a time, but the downloading would be in parallel to that compilation process. Those processes require completely different system resources, so parallelizing those two would provide a big speed improvement.

kangalioo on 25 Mar 2019

I can see it working but I can't see any way to implement it while also keeping any sort of readable output.

Morganamilo on 25 Mar 2019

It could be a command-line option that's turned off by default, so yay has readable output by default. When enabled, yay could display a status message in between the compilation output giving information about the downloads, maybe something like this:

...
[ 41%] Building CXX object ...
[yay] Download of ___.tar.gz finished, starting download ___.tar.gz...
[ 43%] Building CXX object ...
[ 45%] Building CXX object ...
...

kangalioo on 27 Mar 2019

Seeing as parallelizing builds isn't an option, trying to download and build in parallel seems like overkill. I would go with just parallelizing multiple downloads at the same time.

BrendanBall on 3 May 2019

👍1

@Jguer why not pass -j to make and let it do parallelization?

@BrendanBall what's the point in dividing the bandwidth among packages?

dsdante on 18 May 2019

👍1

@dsdante That's the job of /etc/makepkg.conf, not yay. You already have parallelization within a single build, and that is effective (for everything C++, for example). Parallelizing multiple builds would be like trying to shove 20 clowns through a door at the same time: probably slower than having them line up.

Dividing the bandwidth among packages could be beneficial for multiple AUR builds as the sources are different and there could be slow servers in some of them.
For repo packages, where everything comes from the same mirror, your connection will typically max out before the mirror does, so parallel downloads are less effective there.

Jguer on 18 May 2019

👍2

@Jguer thanks for the makepkg tip!

I think the easiest way to optimize it is to make a pipeline:
A queue of clowns -> downloading -> building -> installing (but why clowns, anyway?)

The downloader can let the next package use some of the bandwidth if it's not 100% used yet and if it's from a different server. Not sure it would be worth your time, though. (I don't speak Go, so I'll pass, sorry.)

Another issue is output. We'd need to provide info from multiple downloads, building (which is parallelized itself) and installing. I can't think of a nice solution without ncurses.

dsdante on 18 May 2019

"I can see it working but I can't see any way to implement it while also keeping any sort of readable output." (Morganamilo)

I propose three ideas:

1. Split windows
2. Display modes
3. Output everything... but allow filtered output in a separate yay terminal process

One idea is split windows:

[Build output]

==> Starting pkgver()...
==> Starting build()...
Submodule 'submodules/cabal-helper' (https://github.com/alanz/cabal-helper.git)
Submodule 'submodules/ghc-mod' (https://github.com/alanz/ghc-mod.git)
Cloning into '/tmp/yaytmp-1000/haskell-ide-engine-git/src/haskell-ide-engine/...'
Compiling...

---------------------------------------------------------------------------------

[Download bars]

[ ===        ]   30% downloaded alacritty-git
[ =======    ]   70% downloaded aurman

Another idea is toggling between display modes with keystrokes:

[...blah blah blah...]

Switch to download view by pressing [D]
Switch to build view by pressing [B]

Both of the previous two ideas have the downside that you can't search or scroll through past output... but perhaps logging it would be a good compromise.

The final idea is to let yay output everything in the primary terminal... but allow filtered output through command line arguments:

$ yay --download-status
[ ===        ]   30% downloaded alacritty-git
[ =======    ]   70% downloaded aurman

$ yay --build-status
[ ====       ]   40% built cargo-git
[ ==         ]   20% built haskell-ide-engine-git

$ yay --build-output haskell-ide-engine-git
==> Starting pkgver()...
==> Starting build()...
Submodule 'submodules/cabal-helper' (https://github.com/alanz/cabal-helper.git)
Submodule 'submodules/ghc-mod' (https://github.com/alanz/ghc-mod.git)
Cloning into '/tmp/yaytmp-1000/haskell-ide-engine-git/src/haskell-ide-engine/...'
Compiling...

There will, of course, be a separate "main" yay process running in a separate terminal.

YodaEmbedding on 23 May 2019

I'm not sure if we're all talking about the same thing here...

@SicariusNoctis From your suggestions it seems like you're thinking of running multiple processes of the same type at the same time. For example, that both alacritty-git and aurman are downloaded simultaneously, while cargo-git and haskell-ide-engine-git are built simultaneously (stealing each other's CPU cycles).

What I really had in mind though was a pipeline, as @dsdante said. There would be one downloading unit and one compiling unit that work in parallel. But each of these units only ever processes one item at a time (similar to how modern CPU pipelines work).
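
A minimal Go sketch of that two-stage pipeline, using one downloader goroutine and one builder connected by a channel. This only illustrates the idea and is not yay's code: `download`, `build`, and `runPipeline` are hypothetical stand-ins, and dependency ordering and error handling are left out.

```go
package main

import "fmt"

// download and build are hypothetical stand-ins; a real AUR helper would
// fetch the sources and run makepkg here.
func download(name string) string { return name + ".src" }
func build(src string) string     { return src + ".pkg" }

// runPipeline downloads packages one at a time and hands each finished
// download to a single builder, so a download can overlap with a build.
func runPipeline(names []string) []string {
	// The small buffer lets the downloader stay slightly ahead of the builder.
	downloaded := make(chan string, 1)

	// Download stage: strictly one download at a time, in order.
	go func() {
		defer close(downloaded)
		for _, n := range names {
			downloaded <- download(n)
		}
	}()

	// Build stage: strictly one build at a time, in download order.
	var built []string
	for src := range downloaded {
		built = append(built, build(src))
	}
	return built
}

func main() {
	for _, p := range runPipeline([]string{"alacritty-git", "aurman"}) {
		fmt.Println(p)
	}
}
```

Because each stage handles a single item at a time, builds never compete with each other for CPU and downloads never split the bandwidth; the only overlap is between one download and one build.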

kangalioo on 23 May 2019

This seems like it would add a lot of complexity and open up an avenue for lots of bugs, and all for virtually no significant benefit.

Off the top of my head I can think of multiple scenarios and edge cases that would cause this to break. Even without any errors, it would simply be messy to implement. Popping up multiple windows to show progress, or mixing output from a download in with the build output is just ugly no matter how you do it.

15-20 years ago, when we didn't have the bandwidth we do now and building took much longer on ancient hardware, this may have been more important to consider, but do we really need this now? This kinda seems like parallelism just for the sake of parallelism.

ForeverZer0 on 3 Jun 2019

I agree that it would be pretty complex, but I don't think the "bandwidth" argument is valid.

Right now I'm sitting here waiting for yay to update, and it's downloading intel-mkl, which is 3 GB in size. Even at 10 MB per second, that takes 5 minutes to download. If the downloads were parallelized, it could pretty much have installed all the other updates during this time.

phiresky on 21 Sep 2019

👍1

I definitely think each package should be downloaded and built in separate processes.

When I'm updating 10-20 packages, the CPU or network bandwidth is barely being hit all the way through (unless an update requires rebuilding Chromium or something). It's generally just a long queue of "downloading... unpacking... making package... tidying install... cleaning up..." 20 times in a row and it takes a good 20 minutes. The update would have taken perhaps 2 minutes if the download/build/make-package part of each package was done in a separate process.

The main process could simply display a progress bar for each package. This would be an amazing improvement.

Hubro on 15 Sep 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 14 Nov 2020
