The final step of the build (exporting a tar.xz file) seems to use only a single CPU core. Perhaps it could be sped up by using multiple cores.
While this command was running in a console, KDE System Monitor showed only one core at 100%; all the others were at 0%:
./out/Default archive -o /tmp/download/ungoogled-chromium/build/src/ungoogled_packaging/ungoogled-chromium_70.0.3538.110-1_linux.tar.xz -i /tmp/download/ungoogled-chromium/build/src/ungoogled_packaging/README
linux_portable
clang -v
clang version 8.0.0 (trunk 346299)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /<***>/llvm-build/bin
Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/7
Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/7
Candidate multilib: .;@m64
Selected multilib: .;@m64
The same is described in #447.
AFAIK, the xz utility does not currently support multithreaded compression/decompression, so apparently nothing can be done. Does this step really take a lot of time for you?
I haven't taken the trouble to measure it, but the very fact that I had enough time to notice it means it can be optimized. Considering that the whole build takes about 2 hours, everything counts.
BTW, the xz man page shows a --threads= option.
This is a command in buildkit. I parse FILES.cfg to select only the files from the build outputs that are necessary to run Chromium. The tar.xz generation logic is included there as well, and there's no trivial way to implement multithreading using the standard library.
I am open to having the code rewritten so it uses xz and/or GNU tar directly (Python's tarfile module is not that fast). The only annoying thing about tar is that it's hard to specify the file structure inside the archive (without first copying/moving files into the desired layout); Python's tarfile is easier to use in this regard. Feel free to submit a PR.
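To illustrate that last point: tarfile can place a file at an arbitrary path inside the archive without copying anything on disk. A minimal sketch (the archive path and contents here are made up, not buildkit's actual layout):

```python
import io
import tarfile

# Build a tar.xz entirely in memory; the entry's path inside the archive
# ("ungoogled-chromium/README") need not match any on-disk layout.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:xz") as tar:
    info = tarfile.TarInfo(name="ungoogled-chromium/README")
    data = b"example contents\n"
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Read the archive back to confirm the internal structure.
with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r:xz") as tar:
    print(tar.getnames())  # ['ungoogled-chromium/README']
```

GNU tar can achieve something similar with its --transform option, but it is fiddlier to get right.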
I don't know how to fix this with Python.
I just noticed it and decided to share it.
@anchev Any solution is fine as long as it's easy to maintain and doesn't require bulky or obscure dependencies.
Can you point me to the actual code which runs the xz?
@anchev Right now it's all in Python: https://github.com/Eloston/ungoogled-chromium/blob/efe5ffcd19d8e777c40f354b48d5f6ca54e14683/buildkit/filescfg.py#L62
tarfile uses the lzma module internally if given a tar mode that specifies xz compression.
Alternative solutions would be to create an uncompressed tar (via buildkit) and stream the output to xz, or to print the list of necessary files from FILES.cfg and pass it to tar and xz. I'm not sure how much faster the former would be, and the latter is harder to implement.
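A rough sketch of the former approach, assuming an xz binary on PATH: write an uncompressed tar stream from Python and pipe it into an external xz process, which can then use multiple threads with -T0. The function name and arguments are illustrative, not buildkit's actual API:

```python
import shutil
import subprocess
import tarfile

def make_tar_xz(src_dir, out_path, xz_args=("-6", "-T0")):
    """Tar src_dir (uncompressed) and pipe the stream into xz."""
    if shutil.which("xz") is None:
        raise RuntimeError("xz binary not found on PATH")
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(["xz", "-c", *xz_args],
                                stdin=subprocess.PIPE, stdout=out)
        # Mode "w|" writes the tar as a stream (no seeking),
        # so it can feed a pipe directly.
        with tarfile.open(fileobj=proc.stdin, mode="w|") as tar:
            tar.add(src_dir, arcname="ungoogled-chromium")
        proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("xz failed")
```

This keeps the file-selection logic in Python while delegating compression to xz, which (unlike Python's lzma module) has a multithreaded encoder.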
Thanks. Unfortunately I don't know how to fix this in Python.
I reported it because I noticed that it takes more than a minute to create that tar.xz, which seems far too long considering that the whole build is done on a ramdisk (tmpfs) with an i7-3770.
Just for comparison, using the system tar it takes less than 10 seconds to compress the same data:
[/tmp/download]: time tar czf test.tgz ungoogled-chromium_70.0.3538.110-1_linux
real 0m9.860s
user 0m9.794s
sys 0m0.367s
I noticed that the above generates a 94605755-byte file, which is probably not a fair comparison (gzip is faster but compresses less than xz).
Testing with something closer to what you use, and streaming the output to xz, I see that it doesn't use multiple threads at all; apparently the legacy .lzma format selected by --format=lzma does not support xz's multithreaded encoder (the resulting file is 67179312 bytes):
[/tmp/download]: time tar -c ungoogled-chromium_70.0.3538.110-1_linux | xz --format=lzma --threads=1 > test.tar.xz
real 1m30.870s
user 1m30.529s
sys 0m0.473s
[/tmp/download]: time tar -c ungoogled-chromium_70.0.3538.110-1_linux | xz --format=lzma --threads=4 > test.tar.xz
real 1m30.516s
user 1m30.250s
sys 0m0.533s
For comparison, 7z is faster and the output file is smaller:
[/tmp/download]: time tar cf - ungoogled-chromium_70.0.3538.110-1_linux | 7za a -si -t7z -m0=lzma2 -mx=9 -mfb=64 -md=32m -ms=on test.tar.7z
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (306A9),ASM,AES-NI)
Creating archive: test.tar.7z
Items to compress: 1
Files read from disk: 1
Archive size: 63887463 bytes (61 MiB)
Everything is Ok
real 0m41.607s
user 1m54.822s
sys 0m0.538s
So if anything can be optimized, perhaps 7z is the way to go.
If you use makepkg, I have these modified lines in /etc/makepkg.conf for multithreaded compression:
COMPRESSGZ=(pigz -c -f -n)
COMPRESSBZ2=(bzip2 -c -f)
COMPRESSXZ=(xz -c -z - --threads=0)
COMPRESSLRZ=(lrzip -q)
COMPRESSLZO=(lzop -q)
COMPRESSZ=(compress -c -f)
Also remember to install pigz.
@clapbr This is a custom (albeit primitive) distro-agnostic archive generation process I wrote. It has nothing to do with Arch Linux's makepkg.
Some data. From the build logs I saw:
test ! -e ../ungoogled-chromium_72.0.3626.109.orig.tar.xz || rm -f ../ungoogled-chromium_72.0.3626.109.orig.tar.xz
tar cf - chromium-72.0.3626.109 | xz -6 -T 1 - > ../ungoogled-chromium_72.0.3626.109.orig.tar.xz
echo $(($(date +%s) - $(cat ../ungoogled-chromium_72.0.3626.109.seconds))) seconds
1454 seconds
rm -rf chromium-72.0.3626.109
echo $(($(date +%s) - $(cat ../ungoogled-chromium_72.0.3626.109.seconds))) seconds | tee seconds
1459 seconds
find debian/scripts/ungoogled-chromium/ -name __pycache__ -type d -exec rm -r {} +
Which is a bit slow. I had previously done some compression benchmarking[1] and regularly use xz in -0 --threads=0 mode (0 means "all CPU cores/threads" in xz-speak, which is eight threads on this laptop), so I decided to compare. The summary (level-threads-time) is appended to each file name:
-rw-r--r-- 1 me me 1534832640 Feb 18 12:11 ungoogled-chromium_72.0.3626.109.orig.tar
-rw-r--r-- 1 me me 304215500 Feb 18 12:14 ungoogled-chromium_72.0.3626.109.orig.tar.xz-0-0-16s
-rw-r--r-- 1 me me 220908524 Feb 18 12:11 ungoogled-chromium_72.0.3626.109.orig.tar.xz-6-0-1m59s
-rw-r--r-- 1 me me 218261160 Feb 18 12:29 ungoogled-chromium_72.0.3626.109.orig.tar.xz-6-1-9m20s
-rw-r--r-- 1 me me 218261160 Feb 18 12:11 ungoogled-chromium_72.0.3626.109.orig.tar.xz-orig
Explained:
The original tar file was about 1.5GB.
The current system uses xz level 6 compression and forces single threaded operation. This gives a 218MB file and takes 9 minutes and 20 seconds on this hardware[2].
Using level 6 with all cores gives a 220MB file but takes only 1 minute 59 seconds. There's a tiny size overhead with multithreaded writers in the xz format (the legacy lzma format can't be multithreaded at all, AFAIK).
Using level 0 with all cores gives a 304MB file and takes just 16 seconds.
Suggestion:
I know the mechanism is being worked out above, but once it's in place I recommend the "-0 --threads=0" tuning for normal builds. Multithreading alone saves about 7 minutes on my system. On top of that, I can spare 85MB of disk space to save another 1m42s of build time (the delta between -0 and -6): it's a tiny fraction of the space required to build, and my time is more valuable than 85MB of disk. I'm currently trying to get my build working again on buster, which requires many iterations, and that scenario is likely to come up for many contributors.
If somebody winds up building source packages for distro repos, it might make sense in that one case to crank compression up to 9 to minimize repo space and mirror bandwidth. I don't think it's worth slowing down builds in other cases, but that implies the ability to tune the setting on the fly (or at least tweak it without too much trouble).
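As a sketch of what "tune it on the fly" could look like if the compressor ends up being an external xz process: read the flags from an environment variable with a fast multithreaded default. The variable name UGC_XZ_FLAGS is invented for illustration; when xz is invoked directly, its own XZ_OPT variable already provides a similar override mechanism.

```python
import os

def xz_flags(default="-0 -T0"):
    """Return xz flags as a list, e.g. ['-0', '-T0'].

    UGC_XZ_FLAGS is a hypothetical override; a repo maintainer could
    export UGC_XZ_FLAGS="-9 -T0" for maximum compression, while normal
    development builds keep the fast default.
    """
    return os.environ.get("UGC_XZ_FLAGS", default).split()

print(xz_flags())  # ['-0', '-T0'] unless overridden in the environment
```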
[1] https://bfccomputing.com/2017/08/01/xz-your-next-default-compression-tool.html
[2]
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 70
model name : Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
stepping : 1
microcode : 0x1a
cpu MHz : 3071.823
cache size : 6144 KB