In #26974, a performance regression in cc-wrapper was fixed that had created a ~3 minute overhead (100% of the CPU time spent in bash), because a .rsp file parser had been implemented in bash.
I noticed that even after this fix, there is still a significant overhead per link, namely 10 seconds (again 100% CPU time spent in bash).
I suspect that it's quadratic appends again.
(This also raises the question of whether bash is the right language for _anything_ that has a loop in it -- you just can't get it right reliably.)
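To illustrate what I mean by quadratic appends (a minimal hypothetical sketch, not the actual cc-wrapper code): appending to a plain string copies the whole accumulated value each time, so n appends cost O(n²) overall:

```bash
# Hypothetical demonstration of the quadratic-append pattern; each
# iteration copies the entire accumulated string before appending.
params=""
for i in $(seq 1 50000); do
  params="$params -L/some/lib/dir-$i"  # O(length of $params) per append
done
```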
CC @ryantrinkle @domenkozar @cleverca22 @Ericson2314
I witness this on commit aad2a21; I don't have a minimal repro right now, but building https://github.com/input-output-hk/cardano-sl is one way to observe it.
To be tested with aa4a92d
Testing now.
Maybe aa4a92d2df helps a bit. It gets rid of some slow appends. However, there are also some pattern matches against the entire string that are O(n).
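For illustration, this is the kind of construct meant (a hypothetical sketch, not the actual wrapper code): the glob match scans the entire accumulated string on every iteration, making the loop quadratic overall:

```bash
# Hypothetical sketch: the pattern match scans all of $rpath each time,
# so the loop as a whole is O(n^2) in the number of dirs.
rpath=""
for dir in /lib /usr/lib /opt/lib; do
  if [[ "$rpath" != *" $dir "* ]]; then  # O(length of $rpath) per test
    rpath="$rpath $dir "
  fi
done
echo "$rpath"
```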
Testing now.
This will take a couple hours of rebuilding.
@edolstra I think you've found/eliminated roughly half of the quadratic behaviour:
Overhead now decreased from 10 to 4 seconds per executable.
(Note this is still a big deal, because the actual ld.gold execution takes ~a couple milliseconds, so for my 10 executables it makes the difference between waiting half a minute and ~~instantaneous~~ ~1 second.)
I've pushed another improvement (47821f1cf0cd853d3d3dfea9259e02fea2766327). Hopefully that takes care of the remaining 4 seconds :-)
BTW, to quickly test this, I just copied cc-wrapper to cc-wrapper-new, and applied this patch to all-packages.nix:
```diff
diff --git a/pkgs/top-level/all-packages.nix b/pkgs/top-level/all-packages.nix
index 110b79fce5..ecd5adfe10 100644
--- a/pkgs/top-level/all-packages.nix
+++ b/pkgs/top-level/all-packages.nix
@@ -6024,6 +6024,17 @@ with pkgs;
     inherit libc extraBuildCommands;
   };
 
+  wrapCCWith2 = libc: extraBuildCommands: baseCC: callPackage ../build-support/cc-wrapper-new {
+    nativeTools = stdenv.cc.nativeTools or false;
+    nativeLibc = stdenv.cc.nativeLibc or false;
+    nativePrefix = stdenv.cc.nativePrefix or "";
+    cc = baseCC;
+    dyld = if stdenv.isDarwin then darwin.dyld else null;
+    isGNU = baseCC.isGNU or false;
+    isClang = baseCC.isClang or false;
+    inherit libc extraBuildCommands;
+  };
+
   ccWrapperFun = callPackage ../build-support/cc-wrapper;
 
   wrapCC = wrapCCWith stdenv.cc.libc "";
@@ -18671,6 +18682,7 @@ with pkgs;
   inherit (callPackages ../tools/package-management/nix {
       storeDir = config.nix.storeDir or "/nix/store";
       stateDir = config.nix.stateDir or "/nix/var";
+      stdenv = overrideCC stdenv (wrapCCWith2 stdenv.cc.libc "" stdenv.cc.cc);
     })
     nix
     nixStable
```
and then tested with `nix-build -A nixUnstable`.
Maybe someday we'll be able to shake this bash addiction 😞
@edolstra Thanks, I'll test your change.
One problem is that the stat()ing of 160k files in bash (400 `-L` dirs times 400 `-l` libs, for my 400 Haskell library dependencies) in the `[ -f "$i/lib$j.so" ]` loop alone already takes almost 1 second, and thus still a significant amount of time.
A faster way to do it is what ld.gold does: listing directory contents instead of issuing lots of stat()s.
That scales linearly with the number of `-l`s, instead of quadratically with `-L` × `-l`.
Thus I propose, in the spirit of @ryantrinkle's fix for the older 3-minute issue, the following C++ solution, which finds the RPATH 50x faster (28 ms vs. the ~1 second figure above):
(Edit: Slightly improved version maintained at this gist)
```c++
// find_libdirs: Searches for given .so/.a library names in a given list of dirs.
//
// These environment variables need to be set:
//   * $libPath - The list of dirs to search in
//   * $libs    - The library names
//
// Arguments:
//   * the extension to use (e.g. ".a" or ".so")
//
// Outputs the first occurrence of each entry $dir in $libPath
// that contains at least one $libname of the given $libs.
//
// For example, it outputs $dir if $dir/lib${libname}.a exists.

#include <cstdlib>
#include <experimental/filesystem>
#include <iostream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using namespace std;
namespace fs = std::experimental::filesystem;

// String splitting implementation from https://stackoverflow.com/a/236803/263061
template<typename Out>
void split(const string &s, char delim, Out result, bool skip_empty=true) {
  stringstream ss;
  ss.str(s);
  string item;
  while (getline(ss, item, delim)) {
    if (skip_empty && item.empty()) continue;
    *(result++) = item;
  }
}

vector<string> split(const string &s, char delim, bool skip_empty=true) {
  vector<string> elems;
  split(s, delim, back_inserter(elems), skip_empty);
  return elems;
}

int main(int argc, char const *argv[])
{
  // Parse args
  if (argc < 2) {
    cerr << "Usage: find_libdirs .extension" << endl;
    cerr << endl;
    cerr << "Example: find_libdirs .so" << endl;
    exit(EXIT_FAILURE);
  }
  string extensionSuffix = argv[1];

  // Get env vars
  char *libPath_ptr = getenv("libPath");
  char *libs_ptr = getenv("libs");
  if (!libPath_ptr || !libs_ptr) {
    cerr << "find_libdirs: You must set the $libPath and $libs environment variables" << endl;
    exit(EXIT_FAILURE);
  }

  // Split paths
  vector<string> libPaths = split(libPath_ptr, ' ');
  vector<string> libs = split(libs_ptr, ' ');

  // To print only the first occurrence of each dir
  set<string> usedDirs;

  // Instead of stat()ing (libPaths * libs) many files, read
  // the directory contents for a much faster implementation,
  // just like `ld.gold` does it.
  for (string & dir : libPaths) {
    if (fs::exists(dir)) {
      set<string> files;
      for (auto & dirent : fs::directory_iterator(dir)) {
        files.insert(dirent.path().filename());
      }
      for (string & libName : libs) {
        bool found = files.find("lib" + libName + extensionSuffix) != files.end();
        if (found) {
          usedDirs.insert(dir);
          break; // we want only the first output
        }
      }
    }
  }

  // Output
  for (string const & dir : usedDirs) {
    cout << dir << endl;
  }

  return 0;
}
```
```bash
g++ -O2 -Wall -std=c++17 find_libdirs.cc -lstdc++fs -o find_libdirs
```
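For reference, a hypothetical invocation (the store paths are made up; the program reads `$libPath` and `$libs` from the environment, as described in its header comment):

```bash
# Hypothetical example invocation; the two env vars drive the search.
libPath="/nix/store/...-glibc/lib /nix/store/...-zlib/lib" \
libs="c z" \
./find_libdirs .so
# Prints each dir (once) that contains libc.so or libz.so.
```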
> Maybe someday we'll be able to shake this bash addiction 😞
@copumpkin Some day ;)
@nh2 Before the rewrite of the core of expandResponseParams in C, it was not taking 3 minutes like you say. It was taking 2 seconds for the largest known input, after the optimization of the bash code in #26554. And here, too, there is no need to change the language if what is needed is to change the algorithm. Here is a more efficient one than the code that you wrote, since it builds just one associative array for the libs, rather than one for each directory:
```bash
declare -A libs

# Presumably called once per `-l` flag (elsewhere in the wrapper)
# to register the library file name to look for.
addToLibs() {
  libs["lib$1.so"]=1
}

for dir in $libPath; do
  cd "$dir" 2>/dev/null || continue
  for file in *; do
    if [ "${libs[$file]}" ]; then
      addToRPath "$dir"
      break
    fi
  done
  cd - >/dev/null
done
```
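To make the snippet self-contained for testing, something like the following would work (a hypothetical harness; `addToRPath` and the library names here are made up for illustration):

```bash
# Hypothetical harness around the snippet above (assumes its
# definitions of libs/addToLibs are already in scope).
rpath=""
addToRPath() { rpath="$rpath -rpath $1"; }

libPath="/usr/lib /usr/local/lib"
addToLibs m        # as if -lm had been passed
addToLibs pthread  # as if -lpthread had been passed
# ...then run the directory-scanning loop above, and finally:
echo "$rpath"
```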
@orivej Looks good to me.
@nh2 If we're going the C++ rewrite route, we should rewrite the entire ld-wrapper in C++ (and incorporate expand-response-params), rather than have a shell script calling out into some disparate C++ programs.
> it was not taking 3 minutes like you say. It was taking 2 seconds for the largest known input
@orivej For your use case maybe, but for my project, it took 3 minutes, because I have a couple more entries in my .rsp file. If the algorithm is quadratic, the time explodes quickly.
> And here, too, there is no need to change the language if what is needed is to change the algorithm.

That is true; those are independent concerns. I wanted to solve them both in one go because (apparently) bash makes it very hard to get things right.

> If we're going the C++ rewrite route, we should rewrite the entire ld-wrapper in C++ (and incorporate expand-response-params), rather than have a shell script calling out into some disparate C++ programs.

Makes sense; I haven't looked into what the non-stat()-related parts do in detail yet, so that's probably for another day; in the meantime I've put the C++ code into this gist.
OK, I've tested the version in https://github.com/orivej/nixpkgs/commit/7385c67b0cb0afa38228776e6ec6179af7b0f92c
The .so finding is now fast.
There's a remaining ~0.3 s overhead in the `while [ $n -lt ${#allParams[*]} ]; do` loop block that would be nice to get rid of, but at least it all seems to be linear now, so thanks a lot!
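One possible micro-optimization there (a hypothetical sketch, assuming `allParams` is not modified inside the loop) would be to expand the array length once instead of on every loop test:

```bash
# Hypothetical sketch: evaluate the array length once rather than
# re-expanding ${#allParams[*]} in every loop condition check.
allParams=(-L/lib -lfoo -o prog)
n=0
nParams=${#allParams[*]}
while [ "$n" -lt "$nParams" ]; do
  echo "param $n: ${allParams[$n]}"
  n=$((n + 1))
done
```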
Another possibility (the cleanest, really) is to patch binutils' ld to add RPATH entries for all used libraries automatically. That would remove the need for most of ld-wrapper.
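In effect (a hypothetical illustration of the proposed behavior; no such ld flag exists today), the patched linker would treat every `-L` directory that provides a used library as if an explicit `-rpath` for it had been passed:

```bash
# Hypothetical illustration only (made-up store path): what the
# proposed binutils patch would do implicitly. Today the wrapper
# effectively rewrites
ld main.o -L/nix/store/...-zlib/lib -lz -o prog
# into
ld main.o -L/nix/store/...-zlib/lib -lz -rpath /nix/store/...-zlib/lib -o prog
```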
@edolstra
> If we're going the C++ rewrite route, we should rewrite the entire ld-wrapper in C++
Well, we cannot use it in the first bootstrapping stage then, unless we include it in the bootstrap tools.
> patch binutils
I like this. Make it a flag, and we can perhaps upstream it? @orivej's should be merged as a stop-gap no matter what, however.
> Another possibility (the cleanest, really) is to patch binutils to add RPATH entries for all used libraries automatically. That would remove the need for most of ld-wrapper.

Or, even better than patching: add it as an upstream feature flag?
If you go for the patching, it would be nice if it could still be a flag so that an "unmodified" version of ld/gold is still available.
Keep in mind that there are two lds in play, so we wouldn't be able to get rid of the wrapper until we had patches for both the binutils ld and the cctools one for Darwin.
@edolstra
> Another possibility (the cleanest, really) is to patch binutils' ld to add RPATH entries for all used libraries automatically. That would remove the need for most of ld-wrapper.
What about libraries outside of /nix/store? Currently they are explicitly not added to rpath.
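For concreteness, a rough hypothetical sketch of that filter (not the actual ld-wrapper code): only directories under the Nix store become RPATH entries, while system dirs are skipped.

```bash
# Rough hypothetical sketch of the store-path filter meant here.
NIX_STORE=${NIX_STORE:-/nix/store}
rpath=""
addToRPath() { rpath="$rpath -rpath $1"; }

for dir in /nix/store/...-zlib/lib /usr/lib; do
  case "$dir" in
    "$NIX_STORE"/*) addToRPath "$dir" ;;  # store paths get an RPATH entry
    *) ;;                                 # non-store dirs are skipped
  esac
done
echo "$rpath"  # only the store path appears
```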
@nh2
> For your use case maybe, but for my project, it took 3 minutes, because I have a couple more entries in my .rsp file. If the algorithm is quadratic, the time explodes quickly.
expand-response-params took minutes with the large .rsp files of Haskell projects only before #26554, which specifically addressed that issue. The main cause was slow reading of the input.