On a commit with MPFR 4.0.2 but before the BinaryBuilder PR, I get
julia> a = 5.0; b = big"5.0"
5.0
julia> @btime $a + $b;
57.579 ns (2 allocations: 112 bytes)
while on https://github.com/JuliaLang/julia/pull/31727 I get
julia> @btime $a + $b;
100.848 ns (2 allocations: 112 bytes)
My guess is thus that the BB-built MPFR library is slower than the one we used to build locally. This is reported by Nanosoldier at https://github.com/JuliaLang/julia/pull/31727#issuecomment-484159317 for the 1.2 backport branch.
I think there are two possibilities here:
- The BB tarballs are being overly conservative in their architecture choice. Looking into what our GCC shards are configured for, they default to -march=x86-64, which is extremely conservative. The lower bound we advertise is -march=core2, so I will make sure to include that proper default in the next shard update (which I've conveniently been working on for the past few days). Once the new compiler shards are ready, we'll rebuild MPFR and see if that helps.
- Newer versions of GCC are smarter than the 4.8.5 we're using as the default compiler. The main reason we use 4.8.5 is to make C++ compatibility as simple as possible. We do not yet have a nice, automatic way to determine that a binary dependency does not rely upon libstdc++ and auto-build it with GCC8, but we should. In the meantime, we could try rebuilding MPFR with GCC8 and see if this improves at all.
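Before comparing timings between builds, it can help to confirm which MPFR and GMP libraries a given Julia binary actually loads. The following is a minimal sketch, not an official diagnostic: it assumes the internal (non-public) helpers `Base.MPFR.version()` and `Base.GMP.version()` are available in the Julia version at hand, and that the shared library file names contain "mpfr" or "gmp".

```julia
using Libdl

# Versions reported by the loaded libraries.
# Assumption: these are internal helpers, not public API, and may change.
mpfr_version = Base.MPFR.version()
gmp_version = Base.GMP.version()

# Paths of the loaded shared libraries whose file name mentions mpfr or gmp.
# A BB-provided library and a source-built one normally live in different places.
loaded = filter(p -> occursin(r"mpfr|gmp"i, basename(p)), Libdl.dllist())

println("MPFR ", mpfr_version, ", GMP ", gmp_version)
foreach(println, loaded)
```

Running this on both the older commit and the PR branch shows whether the two benchmarks are really exercising different library builds.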
Bump. This is release blocking AFAIU so would be nice to have some progress / update here. Should we go back to just building MPFR from source?
It took me almost three weeks to iron out issues, but I finally got my new BB shards building, so we can look at this a little more closely. Compiling with a better default -march flag actually gives us a pretty decent speedup:
Original 1.1 performance (compiled with GCC 7.1):
julia> using BenchmarkTools
a = 5.0
b = big"5.0"
@btime $a + $b;
71.648 ns (2 allocations: 112 bytes)
Current master performance (compiled with GCC 4.8.5, -march=x86-64):
julia> using BenchmarkTools
a = 5.0
b = big"5.0"
@btime $a + $b;
183.323 ns (2 allocations: 112 bytes)
My branch performance (compiled with GCC 4.8.5, -march=core2 on x86_64):
julia> using BenchmarkTools
a = 5.0
b = big"5.0"
@btime $a + $b;
113.536 ns (2 allocations: 112 bytes)
Compiling with GCC 8 doesn't change anything, so I think the second bullet point is irrelevant. I'm still investigating the rest of the performance issues; I will try rebuilding GMP with the new BB shards to see if that is the cause of the remainder of the slowdown.
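To get a rough idea of whether GMP contributes to the remaining slowdown, one option is to time a BigInt operation (which uses GMP without going through MPFR) next to the Float64 + BigFloat case above. This is only a sketch: `ns_per_call` is a hypothetical helper written for this example, it uses plain `@elapsed` from Base and is therefore coarser than `@btime`, and the iteration count is an arbitrary choice.

```julia
# Average time per call in nanoseconds, using only Base (no BenchmarkTools).
# `ns_per_call` is a hypothetical helper, not part of any library.
function ns_per_call(f, x, y; n = 1_000_000)
    f(x, y)                      # warm up so compilation is not timed
    t = @elapsed for _ in 1:n
        f(x, y)
    end
    return 1e9 * t / n
end

a = 5.0
bf = big"5.0"
bi = big(5)

println("Float64 + BigFloat: ", ns_per_call(+, a, bf), " ns")   # goes through MPFR
println("BigInt + BigInt:    ", ns_per_call(+, bi, bi), " ns")  # GMP only
println("BigInt * BigInt:    ", ns_per_call(*, bi, bi), " ns")  # GMP only
```

If the BigInt timings differ between builds in roughly the same proportion as the BigFloat ones, that would point at GMP rather than MPFR alone.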
1.2 got changed to use source builds.
(For MPFR and GMP only, just to mitigate the performance regression.)
Why was the milestone here changed? From my memory, changing to source builds was only done on the 1.2 backport branch, so this seems to be still in effect. Changing the milestone back until it has been confirmed.
Moved milestone since I backported the disable BB PR to the 1.3-rc branch. If we get it fixed for 1.3 then great, but no need for the milestone now.