Go: math/big: large fibonacci calculation is faster in GMP

Created on 20 Mar 2019  Â·  21Comments  Â·  Source: golang/go

What version of Go are you using (go version)?

go1.12.1 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\PhilipAppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\Philip\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Gopkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\PhilipAppData\Local\Temp\go-build593709294=/tmp/go-build -gno-record-gcc-switches

What did you do?

I wrote a Fibonacci calculator in Go using the math/big library. It was ~instant in run time up until n=10^7, after which it was extremely slow. I then used https://github.com/ncw/gmp to replace math/big with a gmp wrapper. No code was changed.

What did you expect to see?

A small performance benefit, maybe 2x or 3x.

What did you see instead?

With the same set of code, gmp quickly calculates n=10^8 in 5 seconds, while the normal math/big library takes more than 5 minutes. Gmp is even able to calculate n=10^9 after 1 minute.

NeedsInvestigation Performance

Most helpful comment

For faster asymptotic multiplication, @remyoudompheng has https://github.com/remyoudompheng/bigfft which is much faster than math/big and only ~2x slower than gmp for very large numbers. Maybe he would like to contribute that implementation to math/big.

Also, for printing large integers @crvv noted above see https://github.com/golang/go/issues/20906

All 21 comments

Here is the fibonacci implementation:

func fib(n int) *big.Int {

    left := big.NewInt(1)
    right := big.NewInt(1)

    helper1 := big.NewInt(0)
    helper2 := big.NewInt(0)


    bin := fmt.Sprintf("%b", n)
    for i := 1; i < len(bin); i++ {

        helper1.Mul(big.NewInt(2), right)
        helper1.Sub(helper1, left)
        helper1.Mul(left, helper1)

        helper2.Mul(right, right)
        left.Mul(left, left)
        helper2.Add(helper2, left)

        if (bin[i] == '0') {
            left.Set(helper1)
            right.Set(helper2)
        } else {
            left.Set(helper2)
            right.Add(helper1, helper2)
        }
    }

    return left
}

Using math/big n=3*10^7:

     flat  flat%   sum%        cum   cum%
    31.93s 54.18% 54.18%     31.93s 54.18%  math/big.mulAddVWW
    23.17s 39.32% 93.50%     23.17s 39.32%  math/big.subVV
     2.34s  3.97% 97.47%      2.34s  3.97%  math/big.addMulVVW
     0.31s  0.53% 98.00%      0.31s  0.53%  runtime.cgocall
     0.15s  0.25% 98.25%      1.79s  3.04%  math/big.basicSqr
     0.14s  0.24% 98.49%     55.11s 93.52%  math/big.nat.divLarge
     0.13s  0.22% 98.71%      0.99s  1.68%  math/big.basicMul
     0.02s 0.034% 98.74%      0.38s  0.64%  internal/poll.(*FD).writeConsole
     0.01s 0.017% 98.76%      1.19s  2.02%  math/big.karatsuba
         0     0% 98.76%     55.71s 94.54%  fmt.(*pp).doPrintf

Using https://github.com/ncw/gmp at n=3*10^7 (same code):

     flat  flat%   sum%        cum   cum%
     1.07s 89.92% 89.92%      1.07s 89.92%  runtime.cgocall
     0.04s  3.36% 93.28%      0.04s  3.36%  runtime.memmove
     0.02s  1.68% 94.96%      0.02s  1.68%  runtime.stdcall2
     0.02s  1.68% 96.64%      0.02s  1.68%  unicode/utf16.Encode
     0.01s  0.84% 97.48%      0.01s  0.84%  indexbytebody
     0.01s  0.84% 98.32%      0.01s  0.84%  runtime.makemap_small
     0.01s  0.84% 99.16%      0.01s  0.84%  runtime.stdcall1
     0.01s  0.84%   100%      0.01s  0.84%  runtime.stdcall3
         0     0%   100%      0.53s 44.54%  fmt.(*pp).doPrintf
         0     0%   100%      0.53s 44.54%  fmt.(*pp).handleMethods

Using https://github.com/ncw/gmp at n=10^9 (same code):

flat  flat%   sum%        cum   cum%
    50.01s 91.81% 91.81%     50.01s 91.81%  runtime.cgocall
     1.14s  2.09% 93.90%      1.14s  2.09%  runtime.memmove
     0.54s  0.99% 94.90%     10.37s 19.04%  internal/poll.(*FD).writeConsole
     0.50s  0.92% 95.81%      0.50s  0.92%  runtime.procyield
     0.48s  0.88% 96.70%      0.60s  1.10%  unicode/utf16.Encode
     0.34s  0.62% 97.32%      0.34s  0.62%  runtime.memclrNoHeapPointers
     0.29s  0.53% 97.85%      0.29s  0.53%  unicode/utf8.DecodeRune
     0.01s 0.018% 97.87%      1.17s  2.15%  runtime.systemstack
         0     0% 97.87%     34.33s 63.03%  fmt.(*pp).doPrintf
         0     0% 97.87%     34.33s 63.03%  fmt.(*pp).handleMethods

gmp uses CGo. math/big is pure Go.

And according to their site https://gmplib.org/

GMP is carefully designed to be as fast as possible, both for small operands and for huge operands. The speed is achieved by using fullwords as the basic arithmetic type, by using fast algorithms, with highly optimised assembly code for the most common inner loops for a lot of CPUs, and by a general emphasis on speed.

It is expected that highly optimized C code will be faster than Go.

0.14s  0.24% 98.49%     55.11s 93.52%  math/big.nat.divLarge

divLarge is slow.
Please see https://github.com/golang/go/issues/21960.

The code doesn't use divLarge, so the pprof result posted previously includes (*big.Int).String().

Without (*big.Int).String(), for n = 1e8, math/big uses 32 s and gmp uses 1.6 s on my machine.

The time complexity of math/big is about n**(log(3)/log(2)), which is the complexity of karatsuba algorithm.
For gmp, it is is about n**(log(2.3)/log(2))

The pprof result of math/big is

      flat  flat%   sum%        cum   cum%
    17.27s 59.80% 59.80%     17.27s 59.80%  math/big.addMulVVW
     1.88s  6.51% 66.31%      1.88s  6.51%  runtime.memmove
     1.87s  6.48% 72.78%      1.87s  6.48%  math/big.subVV
     1.51s  5.23% 78.01%      1.51s  5.23%  runtime.madvise
     1.38s  4.78% 82.79%     12.85s 44.49%  math/big.basicSqr
     1.34s  4.64% 87.43%      1.34s  4.64%  math/big.addVV
     1.09s  3.77% 91.20%      9.13s 31.61%  math/big.basicMul
     0.74s  2.56% 93.77%      0.74s  2.56%  runtime.memclrNoHeapPointers
     0.31s  1.07% 94.84%      0.31s  1.07%  math/big.shlVU
     0.24s  0.83% 95.67%      0.69s  2.39%  runtime.stkbucket

Though I am not yet a Go developer, I am tempted to chime in, since I have implemented this in Python3.

This is not how I would find the nth Fibonacci Number. Instead use the direct formula using the Golden Ration (Phi) to achieve results in O(1) time!

I am uploading the code to a Public repository soon, will share link once done.

Thx
Milind

I believe the correct term to Google would be Binet's Formula... and here it is..

F(n) = round( Phi^n / √5 )

This is yet another reason to know good math and not just code!

Regards
Milind

I believe the correct term to Google would be _Binet's Formula_... and here it is..

F(n) = round( Phi^n / √5 )

This is yet another reason to know good math and not just code!

Regards
Milind

While the fibonacci implementation is not the focus of this thread, it is clear that you have some fundamental misunderstandings about floating point as well as big integer arithmetic. Floating point numbers are only o(1) multiplication because they have finite precision. Big Integers, on the other hand, have arbitrary precision. In particular, the size we are referring to in this thread are in the millions of digits.
At the level of precision necessary to get an exact answer, you would need extremely precise floating point values which would take much longer to multiply. Additionally, assuming you are using exponentiation by squaring and not naive exponentiation, it is the same number of multiplications (log(n)) as in the function I posted.
My implementation is about as fast as the native gmp implementation of fibonacci, which means that it doesn't really get much faster.
If you're in doubt, try plugging in n=10000000 to that function and see if you get the correct 2-million digit answer.

@Borthralla Thanks for the insight! I do have a misunderstanding about the difference between floats and Big Integers. I will read this up in a day or two.

On a side note, this is yet another reason for me to be on this forum, thanks @Borthralla.

AND Exponentiation does take O(n) you are right yet again!

However, also note that there may be a great benefit in writing this in a single line of code as well, esp for (smaller) integer numbers.

The code doesn't use divLarge, so the pprof result posted previously includes (*big.Int).String().

Without (*big.Int).String(), for n = 1e8, math/big uses 32 s and gmp uses 1.6 s on my machine.

The time complexity of math/big is about n**(log(3)/log(2)), which is the complexity of karatsuba algorithm.
For gmp, it is is about n**(log(2.3)/log(2))

The pprof result of math/big is

      flat  flat%   sum%        cum   cum%
    17.27s 59.80% 59.80%     17.27s 59.80%  math/big.addMulVVW
     1.88s  6.51% 66.31%      1.88s  6.51%  runtime.memmove
     1.87s  6.48% 72.78%      1.87s  6.48%  math/big.subVV
     1.51s  5.23% 78.01%      1.51s  5.23%  runtime.madvise
     1.38s  4.78% 82.79%     12.85s 44.49%  math/big.basicSqr
     1.34s  4.64% 87.43%      1.34s  4.64%  math/big.addVV
     1.09s  3.77% 91.20%      9.13s 31.61%  math/big.basicMul
     0.74s  2.56% 93.77%      0.74s  2.56%  runtime.memclrNoHeapPointers
     0.31s  1.07% 94.84%      0.31s  1.07%  math/big.shlVU
     0.24s  0.83% 95.67%      0.69s  2.39%  runtime.stkbucket

@crvv Thanks for the explanation. Are there plans to support faster operations for extremely large integers in math/big similar to gmp? Is gmp's method prohibitively complex?

cc @griesemer @bmkessler

Do you know offhand whether gmp uses karatsuba or a different algorithm for this?

@josharian

Definitely not. gmp stops using karatsuba pretty early:

NxN limb multiplications and squares are done using one of seven algorithms, as the size N increases.

Algorithm | Threshold
-- | --
Basecase | (none)
Karatsuba | MUL_TOOM22_THRESHOLD
Toom-3 | MUL_TOOM33_THRESHOLD
Toom-4 | MUL_TOOM44_THRESHOLD
Toom-6.5 | MUL_TOOM6H_THRESHOLD
Toom-8.5 | MUL_TOOM8H_THRESHOLD
FFT | MUL_FFT_THRESHOLD

Source: https://gmplib.org/manual/Multiplication-Algorithms.html#Multiplication-Algorithms

The thresholds are calibrated by a calibration procedure, but for reference MUL_TOOM33_THRESHOLD (the point where they stop using karatsuba) is pretty low. Like in the order of a 100 limbs (= machine words): https://gmplib.org/repo/gmp/file/tip/mpn/x86/coreisbr/gmp-mparam.h#l56

@josharian GMP uses schoolbook -> Toom-Cook -> Schönhage-Strassen for multiplication, depending on the input size.

Does math/big include an implementation of Schönhage-Strassen?

No, it uses karatsuba for large numbers.

For faster asymptotic multiplication, @remyoudompheng has https://github.com/remyoudompheng/bigfft which is much faster than math/big and only ~2x slower than gmp for very large numbers. Maybe he would like to contribute that implementation to math/big.

Also, for printing large integers @crvv noted above see https://github.com/golang/go/issues/20906

Most observations have already been made here. To summarize:

  • We know that certain low-level ops could be faster (e.g., addMulVVW)
  • math/big only implements Karatsuba. It should use the various Toom versions and FFT for very large numbers.
  • Division is slow (for output).

One should probably start with the Tooms.

Good new, guys! They found an nlog(n) multiplication algorithm, we should just use that!
https://hal.archives-ouvertes.fr/hal-02070778/document
(Mostly joking, but the algorithm is real and it just came out today)

Change https://golang.org/cl/172018 mentions this issue: math/big: implement recursive algorithm for division

I can think of a O(log(log(n))) best case time complexity algorithm for computing exponents, which means Binet's formula might be doable for fibonacci(n).

Sorry to bring this up again
~Milind

Change https://golang.org/cl/266201 mentions this issue: math/big: implement Schönhage–Strassen fft algorithm

Was this page helpful?
0 / 5 - 0 ratings