Go: math/big: large fibonacci calculation is faster in GMP

Created on 20 Mar 2019 · 21Comments · Source: golang/go

What version of Go are you using (`go version`)?

go1.12.1 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output



set GOARCH=amd64

set GOBIN=

set GOCACHE=C:\Users\PhilipAppData\Local\go-build

set GOEXE=.exe

set GOFLAGS=

set GOHOSTARCH=amd64

set GOHOSTOS=windows

set GOOS=windows

set GOPATH=C:\Users\Philip\go

set GOPROXY=

set GORACE=

set GOROOT=C:\Go

set GOTMPDIR=

set GOTOOLDIR=C:\Gopkg\tool\windows_amd64

set GCCGO=gccgo

set CC=gcc

set CXX=g++

set CGO_ENABLED=1

set GOMOD=

set CGO_CFLAGS=-g -O2

set CGO_CPPFLAGS=

set CGO_CXXFLAGS=-g -O2

set CGO_FFLAGS=-g -O2

set CGO_LDFLAGS=-g -O2

set PKG_CONFIG=pkg-config

set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\PhilipAppData\Local\Temp\go-build593709294=/tmp/go-build -gno-record-gcc-switches

What did you do?

I wrote a Fibonacci calculator in Go using the math/big library. It was ~instant in run time up until n=10^7, after which it was extremely slow. I then used https://github.com/ncw/gmp to replace math/big with a gmp wrapper. No code was changed.

What did you expect to see?

A small performance benefit, maybe 2x or 3x.

What did you see instead?

With the same set of code, gmp quickly calculates n=10^8 in 5 seconds, while the normal math/big library takes more than 5 minutes. Gmp is even able to calculate n=10^9 after 1 minute.

NeedsInvestigation Performance

Source

Borthralla

👍2

Most helpful comment

For faster asymptotic multiplication, @remyoudompheng has https://github.com/remyoudompheng/bigfft which is much faster than math/big and only ~2x slower than gmp for very large numbers. Maybe he would like to contribute that implementation to math/big.

Also, for printing large integers @crvv noted above see https://github.com/golang/go/issues/20906

bmkessler on 20 Mar 2019

❤2 👍2

All 21 comments

Here is the fibonacci implementation:

func fib(n int) *big.Int {

    left := big.NewInt(1)
    right := big.NewInt(1)

    helper1 := big.NewInt(0)
    helper2 := big.NewInt(0)


    bin := fmt.Sprintf("%b", n)
    for i := 1; i < len(bin); i++ {

        helper1.Mul(big.NewInt(2), right)
        helper1.Sub(helper1, left)
        helper1.Mul(left, helper1)

        helper2.Mul(right, right)
        left.Mul(left, left)
        helper2.Add(helper2, left)

        if (bin[i] == '0') {
            left.Set(helper1)
            right.Set(helper2)
        } else {
            left.Set(helper2)
            right.Add(helper1, helper2)
        }
    }

    return left
}

Using math/big n=3*10^7:

     flat  flat%   sum%        cum   cum%
    31.93s 54.18% 54.18%     31.93s 54.18%  math/big.mulAddVWW
    23.17s 39.32% 93.50%     23.17s 39.32%  math/big.subVV
     2.34s  3.97% 97.47%      2.34s  3.97%  math/big.addMulVVW
     0.31s  0.53% 98.00%      0.31s  0.53%  runtime.cgocall
     0.15s  0.25% 98.25%      1.79s  3.04%  math/big.basicSqr
     0.14s  0.24% 98.49%     55.11s 93.52%  math/big.nat.divLarge
     0.13s  0.22% 98.71%      0.99s  1.68%  math/big.basicMul
     0.02s 0.034% 98.74%      0.38s  0.64%  internal/poll.(*FD).writeConsole
     0.01s 0.017% 98.76%      1.19s  2.02%  math/big.karatsuba
         0     0% 98.76%     55.71s 94.54%  fmt.(*pp).doPrintf

Using https://github.com/ncw/gmp at n=3*10^7 (same code):

     flat  flat%   sum%        cum   cum%
     1.07s 89.92% 89.92%      1.07s 89.92%  runtime.cgocall
     0.04s  3.36% 93.28%      0.04s  3.36%  runtime.memmove
     0.02s  1.68% 94.96%      0.02s  1.68%  runtime.stdcall2
     0.02s  1.68% 96.64%      0.02s  1.68%  unicode/utf16.Encode
     0.01s  0.84% 97.48%      0.01s  0.84%  indexbytebody
     0.01s  0.84% 98.32%      0.01s  0.84%  runtime.makemap_small
     0.01s  0.84% 99.16%      0.01s  0.84%  runtime.stdcall1
     0.01s  0.84%   100%      0.01s  0.84%  runtime.stdcall3
         0     0%   100%      0.53s 44.54%  fmt.(*pp).doPrintf
         0     0%   100%      0.53s 44.54%  fmt.(*pp).handleMethods

Using https://github.com/ncw/gmp at n=10^9 (same code):

flat  flat%   sum%        cum   cum%
    50.01s 91.81% 91.81%     50.01s 91.81%  runtime.cgocall
     1.14s  2.09% 93.90%      1.14s  2.09%  runtime.memmove
     0.54s  0.99% 94.90%     10.37s 19.04%  internal/poll.(*FD).writeConsole
     0.50s  0.92% 95.81%      0.50s  0.92%  runtime.procyield
     0.48s  0.88% 96.70%      0.60s  1.10%  unicode/utf16.Encode
     0.34s  0.62% 97.32%      0.34s  0.62%  runtime.memclrNoHeapPointers
     0.29s  0.53% 97.85%      0.29s  0.53%  unicode/utf8.DecodeRune
     0.01s 0.018% 97.87%      1.17s  2.15%  runtime.systemstack
         0     0% 97.87%     34.33s 63.03%  fmt.(*pp).doPrintf
         0     0% 97.87%     34.33s 63.03%  fmt.(*pp).handleMethods

Borthralla on 20 Mar 2019

gmp uses CGo. math/big is pure Go.

And according to their site https://gmplib.org/

GMP is carefully designed to be as fast as possible, both for small operands and for huge operands. The speed is achieved by using fullwords as the basic arithmetic type, by using fast algorithms, with highly optimised assembly code for the most common inner loops for a lot of CPUs, and by a general emphasis on speed.

It is expected that highly optimized C code will be faster than Go.

agnivade on 20 Mar 2019

0.14s  0.24% 98.49%     55.11s 93.52%  math/big.nat.divLarge

divLarge is slow.
Please see https://github.com/golang/go/issues/21960.

crvv on 20 Mar 2019

The code doesn't use divLarge, so the pprof result posted previously includes (*big.Int).String().

Without (*big.Int).String(), for n = 1e8, math/big uses 32 s and gmp uses 1.6 s on my machine.

The time complexity of math/big is about n**(log(3)/log(2)), which is the complexity of karatsuba algorithm.
For gmp, it is is about n**(log(2.3)/log(2))

The pprof result of math/big is

      flat  flat%   sum%        cum   cum%
    17.27s 59.80% 59.80%     17.27s 59.80%  math/big.addMulVVW
     1.88s  6.51% 66.31%      1.88s  6.51%  runtime.memmove
     1.87s  6.48% 72.78%      1.87s  6.48%  math/big.subVV
     1.51s  5.23% 78.01%      1.51s  5.23%  runtime.madvise
     1.38s  4.78% 82.79%     12.85s 44.49%  math/big.basicSqr
     1.34s  4.64% 87.43%      1.34s  4.64%  math/big.addVV
     1.09s  3.77% 91.20%      9.13s 31.61%  math/big.basicMul
     0.74s  2.56% 93.77%      0.74s  2.56%  runtime.memclrNoHeapPointers
     0.31s  1.07% 94.84%      0.31s  1.07%  math/big.shlVU
     0.24s  0.83% 95.67%      0.69s  2.39%  runtime.stkbucket

crvv on 20 Mar 2019

Though I am not yet a Go developer, I am tempted to chime in, since I have implemented this in Python3.

This is not how I would find the nth Fibonacci Number. Instead use the direct formula using the Golden Ration (Phi) to achieve results in O(1) time!

I am uploading the code to a Public repository soon, will share link once done.

Thx
Milind

thombrem on 20 Mar 2019

I believe the correct term to Google would be Binet's Formula... and here it is..

F(n) = round( Phi^n / √5 )

This is yet another reason to know good math and not just code!

Regards
Milind

thombrem on 20 Mar 2019

I believe the correct term to Google would be _Binet's Formula_... and here it is..

F(n) = round( Phi^n / √5 )

This is yet another reason to know good math and not just code!

Regards
Milind

While the fibonacci implementation is not the focus of this thread, it is clear that you have some fundamental misunderstandings about floating point as well as big integer arithmetic. Floating point numbers are only o(1) multiplication because they have finite precision. Big Integers, on the other hand, have arbitrary precision. In particular, the size we are referring to in this thread are in the millions of digits.
At the level of precision necessary to get an exact answer, you would need extremely precise floating point values which would take much longer to multiply. Additionally, assuming you are using exponentiation by squaring and not naive exponentiation, it is the same number of multiplications (log(n)) as in the function I posted.
My implementation is about as fast as the native gmp implementation of fibonacci, which means that it doesn't really get much faster.
If you're in doubt, try plugging in n=10000000 to that function and see if you get the correct 2-million digit answer.

Borthralla on 20 Mar 2019

@Borthralla Thanks for the insight! I do have a misunderstanding about the difference between floats and Big Integers. I will read this up in a day or two.

On a side note, this is yet another reason for me to be on this forum, thanks @Borthralla.

AND Exponentiation does take O(n) you are right yet again!

However, also note that there may be a great benefit in writing this in a single line of code as well, esp for (smaller) integer numbers.

thombrem on 20 Mar 2019

The code doesn't use divLarge, so the pprof result posted previously includes (*big.Int).String().

Without (*big.Int).String(), for n = 1e8, math/big uses 32 s and gmp uses 1.6 s on my machine.

The time complexity of math/big is about n**(log(3)/log(2)), which is the complexity of karatsuba algorithm.
For gmp, it is is about n**(log(2.3)/log(2))

The pprof result of math/big is
      flat  flat%   sum%        cum   cum%
    17.27s 59.80% 59.80%     17.27s 59.80%  math/big.addMulVVW
     1.88s  6.51% 66.31%      1.88s  6.51%  runtime.memmove
     1.87s  6.48% 72.78%      1.87s  6.48%  math/big.subVV
     1.51s  5.23% 78.01%      1.51s  5.23%  runtime.madvise
     1.38s  4.78% 82.79%     12.85s 44.49%  math/big.basicSqr
     1.34s  4.64% 87.43%      1.34s  4.64%  math/big.addVV
     1.09s  3.77% 91.20%      9.13s 31.61%  math/big.basicMul
     0.74s  2.56% 93.77%      0.74s  2.56%  runtime.memclrNoHeapPointers
     0.31s  1.07% 94.84%      0.31s  1.07%  math/big.shlVU
     0.24s  0.83% 95.67%      0.69s  2.39%  runtime.stkbucket

@crvv Thanks for the explanation. Are there plans to support faster operations for extremely large integers in math/big similar to gmp? Is gmp's method prohibitively complex?

Borthralla on 20 Mar 2019

cc @griesemer @bmkessler

Do you know offhand whether gmp uses karatsuba or a different algorithm for this?

josharian on 20 Mar 2019

@josharian

Definitely not. gmp stops using karatsuba pretty early:

NxN limb multiplications and squares are done using one of seven algorithms, as the size N increases.

Source: https://gmplib.org/manual/Multiplication-Algorithms.html#Multiplication-Algorithms

ALTree on 20 Mar 2019

The thresholds are calibrated by a calibration procedure, but for reference MUL_TOOM33_THRESHOLD (the point where they stop using karatsuba) is pretty low. Like in the order of a 100 limbs (= machine words): https://gmplib.org/repo/gmp/file/tip/mpn/x86/coreisbr/gmp-mparam.h#l56

ALTree on 20 Mar 2019

@josharian GMP uses schoolbook -> Toom-Cook -> Schönhage-Strassen for multiplication, depending on the input size.

ericlagergren on 20 Mar 2019

Does math/big include an implementation of Schönhage-Strassen?

Borthralla on 20 Mar 2019

No, it uses karatsuba for large numbers.

ericlagergren on 20 Mar 2019

Also, for printing large integers @crvv noted above see https://github.com/golang/go/issues/20906

bmkessler on 20 Mar 2019

❤2 👍2

Most observations have already been made here. To summarize:

We know that certain low-level ops could be faster (e.g., addMulVVW)
math/big only implements Karatsuba. It should use the various Toom versions and FFT for very large numbers.
Division is slow (for output).

One should probably start with the Tooms.

griesemer on 25 Mar 2019

Good new, guys! They found an nlog(n) multiplication algorithm, we should just use that!
https://hal.archives-ouvertes.fr/hal-02070778/document
(Mostly joking, but the algorithm is real and it just came out today)

Borthralla on 26 Mar 2019

😄2

Change https://golang.org/cl/172018 mentions this issue: math/big: implement recursive algorithm for division

gopherbot on 14 Apr 2019

I can think of a O(log(log(n))) best case time complexity algorithm for computing exponents, which means Binet's formula might be doable for fibonacci(n).

Sorry to bring this up again
~Milind

thombrem on 16 Apr 2019

Change https://golang.org/cl/266201 mentions this issue: math/big: implement Schönhage–Strassen fft algorithm

gopherbot on 29 Oct 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

cmd/go: progress bar for "go get"

longzhizhi · 3Comments

x/build/cmd/coordinator: ssh proxy should support scp

bradfitz · 3Comments

cmd/go: cross compile failed without useful error message

mingrammer · 3Comments

cmd/go: go get extremely slow with no feedback by default

Miserlou · 3Comments

proposal: sync: Map.Delete method should return bool, indicating if key was deleted or not

lkarlslund · 3Comments

Go: math/big: large fibonacci calculation is faster in GMP

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Most helpful comment

All 21 comments

Related issues

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?