Go: proposal: a faster, better fnv32 implementation

Created on 30 Mar 2017 · 16 Comments · Source: golang/go

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.7.3 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/orcaman/github.com/orcaman/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lc/lr_hrpk55218jm8dn3s78ww00000gn/T/go-build334927550=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

What did you do?

At the concurrent-map project we have written a faster, better implementation of the fnv32 hash function.

Please have a look at this issue to see the implementation and some benchmarks showing the performance improvement: https://github.com/orcaman/concurrent-map/issues/48

This is the commit where we added our custom implementation:
https://github.com/orcaman/concurrent-map/commit/e3b11eb3d69dd6a869c3a701c43baa0658bc179f

We clearly see, on multiple machines of different kinds, that our implementation is significantly faster. I would like to contribute this change to the Go source tree.

Cheers,
Or

FrozenDueToAge Proposal

Most helpful comment

Actually, https://go-review.googlesource.com/c/37751/ landed this cycle

@orcaman, could you profile Go 1.8 vs Go tip for the standard fnv package?

All 16 comments

The core implementation in std looks essentially the same as yours:

func (s *sum32) Write(data []byte) (int, error) {
    hash := *s
    for _, c := range data {
        hash *= prime32
        hash ^= sum32(c)
    }
    *s = hash
    return len(data), nil
}

(sum32 is a defined uint32 type, so sum32(c) is just a type conversion).

I believe the reason your function is faster is that your implementation does not conform to the Hash interface, so the runtime has less work to do when you use it.

EDIT: this is actually already noted in the issue you linked. The point is that I don't think we can change the hash package now, moving to plain functions and so on. That would break the Go 1 compatibility promise.

Thanks for your response.

Indeed, it seems our implementation is faster because it bypasses the Hash interface, so the runtime has less work to do.

The thing is: our benchmarks show a very big difference as you can see in the issue. Our version was in the range of 90.8 ns/op - 95.6 ns/op vs 165 ns/op - 170 ns/op if we re-use the hasher (or 214 ns/op vs 249 ns/op if we don't!).

I think that at this level of difference, it's worth giving people the option to use the faster version in native Go.

I understand the motivation but I don't see a reasonable way to do this.

As I wrote we can't change the current API. So we would have to add a second package, with the same functionality, but implemented using plain functions, or just dump a bunch of new functions in the current package, duplicating everything there's now.

Both solutions are IMHO undesirable.

Well, I'm as religious about beautiful and idiomatic code as the next gopher, but if it were my application, I would rather suffer something like an extra package for such a significant performance improvement. But I see your point, and most applications are probably not as performance-critical as the ones that use concurrent-map.

crc32 has separate functions for doing the operation directly and not via some interface: https://golang.org/pkg/hash/crc32/#Checksum

Same with sha1 and md5: https://golang.org/pkg/crypto/sha1/#Sum ... https://golang.org/pkg/crypto/md5/#Sum

I don't think it'd be weird to have a dedicated function in the fnv package for this.

The sha1 and md5 packages are very small, though. They have New for the hasher and then Sum, and that's it. fnv would need either 4 new exported Sum functions (for 32, 32a, 64, and 64a) or a single, more complicated function that takes another parameter (after the data) and switches implementations. It would certainly be more confusing to document than md5.

Still, that's a good point: since other packages do this, there's a precedent.

I'm not worried about 4 sum functions. It's a small and focused package. If somebody comes looking for fnv, they won't be overwhelmed by 4 fnv functions.

Thank you both for the feedback. Would it be reasonable at this stage for me to implement the new fnv32 function as a separate function, in the spirit of the other hashes mentioned here, and submit the contribution for code review per the guidelines?

Is it possible for the compiler to optimize out the Hash interface when the user only uses one implementation?

Actually, https://go-review.googlesource.com/c/37751/ landed this cycle

@orcaman, could you profile Go 1.8 vs Go tip for the standard fnv package?

@orcaman Also, it would be great if you could benchmark each function multiple times using -count, as in go test -bench=. -count 10 > old.txt, and then generate a report with benchstat (https://godoc.org/golang.org/x/perf/cmd/benchstat).

@bradfitz @ALTree hey guys, I just ran a benchmark of go1.8 vs go tip (BTW, for go tip I just built go from the latest master, is this OK?).

Here are the results:

for go 1.8:

$ go test -bench=. -count 10 > go1_8.txt
$ cat go1_8.txt 
BenchmarkFNV32Custom-8          20000000            72.1 ns/op
BenchmarkFNV32Custom-8          20000000            71.8 ns/op
BenchmarkFNV32Custom-8          20000000            75.2 ns/op
BenchmarkFNV32Custom-8          20000000            73.2 ns/op
BenchmarkFNV32Custom-8          20000000            69.6 ns/op
BenchmarkFNV32Custom-8          20000000            75.6 ns/op
BenchmarkFNV32Custom-8          20000000            73.2 ns/op
BenchmarkFNV32Custom-8          20000000            70.2 ns/op
BenchmarkFNV32Custom-8          20000000            72.2 ns/op
BenchmarkFNV32Custom-8          20000000            75.2 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           179 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           145 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           146 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           145 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           146 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           145 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           147 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           153 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           150 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           149 ns/op
BenchmarkFNV32NativeReuse-8     10000000           137 ns/op
BenchmarkFNV32NativeReuse-8     10000000           136 ns/op
BenchmarkFNV32NativeReuse-8     10000000           135 ns/op
BenchmarkFNV32NativeReuse-8     10000000           133 ns/op
BenchmarkFNV32NativeReuse-8     10000000           135 ns/op
BenchmarkFNV32NativeReuse-8     10000000           137 ns/op
BenchmarkFNV32NativeReuse-8     10000000           138 ns/op
BenchmarkFNV32NativeReuse-8     10000000           132 ns/op
BenchmarkFNV32NativeReuse-8     10000000           133 ns/op
BenchmarkFNV32NativeReuse-8     10000000           135 ns/op
PASS
ok      github.com/orcaman/fnv-bench    73.664s

for go built from the latest master branch:

$ /usr/local/go_tip/go/bin/go test -bench=. -count 10 > go_tip.txt
$ cat go_tip.txt
goos: darwin
goarch: amd64
pkg: github.com/orcaman/fnv-bench
BenchmarkFNV32Custom-8          20000000            72.0 ns/op
BenchmarkFNV32Custom-8          20000000            70.2 ns/op
BenchmarkFNV32Custom-8          20000000            73.8 ns/op
BenchmarkFNV32Custom-8          20000000            72.1 ns/op
BenchmarkFNV32Custom-8          20000000            76.0 ns/op
BenchmarkFNV32Custom-8          20000000            77.7 ns/op
BenchmarkFNV32Custom-8          20000000            76.5 ns/op
BenchmarkFNV32Custom-8          20000000            74.3 ns/op
BenchmarkFNV32Custom-8          20000000            76.6 ns/op
BenchmarkFNV32Custom-8          20000000            75.3 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           185 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           154 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           151 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           154 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           153 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           150 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           152 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           149 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           152 ns/op
BenchmarkFNV32NativeAlloc-8     10000000           152 ns/op
BenchmarkFNV32NativeReuse-8     10000000           140 ns/op
BenchmarkFNV32NativeReuse-8     10000000           138 ns/op
BenchmarkFNV32NativeReuse-8     10000000           137 ns/op
BenchmarkFNV32NativeReuse-8     10000000           136 ns/op
BenchmarkFNV32NativeReuse-8     10000000           139 ns/op
BenchmarkFNV32NativeReuse-8     10000000           139 ns/op
BenchmarkFNV32NativeReuse-8     10000000           141 ns/op
BenchmarkFNV32NativeReuse-8     10000000           139 ns/op
BenchmarkFNV32NativeReuse-8     10000000           138 ns/op
BenchmarkFNV32NativeReuse-8     10000000           138 ns/op
PASS
ok      github.com/orcaman/fnv-bench    80.214s

For reference, here is my benchmarking code:

package fnvbench

import (
    "hash/fnv"
    "math/rand"
    "testing"
)

const (
    maxElems  = 10000000
    maxStrLen = 128
    letters   = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)

// generate random string of given length
func randString(n int) string {
    b := make([]byte, n)
    for i := range b {
        b[i] = letters[rand.Int63()%int64(len(letters))]
    }
    return string(b)
}

// generate slice with maxElems strings of variable length: 0 - maxStrLen
func prepareData() *[]string {
    var strLen int
    stringArray := make([]string, maxElems)
    for i := 0; i < maxElems; i++ {
        strLen = rand.Intn(maxStrLen)
        stringArray[i] = randString(strLen)
    }
    return &stringArray
}

// custom FNV32 from concurrent-map package
func fnv32(key string) uint32 {
    hash := uint32(2166136261)
    const prime32 = uint32(16777619)
    for i := 0; i < len(key); i++ {
        hash *= prime32
        hash ^= uint32(key[i])
    }
    return hash
}

var dataSet = prepareData()

func BenchmarkFNV32Custom(b *testing.B) {
    for i := 0; i < b.N; i++ {
        fnv32((*dataSet)[i%maxElems])
    }
}

func BenchmarkFNV32NativeAlloc(b *testing.B) {
    for i := 0; i < b.N; i++ {
        hasher := fnv.New32()
        hasher.Write([]byte((*dataSet)[i%maxElems]))
        hasher.Sum32()
    }
}

func BenchmarkFNV32NativeReuse(b *testing.B) {
    hasher := fnv.New32()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        hasher.Reset()
        hasher.Write([]byte((*dataSet)[i%maxElems]))
        hasher.Sum32()
    }
}

And here are those same results presented using benchstat. Please use it in the future.

name                old time/op  new time/op  delta
FNV32Custom-8       72.8ns ± 4%  74.5ns ± 6%    ~     (p=0.118 n=10+10)
FNV32NativeAlloc-8   147ns ± 4%   152ns ± 2%  +3.09%  (p=0.003 n=9+9)
FNV32NativeReuse-8   135ns ± 2%   138ns ± 1%  +2.52%  (p=0.001 n=10+8)

Are you sure you're not comparing apples and oranges? Those string to byte slice conversions look like allocations in your inner loop to me.

@bradfitz seems you are right!

I changed the benchmarking code to prepare the data as slices of bytes instead of strings (see here) and the results are now quite comparable.

This is after running the new benchmark 10 times again:

$ benchstat go1_8.txt go_tip.txt 
name                old time/op  new time/op  delta
FNV32Custom-8       89.5ns ±10%  83.0ns ± 5%  -7.25%  (p=0.010 n=10+10)
FNV32NativeAlloc-8   105ns ± 8%   110ns ± 4%    ~     (p=0.108 n=10+10)
FNV32NativeReuse-8  91.0ns ±10%  89.0ns ± 6%    ~     (p=0.382 n=10+10)

Looks like there's nothing to do here, then.

You should run more iterations of the benchmark (-count=...) so the p value above is lower and actually says something. Right now they look comparable enough, though.

I'll close this but holler if you see room for improvement.
