The overhead of constructing a Hash object in order to call Hash.Write greatly diminishes it's utility for small buffers.
Consider the following micro-benchmark:
var source = []byte("hello, world") // 12 bytes long
var sink uint64
func BenchmarkMapHash(b *testing.B) {
var seed = maphash.MakeSeed()
for i := 0; i < b.N; i++ {
var h maphash.Hash
h.SetSeed(seed)
h.Write(source)
sink = h.Sum64()
}
}
//go:linkname runtime_memhash runtime.memhash
//go:noescape
func runtime_memhash(p unsafe.Pointer, seed, s uintptr) uintptr
func BenchmarkUnsafeHash(b *testing.B) {
for i := 0; i < b.N; i++ {
sink = uint64(runtime_memhash(*(*unsafe.Pointer)(unsafe.Pointer(&source)), 0, uintptr(len(source))))
}
}
On my machine this produces:
BenchmarkMapHash 56459700 21.6 ns/op
BenchmarkUnsafeHash 246443527 4.90 ns/op
Directly using runtime_memhash is around ~4x faster since it avoids all the unnecessary state contained by Hash that are relatively pointless when hashing a single small string.
I propose adding the following API:
func Sum(b []byte, seed Seed) uint64
where the function is a thin wrapper over runtime_memhash. It takes Seed as an input to force the user to think about the application of seeds.
I chose the name Sum to be consistent with md5.Sum or sha1.Sum.
I found hash/maphash was unusable for string hashing in a custom map implementation due to performance. It was better to use runtime.strhash directly (..and accept potential future maintenance hassle).
Ideally maphash.Sum([]byte(s), seed) would also be Fast, but this requires some compiler improvements. Maybe SumString should be added as well to address this use case?
Ideally
maphash.Sum([]byte(s), seed)would also be Fast, but this requires some compiler improvements. MaybeSumStringshould be added as well to address this use case?
See #2205. I didn't propose SumString cause I'm still hoping for a compiler optimization to avoid these cases.
for i := 0; i < b.N; i++ {
var h maphash.Hash
h.SetSeed(seed)
h.Write(source)
sink = h.Sum64()
}
This is not how it's supposed to be used. You're supposed to do:
var h maphash.Hash
h.SetSeed(seed)
for i := 0; i < b.N; i++ {
h.Reset()
h.Write(source)
sink = h.Sum64()
}
Can you try that instead?
Can you try that instead?
The runtime drops from ~22ns to 20ns, but it's still much slower compared to a direct hash on a []byte (~5ns).
@dsnet, is there a fundamental reason why Write+Sum64 should be so much slower than a single call?
Back when SetSeed was in the loop, I could see SetSeed being fundamentally unnecessarily expensive.
But why is Write+Sum64 fundamentally slower than one call? Do we just have a performance bug somewhere?
I see several reasons for slowdown:
Hash.Write always performs a copy on the input slice. It's not clear to me how to avoid the copy without fundamentally changing the operation of how Hash works. For hashing short strings, half the time is spent on copying (i.e., avoiding the copy brings runtime to 10ns).Hash.Reset, Hash.Write, Hash.Sum64, and Hash.initSeed are not inlineable. With some massaging, we can probably get all of them except Hash.Write to be inlined. Doing so saves a few more ns.I see. So the copy during the large writes is a bug - we should just hash those bytes directly. Then the benchmark, which hashes a multiple of 64 bytes, would have no copies at all. At that point it seems like the only win is Write+Sum64 combined knowing that the final copy isn't needed. But maybe that's too small a win.
Change https://golang.org/cl/278758 mentions this issue: hash/maphash: optimize Write and WriteString
Change https://golang.org/cl/278759 mentions this issue: hash/maphash: manually inline setSeed
I played with this a bit, because I also hit this recently. CLs 278758 and 278759 and 278760 were what I could squeeze out of it. They help a fair amount for large writes, but not much for little ones, which is what I'm working on (trying to reduce the performance regression due to https://github.com/tailscale/tailscale/commit/aa9d7f466550b567e11c317f64ea5eac76d31314).
Change https://golang.org/cl/278760 mentions this issue: hash/maphash: increase the buffer size