Sarama Version: e8552c0 (latest master)
Kafka Version: 1.0.0
Go Version: 1.10
Standard Configuration.
No specific logs.
https://github.com/Shopify/sarama/blob/e8552c0dbf502cd7ec9186b57897e6cf9f15fb34/partitioner.go#L126
The p.hasher.Sum32() is being cast to int32. The default hasher sarama uses is FNV-1. In golang, Sum32() for FNV-1 returns uint32, and it can overflow if cast to int32.
In my case, I'm using custom hash (xxHash32) and with the key="123", the hash value="3062191159" which overflows int32. It would be nice to not have the p.hasher.Sum32() cast to int32.
It's not clear to me how this is a problem? The partition chosen is still valid, consistent, and well-distributed.
The problem is the result is different when integrating with software written in other languages. We have a program that is written in Rust and using Rust Kafka Crate. They are using uint32 for Sum32, so the result is different comparing with golang Sarama.
By casting to signed int32 and making the result non-negative, you are essentially changing the hash algorithm, which is a problem when expecting a hash partitioner to behave the same way across the implementations.
Ah, this is true... hmm.
The main problem with fixing it is that it's basically a breaking change; we need to maintain the consistency of the hash function between versions for existing users. I think the best we can do is implement a NewUnsignedHashPartitioner or something which people can manually configure.
If we ever do a sarama 2.0 with breaking changes we could fix it for the default at that point. Added to https://github.com/Shopify/sarama/wiki/Ideas-that-will-break-backwards-compatibility
Just to follow up: we've already had to adjust this once for different implementations of the absolute-value function (https://github.com/Shopify/sarama/commit/6967cdb516216f65c91b9ae11431901c8b12ecf5).
I would also note that the HashPartitioner code is fairly simple and the interface is public, so you're free to implement the entirely new partitioner behaviour yourself.
Thanks for the feedback. We are now using the entirely new custom hash partitioner in our project.
Most helpful comment
Ah, this is true... hmm.
The main problem with fixing it is that it's basically a breaking change; we need to maintain the consistency of the hash function between versions for existing users. I think the best we can do is implement a
NewUnsignedHashPartitioneror something which people can manually configure.If we ever do a sarama 2.0 with breaking changes we could fix it for the default at that point. Added to https://github.com/Shopify/sarama/wiki/Ideas-that-will-break-backwards-compatibility