Go: proposal: Go 2: hash: make the Sum method less confusing

Created on 18 Jul 2017  路  7Comments  路  Source: golang/go

The Go 1 hash.Hash interface has a Sum method:

    // Sum appends the current hash to b and returns the resulting slice.
    // It does not change the underlying hash state.
    Sum(b []byte) []byte

Individual packages (such as md5) contain Sum functions with very similar signatures but totally different meaning:

  // Sum returns the MD5 checksum of the data.
  func Sum(data []byte) [Size]byte

This makes (hash.Hash).Sum confusing, as illustrated in the review for http://golang.org/cl/49030.

I can see two possible improvements.

  1. We could rename the method to AppendSum (along the lines of the strconv Append functions). Adding such a method to the implementations (but not the Hash interface itself) would be backward-compatible with Go 1.
    // AppendSum appends the current hash to b and returns the resulting slice.
    // It does not change the underlying hash state.
    AppendSum(b []byte) []byte
  1. We could change Sum to not accept a parameter and always return a new slice. If we pair that with appropriate inlining and devirtualization optimizations, it could theoretically be as efficient as appending to an existing slice.
    // Sum returns a new slice containing the current hash.
    // It does not change the underlying hash state.
    Sum() []byte

I recommend option (1), because I think it would migrate more smoothly in binaries that mix Go 1 and Go 2.

Go2 Go2Cleanup Proposal

Most helpful comment

always return a new slice. If we pair that with appropriate inlining and devirtualization optimizations, it could theoretically be as efficient as appending to an existing slice.

I don't see that happening anytime soon. I think AppendSum and being explicit about memory is still best until we've proven we can make a compiler and/or GC smart enough to not worry about allocation costs.

All 7 comments

always return a new slice. If we pair that with appropriate inlining and devirtualization optimizations, it could theoretically be as efficient as appending to an existing slice.

I don't see that happening anytime soon. I think AppendSum and being explicit about memory is still best until we've proven we can make a compiler and/or GC smart enough to not worry about allocation costs.

I agree that renaming the Sum method to AppendSum seems like an appropriate change.

In go 1, we could rename the argument b to something more explicit like appendTo:

Sum(appendTo []byte) []byte

This is just a cosmetic, backwards compatible change but it can help avoid the confusion.

This may not be worth the pain. We'd need to make a new v2 hash package, and then either make v2s of all implementations, or add an additional AppendSum methods to all types alongside their old Sum methods to implement the old [v1] hash package. Both seem kinda gross.

Why does the method need to be called AppendSum? What other Append operations could it possibly be referring to? If have an Appender interface type with the signature Append([]byte) []byte, naming the method with the extra suffix AppendSum prevents the standard library from accruing implicitly satisfied Appender types.

The original name Sum should have just been Append. The documentation for the method set seems to agree.

@bradfitz Because of the "pain" you mention, I believe if this might ever be done, "Go2" is the only point where it could be. Please don't casually dismiss it too easily. Please note there's also a huge pain in the current state of events. Because of this very issue, I personally have a huge yellow virtual "WARNING" sign attached to the whole hash package in my mind, with the small print saying: "never touch the hash package without cautiously reading the godoc; there's some well-known minefield somewhere there". And I'm certainly not a noob to Go, being in the community since the 2009. Notably, I do also believe that the current shape of the interface can be unnecessarily risky with regards to use in security-related applications (there's a reason why many Hash implementations are located in the crypto package), and in this domain I believe it's generally better to be safe than sorry, including in API design. For the sake of making the virtual world somewhat safer.

I understand this may not be a "prime time" candidate for the "Go2" process. But once some of the necessary Go2 refactoring approaches/mechanisms get tested and proven on some more "attractive" packages/issues, and people get used to them in the Go 2 transition period, I don't see why the "smaller papercuts" issues like this one cannot get pulled onto the bandwagon as well.

Any plan? it's highly useful to minimize memory overhead when the go program will do the sum

BTW: Append API naming style is idiomatic:)

Was this page helpful?
0 / 5 - 0 ratings