Go: proposal: support bool type in encoding/binary

Created on 24 Aug 2016 · 24 Comments · Source: golang/go

The only fixed-size built-in type that's not supported currently is bool. Since it's arguably one of the more commonly used data types, I would propose its inclusion in encoding/binary.

The bool encoding would be equivalent to uint8 where zero maps to false and non-zero (1, usually) maps to true.

FrozenDueToAge Proposal

All 24 comments

Note that if we admit all non-zero values as valid "true", we need to normalize them to 1 on the Go side. This is easily accomplished with b := false; if u8 != 0 { b = true }, but anyone planning sneaky optimizations needs to be aware of it.
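To make the normalization concrete, here is a minimal sketch of a lenient decoder paired with a canonical encoder. The names encodeBool and decodeBool are hypothetical helpers written for illustration; they are not part of encoding/binary.

```go
package main

import "fmt"

// encodeBool (hypothetical helper) always writes the canonical form:
// 1 for true, 0 for false.
func encodeBool(b bool) uint8 {
	if b {
		return 1
	}
	return 0
}

// decodeBool (hypothetical helper) is lenient: any non-zero byte is true.
func decodeBool(u8 uint8) bool {
	return u8 != 0
}

func main() {
	// Round-tripping shows that every non-zero input is normalized to 1.
	for _, in := range []uint8{0, 1, 2, 255} {
		b := decodeBool(in)
		fmt.Printf("%d -> %v -> %d\n", in, b, encodeBool(b))
	}
}
```

The round trip is not the identity for inputs such as 2 or 255, which is why a wholesale byte copy would not be equivalent to decoding and re-encoding.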

I'm not convinced that we should add bool. Different systems use different
encodings for boolean values (e.g. 1 vs. -1 for true).

It's easy to convert Uint8 from/to a bool or write your own function for
booleans. Therefore, I'm not sure we should add bool to encoding/binary.

While there are definitely several ways to encode bool (sometimes even 32-bit integers for in-memory representation), it's pretty reasonable to assume that it should only use one byte. I have a few arguments why I think it is reasonable:

  1. I'm making the assumption that encoding/binary is primarily for storage and transmission of data, as opposed to memory layout. One byte representation of bool is almost exclusively the case for storage and transmission purposes, and it can't be less than one byte (at least not in Go as it'd break the io.Reader and io.Writer interfaces), nor is there any purpose to making it more bytes. Ordinarily the definition of bool is zero and non-zero for false and true. As @dr2chase points out that may add some complexity to the encoding/decoding process, but not unreasonably so. I also think it'd be fair to go with the most common scenario, where it's only zero and one for false and true.
  2. Assuming that bool would be most commonly used in structs as opposed to on its own or in slices, it's difficult to accommodate the proposed conversion between uint8 and bool without transferring all the fields to another type. The reason is that encoding/binary does not allow unexported or unused fields on the struct. In my personal experience with the package, I've had to export structs with uint8 fields that are then documented with "1 for true, 0 for false" at the end user's expense.
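The field-transfer workaround mentioned in point 2 might look roughly like the sketch below. The Settings and wireSettings types and both helper functions are hypothetical and exist only for illustration; the only library calls used are binary.Write and binary.Read on a struct whose fields are all fixed-size integers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Settings is the hypothetical type that callers would like to use directly.
type Settings struct {
	Port    uint16
	Verbose bool
}

// wireSettings mirrors Settings using only fixed-size integer fields.
type wireSettings struct {
	Port    uint16
	Verbose uint8 // 1 for true, 0 for false
}

// encodeSettings copies the fields into the wire type, then serializes it.
func encodeSettings(s Settings) ([]byte, error) {
	w := wireSettings{Port: s.Port}
	if s.Verbose {
		w.Verbose = 1
	}
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, w); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeSettings reads the wire type, then copies the fields back.
func decodeSettings(data []byte) (Settings, error) {
	var w wireSettings
	if err := binary.Read(bytes.NewReader(data), binary.BigEndian, &w); err != nil {
		return Settings{}, err
	}
	return Settings{Port: w.Port, Verbose: w.Verbose != 0}, nil
}

func main() {
	data, err := encodeSettings(Settings{Port: 8080, Verbose: true})
	if err != nil {
		panic(err)
	}
	s, err := decodeSettings(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x -> %+v\n", data, s)
}
```

Every struct that contains a bool needs a mirror type and two copy steps like these, which is the overhead the proposal aims to remove.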
  3. There is previous work in the encoding/binary package that sets a precedent for choosing one design even though there may be several. I'm talking about Varint which actually uses a proprietary* format instead of any of the commonly available ones such as VLQ. I bring this up because it makes the case that a similar decision could be made for bool.
  4. If a byte representation of bool can truly not be established, the only remaining option (besides the transfer of struct fields I mentioned earlier) would be to either have a binary.Bool type which would have a predetermined binary structure, or to allow an interface for custom encoding, similar to the json package.

*) I say "proprietary" because it is specific to the Golang codebase and does not (to my knowledge) adopt any publicly available standard. You may say it's the Google Protocol Buffers implementation, but based on a design comment in the source it may be incompatible since protobuf uses 128-bit varints.

Not commenting on the larger issue (should this be added), but I'm not concerned at all about using 0=false and 1=true. If you're talking to a system that doesn't use 1=true, then don't use encoding/binary. Or use uint8s. We should use the obvious answer, and there's an easy workaround for alternate systems.

I think that would be a fair decision to make. It'll probably cover the vast majority of cases as it is almost universal to use a byte with value 0/1 to represent bool. Here's a few examples of standard libraries using that particular encoding format:

I've built an example implementation of bool support to make the discussion around it easier. Note that this implementation considers zero false and non-zero true when decoding, just like the aforementioned standard libraries. I could not see any possible performance or LoC improvement to restricting the values to 0 and 1 (in fact, additional error checking would be required).

See the referenced commit (https://github.com/blixt/go/commit/39e39df093a57fc698a224c3b4eebb8dc6477851).

Excluding non-{0,1} bytes from being boolean values would enable the would-be-nice optimization of just copying bytes wholesale. We might be able to do that either if the compiler recognized encoding/binary.{Read,Write} (this proposal was not favorably received) or if the interface were extended to allow "compiled" binary.Readers and binary.Writers for a given type and endianness (where "compilation" would verify that the reflected type had the same layout as the data, in which case the "compiled" reader or writer would just be a byte-blit).

If any non-zero value gets to be "true" in the incoming bits and that becomes common practice, then that optimization will never happen for types that include booleans.

That may be outside the scope of the encoding/binary package, simply based on the prelude in the header:

This package favors simplicity over efficiency. Clients that require
high-performance serialization, especially for large data structures,
should look at more advanced solutions such as the encoding/gob
package or protocol buffers.

I still think it's fair to restrict it to 0 and 1, but if this package is mostly used for disk/network binary then avoiding that restriction would make it equivalent to decoding in many other standard libraries. While I don't think that it'll make a big difference in practice, the non-zero behavior would arguably be less surprising.

I actually sent in a CL that added this years ago, but it was rejected. I had thought it would be useful for a project I was working on at the time, but by the time it had been rejected, I had already worked around it, so I didn't really try to argue. Unfortunately, I can't seem to find the CL anywhere.

As we wait for feedback to this proposal, I took some time to do some research on whether this proposal would be useful to existing users of encoding/binary out there.

This is not extensive, but I started going through the list of binary.Read uses on Sourcegraph from top to bottom, checking the popular repos (> 75 stars) for any references to booleans.

| Repository | Stars | Converts byte != 0 to bool? |
| --- | --: | --- |
| golang/go | 19,683 | No |
| hybridgroup/gobot | 2,278 | Yes |
| streadway/amqp | 933 | Yes |
| vishvananda/netlink | 216 | No |
| GoBelieveIO/im_service | 170 | No |
| zeromq/zproto | 114 | No |
| istreamdata/orientgo | 98 | Yes |
| mitchellh/go-vnc | 93 | Yes |

So far the conclusion is that the need for bool is not uncommon, and several of these repos even have additional functions for bool reading/writing. Almost all of the repos that don't use bool with encoding/binary rely only on the big/little endian code in the package, and don't use anything beyond the (u)int16/32/64 coding.

I support this proposal, and agree with what @randall77 said.

@blixt, thanks for gathering data.

@adg, @ianlancetaylor, @griesemer, thoughts?

I'm fine with this but I'm inclined to think that the Bool method should only accept 0 or 1, meaning that, unlike the other decode methods, it will have to return an error value.

I don't think it should be included precisely because
people disagree about what value(s) corresponds to
true.

@minux Is there really disagreement, or just hypothetical disagreement? Does anyone really think the values should be anything other than 0 and 1?

Let me just reiterate that:

  1. Most if not all of the major languages that have a standard library for encoding/decoding binary streams went with the byte != 0 strategy for booleans
  2. I've not found a single example of a Go codebase that does binary<=>bool which doesn't use the byte != 0 strategy (except the ones that only write which technically only use the byte == 1 strategy)

I understand the theoretical issue but I can't find a single practical example in Go or in other programming languages giving it credence.

Finally, the encoding/binary package specifically "favors simplicity over efficiency", and I would argue that being able to use bool in a struct is a big win for simplicity.

To be clear, I don't think we should require either 0 or 1 on decode for efficiency. I think we should do it for safety. If you want to decode a bool, you should be looking at the encoding of a bool.
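As a rough illustration of the stricter decode being argued for here, the sketch below accepts only 0 and 1 and reports anything else as an error. The function decodeBoolStrict and the error value are hypothetical names, not an API of encoding/binary.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidBool is a hypothetical error for bytes that are not a bool encoding.
var errInvalidBool = errors.New("invalid bool encoding: byte is neither 0 nor 1")

// decodeBoolStrict (hypothetical helper) accepts only the canonical encodings.
func decodeBoolStrict(b byte) (bool, error) {
	switch b {
	case 0:
		return false, nil
	case 1:
		return true, nil
	default:
		return false, errInvalidBool
	}
}

func main() {
	for _, in := range []byte{0, 1, 2} {
		v, err := decodeBoolStrict(in)
		fmt.Println(in, v, err)
	}
}
```

With this approach, decoding followed by encoding always reproduces the input byte, at the cost of an extra error path.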

I think if we add support for bool, it should:

  1. read 0 as false, non-zero as true,
  2. write 1 for true, and 0 for false.

But apparently Ian disagrees with point 1. However, Ian's
view is reasonable too: you should only read what you
write, and requiring 0 or 1 will help detect cases where the
stream is out of sync.

All the other supported types have the one-to-one property
wrt. encoding/decoding, but bool doesn't have that property,
which opens the door for such discussions as whether we
should allow 2 as true or not.

To summarize the last few comments:

  • it's fair to say no library encodes true as anything but 1
  • it's fair to say most (if not all) deserializers assume byte != 0 is true (probably because it's less code to implement)
  • assuming byte == 1 is true would be more correct and potentially catch out-of-sync errors
  • adding an error for boolean values that are not 0 or 1 would add a new type of error behavior to binary.Read (one that is based on the decoded value and not the state of the io.Reader or the provided data structure, which means it would discard a byte from the input stream)

Practically, this is mostly minutiae (due to the first point), but if we want to defer the decision between the two (in the interest of implementing this proposal), then going for correctness first and making it more lenient later could be an acceptable compromise.

However, if no one has a strong opinion on one over the other, I would suggest the less correct one (byte != 0) simply to avoid the additional complexity of the error state, leaving out-of-sync detection up to the user (as is currently the case with all the other value types).

CL https://golang.org/cl/28514 mentions this issue.

@adg, @ianlancetaylor, decision on this proposal?

I'm in favor. As I said above, I think decode should only accept 0 or 1. But I can live with either approach.

I've rebased my change, which uses the byte != 0 behavior. Please review when possible; tests are included.

While I recognize that it'd be easier to catch errors by only allowing 0 or 1, it'd open up the possibility of even more obscure bugs because an erroneous byte value would be discarded, leaving the reader in a potentially irrecoverable state. If no one feels strongly about this I'd like to leave it as-is.

CL https://golang.org/cl/33756 mentions this issue.
