Node: Base2 Encoding for Buffers

Created on 14 May 2016 · 8 Comments · Source: nodejs/node

I'm looking to work with arbitrary-length binary data in Node, e.g. 001000100100. Buffers seem like the proper place for this, but currently the 'binary' encoding on buffers is for latin-1 strings, not truly binary data. As things stand, I'm looking at using the 'hex' encoding, converting manually, and then adding padding to my data, but this is less than ideal.

I propose that a new encoding 'base2' be added to buffers in order to allow easier manipulation at the bit level. This would be non-breaking and thus could be added to the next release.

I am of the view that in the long run the 'binary' encoding should be changed to be truly binary and 'latin-1' added as a separate encoding. Then again there's probably a great historical reason as to why things are the way they are.

buffer feature request

All 8 comments

How would adding a base2 encoding allow for easier _manipulation_?

IMHO though this functionality is better suited for userland, especially since this is the first time I've heard of this kind of request and I can't imagine many people would be using this (for real work).

A simplified version of my use case involves wanting to edit the nth bit depending on various parameters. Using the hex output from a buffer (each hex digit is 4 bits), the bit I want to edit could correspond to the 1st, 2nd, 3rd or 4th bit of a digit. To then flip its value I would either need a fairly complex conversion table or have to convert to binary, do the manipulation and then convert back again.

The reason why I believe this belongs in node rather than in the userland is coherence. Buffers are presented as a way to deal with/manipulate binary data. A naive user, such as myself, intuitively thinks that the binary encoding would get them 0s and 1s. Upon realising that isn't what binary is for, and reading the documentation, I expect there to be another encoding to suit my needs. After all the class is all about binary data. It's sort of confusing as to why hex (base16) and base64 exist but not a base2.

I know in the above use binary refers more to a bitwise match than to 0/1s but frankly not including it seems incomplete.

Converting to a base2 string would be a horribly inefficient way of doing individual bit manipulation on the buffer data. The algorithm for flipping the n-th bit is fairly straightforward and would be simple to implement without the costly conversion to base2.

> It's sort of confusing as to why hex (base16) and base64 exist but not a base2

The reason is that they (base 16 and base 64) are significantly more common in the real world.

@Crazometer If you want to set bit n in a Buffer, here's a simple way to do it without any conversions (assuming little-endian bit numbering, where bit 0 is the least significant bit of the first byte):

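var buffer = Buffer.alloc(8); // example buffer, added here for illustration
var n = 32;
// Calculate the "bucket" (byte) in the Buffer that the bit resides in
var index = Math.floor(n / 8);
// Calculate the offset of the bit within that "bucket"
var bitOffset = n - index * 8;
// Set the bit
buffer[index] |= 1 << bitOffset;
// Unset the bit (parenthesized so the whole mask is inverted, not just the 1)
buffer[index] &= ~(1 << bitOffset);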

You can easily put this in a function (or separate functions for setting/unsetting) and reuse it.
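As a rough sketch of what such reusable functions might look like: the names `setBit`, `clearBit`, `flipBit` and `getBit` below are hypothetical, not part of Node's Buffer API, and the code assumes the same bit numbering as the snippet in this thread (bit 0 is the least significant bit of byte 0).

```javascript
// Hypothetical helpers (not part of Node's Buffer API).
// Assumption: bit 0 is the least significant bit of byte 0.
function locate(n) {
  const index = Math.floor(n / 8);  // byte that holds bit n
  const mask = 1 << (n - index * 8); // single-bit mask within that byte
  return { index, mask };
}

function setBit(buf, n) {
  const { index, mask } = locate(n);
  buf[index] |= mask;
}

function clearBit(buf, n) {
  const { index, mask } = locate(n);
  buf[index] &= ~mask; // invert the whole mask, then AND
}

function flipBit(buf, n) {
  const { index, mask } = locate(n);
  buf[index] ^= mask;
}

function getBit(buf, n) {
  const { index, mask } = locate(n);
  return (buf[index] & mask) !== 0 ? 1 : 0;
}

const example = Buffer.alloc(8);
setBit(example, 32);   // example[4] is now 1
flipBit(example, 35);  // example[4] is now 9
clearBit(example, 32); // example[4] is now 8
```

None of these helpers check that `n` falls inside the buffer; an out-of-range write is silently ignored by the underlying typed array, so a real implementation would probably want a bounds check.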

@jasnell Is the main source of the inefficiency in the length of the base2 string?

@mscdex Thanks for that, that's fairly clever and straightforward. For some reason I always forget about the bitwise operators.

The other use case I have that would benefit from a base2 encoding involves concatenating a series of base2 strings and relatively short buffers together. Approximately it looks something like 01 + ABD123 + 10 + FFFACD ...

The original intention was to convert it all to binary and then pad it so it fits into a buffer. Using your method above I suppose I could preallocate a buffer then convert it bit by bit. This still feels a little off to me.
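One way the preallocate-then-fill approach described here could look is sketched below. Everything in it is an assumption made for illustration: the function name `packParts`, the input shape (`{ bits: '01' }` or `{ hex: 'ABD123' }`), writing bits most-significant-bit first, and zero-padding the final byte. The real transmission format may differ on any of these points.

```javascript
// Hypothetical sketch: pack bit strings and hex strings into one Buffer.
// Assumptions: bits are written most-significant-bit first and the last
// byte is right-padded with zeros. Input is not validated.
function packParts(parts) {
  // First pass: count bits so the buffer can be allocated once.
  let totalBits = 0;
  for (const part of parts) {
    totalBits += part.bits !== undefined ? part.bits.length : part.hex.length * 4;
  }

  const out = Buffer.alloc(Math.ceil(totalBits / 8)); // zero-filled
  let pos = 0; // index of the next bit to write

  // Write the low `width` bits of `value`, highest bit first.
  function writeBits(value, width) {
    for (let i = width - 1; i >= 0; i--) {
      if ((value >> i) & 1) {
        out[pos >> 3] |= 0x80 >> (pos & 7);
      }
      pos++;
    }
  }

  // Second pass: write each part at the current bit position.
  for (const part of parts) {
    if (part.bits !== undefined) {
      for (const ch of part.bits) writeBits(ch === '1' ? 1 : 0, 1);
    } else {
      for (const ch of part.hex) writeBits(parseInt(ch, 16), 4);
    }
  }

  return { buffer: out, bitLength: totalBits };
}

const packed = packParts([{ bits: '01' }, { hex: 'ABD123' }, { bits: '10' }]);
console.log(packed.buffer.toString('hex'), packed.bitLength); // 6af448e0 28
```

Returning the bit length alongside the buffer lets a reader of the data tell payload bits apart from the padding in the final byte.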

This seems like an XY problem.

I'm trying to reduce the overhead in a transmission format we use where there are many small payloads and a large amount of formatting.

@Crazometer ... based on what I can gather from your second case, I believe you're still better served by bitwise operations than by base2 encoding. Using base2 encoding for this would be extremely wasteful and there would be absolutely no way to justify getting it into core. If you really did want to go that route, however, creating a function that performs base2 encoding/decoding would not be too difficult.
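For reference, a minimal userland sketch of such encode/decode functions might look like the following. The names `toBase2` and `fromBase2` are hypothetical, and right-padding a partial final byte with zeros is an assumption, not something specified in this thread.

```javascript
// Hypothetical userland helpers; Node's Buffer has no 'base2' encoding.
function toBase2(buf) {
  let out = '';
  for (const byte of buf) {
    // Each byte becomes exactly 8 characters, most significant bit first.
    out += byte.toString(2).padStart(8, '0');
  }
  return out;
}

function fromBase2(str) {
  if (/[^01]/.test(str)) {
    throw new TypeError('base2 string may only contain 0 and 1');
  }
  // Assumption: right-pad with zeros up to a whole number of bytes.
  const padded = str.padEnd(Math.ceil(str.length / 8) * 8, '0');
  const out = Buffer.alloc(padded.length / 8);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(padded.slice(i * 8, i * 8 + 8), 2);
  }
  return out;
}

console.log(toBase2(Buffer.from([0x22, 0x40])));        // 0010001001000000
console.log(fromBase2('001000100100').toString('hex')); // 2240
```

The string form uses eight characters per byte, which is the wastefulness mentioned above; it is probably best kept for debugging or display rather than for manipulating or transmitting data.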

Given that it's extremely unlikely that base2 would ever be supported natively in core, I'm going to go ahead and close this.
