Node: Feature proposal: buffer.convert(value[, fromEnc], toEnc)

Created on 25 Nov 2016 · 21Comments · Source: nodejs/node

While investigating the Buffer usage, I noticed one thing — in many (really, many) cases, Buffers are constructed just for the sake of base64-encoding a string.

base64-encodinng is used in many places, like auth headers, data uris, messages, etc.

With the new API, that would look like Buffer.from(from).toString(encoding).
With the old API, that would look like new Buffer(from).toString(encoding) (note that this lacks checks).

So I propose to add a small utility function, e.g.

Buffer.convert = function(value, targetEncoding, sourceEncoding) {
  if (!Buffer.isEncoding(targetEncoding)) {
    throw new TypeError('"targetEncoding" must be a valid string encoding');
  }
  return Buffer.from(value, sourceEncoding).toString(targetEncoding);
}

Note that the impl does not default to utf-8 and forces an encoding to be specified, this way it would be more clear what the userspace code does.

Sure, that could be done on userside, but this doesn't deserve a separate package, and re-implementing that in all the packages that do conversion also doesn't look very good to me.

Note that if we have had such method earlier, the usage of Buffer without new (and other Buffer-related issues) would have been measurably lower.

If everyone thinks that this is a good idea, I am willing to file a PR for it (impl/docs/tests).

buffer feature request

Source

ChALkeR

Most helpful comment

@jorangreef buffer.transcode() does exactly what you describe.

This issue here seems to be about a string → Buffer → string conversion. As I pointed out above, this is not really useful unless one of the encodings is a binary-to-text encoding, i.e. hex or base64. Given that limited scope and the fact that there has been no activity here, I think closing this is the right call.

addaleax on 21 Jun 2020

👍2

All 21 comments

Or perhaps Buffer.convert(value[, sourceEncoding], targetEncoding) would be better — I am not sure of that yet.

ChALkeR on 25 Nov 2016

I’d be okay with it, although I would probably prefer it as buffer.convert directly on the buffer module, like we have buffer.transcode (to which convert would be the dual).

And you should probably specify both encodings explicitly. For this conversion to make sense, one of these has to be a binary-to-text encoding like base64, and I don’t think there’s any way of knowing whether it’s supposed to be encoding or decoding?

/cc @nodejs/buffer

addaleax on 25 Nov 2016

@addaleax Ah, I entirely missed that one (from #9038).

Then we could use the same header, like buffer.convert(value, fromEnc, toEnc) and perpahs also just buffer.convert(value, toEnc) (defaulting fromEnc to 'utf-8').

ChALkeR on 25 Nov 2016

@ChALkeR I am not sure you can compare the situation wrt default encodings here. For transcode, it makes sense to default to utf-8 for the arguments, because transcode only makes sense to be called for character encodings, not binary-to-text ones.

If you prefer it another way, I guess that’s fine, but I really think explicit encodings would be better here.

addaleax on 25 Nov 2016

Not sure this is a good idea.

buffer.convert(fromStr, target, source) isn't much simpler than Buffer.from(fromStr, source).toString(target).
Order of arguments is confusing (I'd expect source encoding to come before target encoding, so it seems different people have different expectations).
Can be confused with buffer.transcode.
It might not be clear why it's a part of Buffer API even though both the input and the output are strings.

seishun on 28 Nov 2016

@seishun Thanks.

It's a bit shorter, more clear, does the single thing, also it _could_ be optimized later if we decide that it should be.
I have already changed the order in the title to be buffer.convert(value[, fromEnc], toEnc), I did not update the impl, though. Does that look better?
We could limit buffer.convert to strings only and remove Buffer support on input. This way, it couldn't be confused with .transcode, which accepts only Buffer objects.

ChALkeR on 28 Nov 2016

I have already changed the order in the title to be buffer.convert(value[, fromEnc], toEnc), I did not update the impl, though. Does that look better?

I didn't say that the order is better one or the other way. My concern is that people will likely get confused either way.

This way, it couldn't be confused with .transcode, which accepts only Buffer objects.

But they still both live in the buffer module. And it's not obvious from the names which does which.

seishun on 28 Nov 2016

@ChALkeR Leave this open? (I imagine so, but it's been inactive for long enough that I figure it's worth checking.)

Trott on 15 Jul 2017

This is essentially Buffer.transcode()... and can also be accomplished using the Encoding API implementation I've done.

jasnell on 17 Jul 2017

This is essentially Buffer.transcode()

Not quite. Buffer.transcode converts a buffer to a buffer. This converts a string to a string.

seishun on 17 Jul 2017

👍2

so the proposal itself has a prototype code, not sure why it got stalled for 18 months. Calling for contributors!

gireeshpunathil on 20 May 2018

I thought @seishun brought up some valid concerns. Also Buffer.from(fromStr, source).toString(target) is already very convenient. I don't really know how I feel about adding this... It's not like there's even that many people asking for this feature.

apapirovski on 20 May 2018

👍2

@apapirovski - fair point, while I also don't have strong opinion either way (not much demanded from users vs a convenient API abstraction) my motivation was to see how this can be moved towards a conclusion. If we have consensus on no action at this point, we can close this out.

/cc @ChALkeR @addaleax @seishun @jasnell

gireeshpunathil on 20 May 2018

I personally think Buffer.from(fromStr, source).toString(target) is convenient enough.

The only compelling reason I can see is that it _might_ be possible to improve the performance of that operation. But that would be a much more complicated implementation than suggested and it is not clear how significant the difference would actually be.

BridgeAR on 20 May 2018

@BridgeAR I disagree — people do that _a lot_, and some still use new Buffer constructor there.
Simplifying that usecase would be helpful, probably.

ChALkeR on 28 May 2018

@seishun I am also not sure if this should be on the buffer module or somewhere else. Changing the order from to, from to from, to looks fine, I suppose.

ChALkeR on 28 May 2018

@ChALkeR if they did not move away from new Buffer by now I doubt that they would change it to anything else, no matter if it is simpler or not.

BridgeAR on 28 May 2018

👍1

There's been no activity on this in over 2 years and it can be accomplished using existing API. Closing.

jasnell on 19 Jun 2020

I think that Node has no way to convert buffer encodings from one buffer to another without creating an intermediary string and doing associated GC?