That is, update fs, http, streams, etc.
This is a proposal, but I would think with @trevnorris's work on making Buffer a subclass of Uint8Array, it seems likely to be not too hard...
On the web the best practice has emerged that when accepting arguments, allow any of ArrayBuffer or typed array or DataView, which is why I think more than just Buffer | ArrayBuffer would make sense.
What do people think? @trevnorris, am I understanding correctly that after your work lands this would not be very hard?
I'm guessing that this would be within the confines of core and not 3rd party addons (at least such addons would need to explicitly support ArrayBuffers also)?
@mscdex good question. I am not sure how third-party addons get access to the underlying void*. If they all go through a function, we should be able to modify that function to accept any of these...
@domenic if you pass new Buffer(new ArrayBuffer(5)) would you expect the data to be copied or pointed to by the new buffer instance?
Either way it can be done. This was actually on my list of things to implement, but I wanted to keep the initial PR small.
TBH it wouldn't be hard to support ArrayBuffer in most of the API. e.g.
var buf = new Buffer(16);
var ab = new ArrayBuffer(16);
buf.copy(ab);
Third party support shouldn't be a problem. The native Buffer API still hands back the void* that the ArrayBuffer points to. So anyone can operate on it as they usually have.
@trevnorris to clarify I wasn't just talking about the buffer API; I was talking about the whole io.js API surface area. FS, streams, etc. Not sure if that came across in the OP... editing now.
if you pass new Buffer(new ArrayBuffer(5)) would you expect the data to be copied or pointed to by the new buffer instance?
Given how new Uint8Array(new ArrayBuffer(5)) behaves I'd expected pointed to, but I am more used to typed arrays than I am to buffers to be honest...
This was actually on my list of things to implement, but I wanted to keep the initial PR small.
Yeah definitely, can leave this for a minor version probably.
A pointer is fine with me.
If we say that an ArrayBuffer is treated as binary data then this is feasible. Typed Arrays can be a little tricky because of user's expectations (e.g. is the "length" to be written the byte length or the index length). But it can definitely be investigated.
+1 for even talking ArrayBuffer / node.JS Buffer..
The native Buffer API still hands back the void* that the ArrayBuffer points to.
A pointer is fine with me.
Let's say variable X = ArrayBuffer while the Y = Buffer (points to X's memory). As long as Y is alive, there is a need to keep X from GC.
A naive hope / question / discussion, (although it's a clear break in the API) why not just use ArrayBuffer and node.JS buffer as it's view (optional) ?
if you pass new Buffer(new ArrayBuffer(5)) would you expect the data to be copied or pointed to by the new buffer instance?
I would expect pointed to. All other argument types to the Buffer constructor get copied, but they're all array-like (have length property and are indexable). ArrayBuffer is not array-like, so it's not a huge deal if we add a special case here. It's more useful to have it share memory, I think.
This has been inactive for a while. I'm going to throw a help wanted label on it, but feel free to remove it if, you know, help isn't actually wanted. (Why do I type these things?)
This would be more easily done now that Buffer is a Uint8Array. In fact the native Buffer::HasInstance() only checks if the object is a Uint8Array instance. Change would be tedious and require a lot of documentation updates. Especially since it may be confusing out of context reading an API takes a Uint8Array, which Buffer is, but not know that you can actually pass a Buffer.
ArrayBuffer would require a bit more work, but also doable now.
It seems to me that the "right" way to fix this is to have everything internally operate on ArrayBuffers, then you have the equivalent of the following in all public binary-data accepting APIs:
function f(..., buffer) {
if (ArrayBuffer.isView(buffer)) {
buffer = buffer.buffer;
// this will detect Buffers, Uint8Arrays, DataViews, etc.
}
// now you can process the ArrayBuffer buffer, maybe
// after validating that it's a true ArrayBuffer.
}
This was actually on my list of things to implement,
@trevnorris Is it still on your list? If not, would you be willing to mentor someone else who wanted to do it? If so, we can add a mentor available label, I suppose.
A small update after #12223:
ArrayBuffer (all TypedArray types and DataView) has been set up.ArrayBuffer yet.ArrayBuffer in addition to Buffer.Uint8Array in addition to Buffer, but not Uint16Array, DataView, etc., thanks to work done by @addaleax.Example of converting a Buffer-only function to one that also supports Uint8Array: https://github.com/nodejs/node/commit/627ecee9edd9df247e7541ee8084d925f85ede7f
Example of converting a function that supports Uint8Array to one that supports all other ArrayBufferViews: https://github.com/nodejs/node/commit/2ced07ccaf7682b9ec8fb3bcc3dc8d2bb2798c61
@Trott Regardless of @trevnorris's status, I would be available for mentoring this, both now and during the upcoming Code & Learn. If someone drafts a list of modules to convert, as far as I'm concerned even good-first-contribution can be applied to this issue.
@zahidul-islam might be interested in taking this on if @TimothyGu is willing to mentor. Maybe he can start with a single module just to get a feel for it?
@Trott Been wrapped up w/ work and async hook. If @TimothyGu is available then he should probably mentor.
I'd like to reiterate that I'm available for mentoring. @Zahidul-Islam We can get started on any module that doesn't yet support any other TypedArrays except Uint8Array. What time will you be available for talk on IRC? I'm generally available 12am – 9am UTC due to the timezone I'm in.
Hey @TimothyGu I've been peering at this item and wouldn't mind helping out (but could maybe make use of some mentoring). Your comment makes sense so I can probably get started, but still wouldn't mind some help. I'm at the Code&Learn today, but can also follow up later.
Hi @TimothyGu , can we have a checklist of modules where this transition needs to be done?
Also, is converting modules to use UInt8Arrays instead of Buffers a precondition to converting them to use TypedArrays or DataViews?
I'd like to help with any one module, if this is still pending.
can we have a checklist of modules where this transition needs to be done?
I think most of the modules still don't have transition done, but here's one:
This list is for TypedArray/DataView support only. ArrayBuffer support could be tracked separately.
Also, is converting modules to use
UInt8Arrays instead ofBuffers a precondition to converting them to useTypedArrays orDataViews?
Mostly yes, but it could be done in a single commit, though I think most modules have been converted to support Uint8Array already.
cool, thanks! I'd like to start on the fs module, if that's fine.
@TimothyGu : maybe I'm missing something, but I couldn't find any API surface to convert to use TypedArrays or DataViews in the following modules:
vmv8Also, I'm not sure how a string_decoder should work with TypedArrays, as they do not support a .toString() method (unlike a Buffer, and so a Uint8Array). Any ideas?
Friendly reminder #22562 PR tries out the string_decoder item from the checklist. Looking forward to your review.
Friendly reminder: #22921 tries out the vm item from the checklist of earlier comment.
Friendly reminder: #23210 => tls item from the checklist of earlier comment.
Simply grabbed the checklist from the comment, & appended the PR #'s to each of the modules for status tracking:
Related change for v8 module is introduced in #23953.
I'm eager to help @TimothyGu. If possible, I would be incredibly thankful for any guidance/mentoring. I looked at the examples you provided and think I understand what needs to be done (maybe some initial guidance?). I apologize if I come off as a complete "noob", but I really enjoy Node, the community seems great, and I want to actually CONTRIBUTE!
Not sure if it's exactly the same problem but in my case what I want is something that is exactly the same as Buffer/Uint8Array in terms of API (either would work for me) but just allows an arbitrarily large size like ArrayBuffer instead of being capped like Buffer and Uint8Array. Not sure if there is any plan to remove the Uint8Array cap but otherwise a good solution for a wrapper around ArrayBuffer would be to allocate a series of internal Uint8Array views on construction and use those to set and grab values.
E.g.
const UINT8_ARRAY_VIEW_SIZE = 1000 * 1000 * 500 // 500MB ?
class UncappedUint8Array extends Uint8Array {
static from(arr) {
return arr.slice();
}
constructor(arrayBuffer) {
this._arrayBuffer = arrayBuffer;
const { length } = this;
this._views = Array(Math.ceil(length / UINT8_ARRAY_VIEW_SIZE));
let offset = 0;
while (offset < length) {
const size = Math.min(length - offset, UINT8_ARRAY_VIEW_SIZE);
this._views.push(new Uint8Array(arrayBuffer, offset, size));
offset += size;
}
}
get length() {
return this._arrayBuffer.byteLength;
}
// could be used for index get
_byteAt(index) {
const viewSize = UINT8_ARRAY_VIEW_SIZE;
return this._views[Math.floor(length / viewSize)][length % viewSize];
}
// could be used for index set
_setByteAt(index, value) {
const viewSize = UINT8_ARRAY_VIEW_SIZE;
this._views[Math.floor(length / viewSize)][length % viewSize] = value;
}
indexOf(value) {
const { length } = this;
for (let i = 0; i < length; i++) {
if (this._byteAt(i) === value) {
return i;
}
}
return -1;
}
slice(startIndex, endIndex) {
return new UncappedUint8Array(this._arrayBuffer.slice(startIndex, endIndex));
}
// ... etc
}
Not sure what amount of work would be required for me to implement the whole Uint8Array or Buffer interfaces this way, if too complicated could be something to consider for core? I guess I would need a Proxy to handle indexes? But not sure if that would be sufficient for all the other methods.. would they defer to the index Proxy?
I’ll close this issue because I think it’s (mostly?) done.
@benwiley4000 The fact that ArrayBuffers are fixed-size is prescribed by the JS spec, it’s outside of Node.js’s control. This is something that probably would have to be taken up with TC39 first, if you want to see changes in the language or in Node.js.
Most helpful comment
Friendly reminder: #23210 =>
tlsitem from the checklist of earlier comment.Simply grabbed the checklist from the comment, & appended the PR #'s to each of the modules for status tracking: