Tracking Issue to allow Loaders to create in-memory URLs that can be imported for things like code coverage: --loader to use MIMEs.
@bcoe Here ^
For reference: https://w3c.github.io/FileAPI/
@bmeck @TimothyGu I'd be interested in pitching in on this work, along with being one of the early consumers with Istanbul ... designing the Blob and BlobStore bit sounds interesting. Do you picture we'd be exposing existing structures in V8?
@bcoe great! Unfortunately V8 does not expose Blobs in File API terms; the blobs in v8.h refer to snapshot blobs, which are a very different beast. The File API is quite thorough in what should be done. We should avoid File for now, though, since I can't think of a clear use case.
The important bit about the BlobStore is that it works across workers. If a worker makes a URL using URL.createObjectURL, it should be available in all threads.
If you need any help I can assist when I have a bit more free time or if you schedule something in advance I will make time.
For reference - What is BlobStore?
@refack it is the place where the environment stores the url string => Blob mapping. See the spec.
It is used to share URLs across workers so you can do multi-threaded processing: https://jsfiddle.net/ctyvm1tr/1/
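In browser terms, the sharing works roughly like this (using the standard web `Blob`, `URL.createObjectURL` and `Worker` globals, none of which Node exposes yet):

```js
// main thread: wrap some source text in a Blob and mint a blob: URL for it
const blob = new Blob(['self.postMessage("hello from a blob URL")'], {
  type: 'text/javascript'
})
const blobUrl = URL.createObjectURL(blob) // e.g. blob:<origin>/<uuid>

// the url string => Blob mapping lives in the shared blob URL store, so the
// same string resolves to the same data when another thread loads it
const worker = new Worker(blobUrl)
worker.onmessage = (e) => console.log(e.data)
```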
@bmeck I intend to make some time this weekend to read through the spec and play with the existing APIs in the browser. Once I know more than _basically nothing_, I would definitely be interested in arranging a quick screen share.
Is there any prior art in the codebase that shares state across workers that we could build on?
@bcoe nothing in this realm that is sane to read that I know of. I know game engines use it, but that isn't helpful since I don't know their internals.
I'm not really certain what the point of this is given we have an existing file system api and various types of buffers. Could this please be elaborated on before implementation? Thanks.
So I've been working on this but I've been behind due to other pressing matters. It's very much something that I would like to see. To be specific: I already have an implementation underway, I just haven't had the time to finish it. My goal is to have an initial implementation by mid to late November.
In terms of what the implementation would provide:
- A `node::blob::Blob` native class that represents an immutable chunk of data. This could represent a file on disk, an allocated chunk of memory, etc. There would be a corresponding JS object, but the key point of `node::blob::Blob` is that the data is held at the native layer without ever crossing into JS unless a `FileReader` is used.
- A `node::blob::BlobStore` native class that is essentially an addressable store for `node::blob::Blob` objects: a relatively straightforward map-like object (a rough sketch follows below).
- JavaScript-level `Blob`, `File` and `FileReader` classes implemented per the spec. These would be backed by `node::blob::Blob`.
- An implementation of `URL.createObjectURL()`. There would be both C++ and JS implementations of this method, allowing a URL to be generated for a `node::blob::Blob` within a `node::blob::BlobStore`.
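To make the map-like shape concrete, here is a minimal JS sketch of the store behaviour; the `blob:nodedata:` URL format and all of the names below are assumptions for illustration only, not the actual `node::blob::BlobStore` internals:

```js
import { randomUUID } from 'crypto'

// hypothetical stand-in for the native BlobStore: a url string => Blob map
const blobStore = new Map()

function createObjectURL (blob) {
  const blobUrl = `blob:nodedata:${randomUUID()}` // URL format is an assumption
  blobStore.set(blobUrl, blob)
  return blobUrl
}

function revokeObjectURL (blobUrl) {
  blobStore.delete(blobUrl)
}

function resolveObjectURL (blobUrl) {
  return blobStore.get(blobUrl) // how a consumer would look the Blob back up
}
```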
While this all may seem complicated, the interfaces here are rather simple. A `File` Blob, for instance, is a thin wrapper on top of libuv's existing file system operations for reading a file; it would essentially just end up being a `FileReader`-based alternative to `fs.createReadStream()`. It's really quite lightweight in the details. The key issue with `File`, however, is the requirement to support MIME types, which we currently do not handle within core. That will take some thinking to figure out.
For `Blob` in general, it is really nothing more than a persistent allocated chunk of memory. It would be possible to create a `Blob` from one or more `TypedArray` objects. I'm sketching out additional APIs for the `http` and `http2` modules that would allow a response to draw data from a `Blob` rather than through the Streams API; there is already something analogous in the `http2` implementation in the form of the `respondWithFile()` and `respondWithFD()` APIs. Basically, the idea would be to prepare chunks of allocated memory at the native layer, with data that never passes into the JS layer (unless absolutely necessary), then use those to source the data for responses. In early benchmarking this yields a massive boost in throughput without the usual backpressure control issues.
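For reference, building a `Blob` out of `TypedArray` chunks is already covered by the File API `Blob` constructor, so the JS-facing side could plausibly look like the following sketch (whether Node would expose `Blob` globally or via a module is an open question here):

```js
const header = new Uint8Array([0xde, 0xad, 0xbe, 0xef])
const body = Buffer.from('payload bytes') // a Buffer is itself a Uint8Array

// the File API Blob constructor concatenates BufferSource/string parts into
// one immutable chunk of data
const blob = new Blob([header, body], { type: 'application/octet-stream' })
console.log(blob.size, blob.type)
```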
There is certainly a cost, and there are aspects of the implementation that are non-trivial, but the benefits are quite real.
FWIW, I'm not entirely sold on the idea of implementing the File and FileReader portions of this model yet, so I haven't worked on those pieces and could easily be talked out of doing so.
@jasnell my personal interest in this API surface is a follow on from:
https://github.com/nodejs/node/pull/15445
The goal being to facilitate test-coverage and other transpilation steps in .mjs files.
I'm picturing that one could instrument code for coverage using pseudo code that looks something like this:
```js
import * as fs from 'fs'
import * as path from 'path'
import * as url from 'url'
// pseudo code: stands in for an instrumentation library such as Istanbul
import * as istanbul from 'istanbul'

export async function resolve(specifier, parentModuleURL, defaultResolver) {
  const resolved = new url.URL(specifier, parentModuleURL)
  const ext = path.extname(resolved.pathname)
  if (ext === '.mjs') {
    // read the module source and rewrite it with coverage counters
    const source = fs.readFileSync(resolved.pathname, 'utf8')
    const instrumented = istanbul.instrument(source)
    // hand the module loader an in-memory URL for the instrumented source
    const blob = new Blob([instrumented], { type: 'application/mjs' })
    return {
      url: URL.createObjectURL(blob),
      format: 'esm'
    }
  } else {
    return defaultResolver(specifier, parentModuleURL)
  }
}
```
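(Presumably such a loader would be invoked with something like `node --experimental-modules --loader ./coverage-loader.mjs app.mjs`; the file names are made up for illustration.)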
Does it seem like I'm on the same page as to how this API could potentially be used?
_...an aside:_
I keep coming back to the argument that @guybedford's work on #15445 should be exposed through an API hook rather than just a flag. In the world of developer tools, it's often the case that a few transformations need to be performed in sequence, e.g., transpiling a source file and then instrumenting the output for coverage.
I don't hate the idea of using createObjectURL() to facilitate the transpilation step ... but now that I sit down and hammer out some pseudo code, I'm not immediately seeing how one could compose the multi-step transformations (described above) using the --loader flag.
In the land of require.extensions one is able to build up a stack of transformations, each wrapping the one installed before it, so a multistep transpilation can be applied without each actor knowing about the others (this is important, given the fractal nature of developer toolchains).
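To illustrate the pattern, this is roughly how CommonJS tools stack their transformations today; `transform()` is a stand-in for whatever step a given tool applies, and `module._compile` is the undocumented hook such tools conventionally wrap:

```js
// keep a reference to whatever handler the previously-registered tool installed
const previousHandler = require.extensions['.js']

require.extensions['.js'] = function (module, filename) {
  const originalCompile = module._compile
  module._compile = function (code, file) {
    // apply this tool's transformation, then hand off to the earlier pipeline
    return originalCompile.call(this, transform(code, file), file)
  }
  return previousHandler(module, filename)
}
```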
CC: @demurgos, @iarna
With the new worker API I'd like to get this all working, primarily to support `new Worker('blob:uuid')`.
@bmeck that should be enough reasoning to land mimes yea?
This work would be great to see.
@bcoe it's best not to try and see this as the final picture on the matter I think, but rather allow it to inform the discussions. The use case you describe is one very much understood by the modules group, that will be polished in due course.
Would also be interested to hear your thoughts on https://github.com/nodejs/node/pull/18914 as it is a goal of mine to get that going again, just not sure how much to prioritise it right now.
Don't really need FileReader now that there are new reading methods on Blobs:
- `blob.text()` (promise)
- `blob.arrayBuffer()` (promise)
- `blob.stream()` (WHATWG readable stream)
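For example (a quick sketch using the spec-defined reading methods):

```js
const blob = new Blob(['export const answer = 42'], { type: 'application/javascript' })

blob.text().then((src) => console.log(src))        // resolves to a string
blob.arrayBuffer().then((buf) => buf.byteLength)   // resolves to an ArrayBuffer
const readable = blob.stream()                     // WHATWG ReadableStream
```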