Tracking Issue to allow Loaders to create in-memory URLs that can be imported for things like code coverage: --loader to use MIMEs.
@bcoe Here ^
For reference: https://w3c.github.io/FileAPI/
@bmeck @TimothyGu I'd be interested in pitching in on this work, along with being one of the early consumers with Istanbul ... designing the Blob and BlobStore bit sounds interesting. Do you picture we'd be exposing existing structures in V8?
@bcoe great! Unfortunately V8 does not expose Blobs in File API terms; the blobs in v8.h refer to snapshot blobs, which are a very different beast. The File API is quite thorough in what should be done. We should avoid File for now, though, since I can't think of a clear use case.
The important bit about the BlobStore is that it works across workers. If a worker makes a URL using URL.createObjectURL, it should be available in all threads.
If you need any help I can assist when I have a bit more free time or if you schedule something in advance I will make time.
For reference - What is BlobStore?
@refack it is the place where the environment stores the url string => Blob mapping. See the spec.
It is used to share URLs across workers so you can do multi-threaded processing: https://jsfiddle.net/ctyvm1tr/1/
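In browser terms, the sharing works roughly like this (using the standard web `Blob`, `URL.createObjectURL` and `Worker` globals, none of which Node exposes yet):

```js
// main thread: wrap some source text in a Blob and mint a blob: URL for it
const blob = new Blob(['self.postMessage("hello from a blob URL")'], {
  type: 'text/javascript'
})
const blobUrl = URL.createObjectURL(blob) // e.g. blob:<origin>/<uuid>

// the url string => Blob mapping lives in the shared blob URL store, so the
// same string resolves to the same data when another thread loads it
const worker = new Worker(blobUrl)
worker.onmessage = (e) => console.log(e.data)
```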
@bmeck I intend to make some time this weekend to read through the spec and play with the existing APIs in the browser. Once I know more than _basically nothing_, I would definitely be interested in arranging a quick screen share.
Is there any prior art in the codebase that shares state across workers that we could build on?
@bcoe nothing in this realm that is sane to read that I know of. I know game engines use it, but that isn't helpful since I don't know their internals.
I'm not really certain what the point of this is given we have an existing file system api and various types of buffers. Could this please be elaborated on before implementation? Thanks.
So I've been working on this but I've been behind due to other pressing matters. It's very much something that I would like to see. To be specific: I already have an implementation underway, I just haven't had the time to finish it. My goal is to have an initial implementation by mid to late November.
In terms of what the implementation would provide:
- A `node::blob::Blob` native class that represents an immutable chunk of data. This could represent a file on disk, an allocated chunk of memory, etc. There would be a corresponding JS object, but the key point of `node::blob::Blob` is that the data is held at the native layer without ever crossing into JS unless a `FileReader` is used.
- A `node::blob::BlobStore` native class that is essentially an addressable store for `node::blob::Blob` objects: a relatively straightforward map-like object (a rough sketch follows below).
- JavaScript-level `Blob`, `File` and `FileReader` classes implemented per the spec. These would be backed by `node::blob::Blob`.
- An implementation of `URL.createObjectURL()`. There would be both C++ and JS implementations of this method, allowing a URL to be generated for a `node::blob::Blob` within a `node::blob::BlobStore`.
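To make the map-like shape concrete, here is a minimal JS sketch of the store behaviour; the `blob:nodedata:` URL format and all of the names below are assumptions for illustration only, not the actual `node::blob::BlobStore` internals:

```js
import { randomUUID } from 'crypto'

// hypothetical stand-in for the native BlobStore: a url string => Blob map
const blobStore = new Map()

function createObjectURL (blob) {
  const blobUrl = `blob:nodedata:${randomUUID()}` // URL format is an assumption
  blobStore.set(blobUrl, blob)
  return blobUrl
}

function revokeObjectURL (blobUrl) {
  blobStore.delete(blobUrl)
}

function resolveObjectURL (blobUrl) {
  return blobStore.get(blobUrl) // how a consumer would look the Blob back up
}
```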
While this all may seem complicated, the interfaces here are rather simple. A `File` Blob, for instance, is a thin wrapper on top of libuv's existing file system operations for reading a file; it would essentially just end up being a `FileReader`-based alternative to `fs.createReadStream()`. It's really quite lightweight in the details. The key issue with `File`, however, is the requirement to support MIME types, which we currently do not handle within core. That will take some thinking to figure out.
For `Blob` in general, it is really nothing more than a persistent allocated chunk of memory. It would be possible to create a `Blob` from one or more `TypedArray` objects. I'm sketching out additional APIs for the `http` and `http2` modules that would allow a response to draw data from a `Blob` rather than through the Streams API; there is already something analogous in the `http2` implementation in the form of the `respondWithFile()` and `respondWithFD()` APIs. Basically, the idea would be to prepare chunks of allocated memory at the native layer, with data that never passes into the JS layer (unless absolutely necessary), then use those to source the data for responses. In early benchmarking this yields a massive boost in throughput without the usual backpressure control issues.
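For reference, building a `Blob` out of `TypedArray` chunks is already covered by the File API `Blob` constructor, so the JS-facing side could plausibly look like the following sketch (whether Node would expose `Blob` globally or via a module is an open question here):

```js
const header = new Uint8Array([0xde, 0xad, 0xbe, 0xef])
const body = Buffer.from('payload bytes') // a Buffer is itself a Uint8Array

// the File API Blob constructor concatenates BufferSource/string parts into
// one immutable chunk of data
const blob = new Blob([header, body], { type: 'application/octet-stream' })
console.log(blob.size, blob.type)
```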
There is certainly a cost, and there are aspects of the implementation that are non-trivial, but the benefits are quite real.
FWIW, I'm not entirely sold on the idea of implementing the File and FileReader portions of this model yet, so I haven't worked on those pieces and could easily be talked out of doing so.
@jasnell my personal interest in this API surface is a follow on from:
https://github.com/nodejs/node/pull/15445
The goal being to facilitate test-coverage and other transpilation steps in .mjs files.
I'm picturing that one could instrument code for coverage using pseudo code that looks something like this:
```js
import * as fs from 'fs'
import * as path from 'path'
import * as url from 'url'
// pseudo code: stands in for an instrumentation library such as Istanbul
import * as istanbul from 'istanbul'

export async function resolve(specifier, parentModuleURL, defaultResolver) {
  const resolved = new url.URL(specifier, parentModuleURL)
  const ext = path.extname(resolved.pathname)
  if (ext === '.mjs') {
    // read the module source and rewrite it with coverage counters
    const source = fs.readFileSync(resolved.pathname, 'utf8')
    const instrumented = istanbul.instrument(source)
    // hand the module loader an in-memory URL for the instrumented source
    const blob = new Blob([instrumented], { type: 'application/mjs' })
    return {
      url: URL.createObjectURL(blob),
      format: 'esm'
    }
  } else {
    return defaultResolver(specifier, parentModuleURL)
  }
}
```
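(Presumably such a loader would be invoked with something like `node --experimental-modules --loader ./coverage-loader.mjs app.mjs`; the file names are made up for illustration.)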
Does it seem like I'm on the same page as to how this API could potentially be used?
_...an aside:_
I keep coming back to the argument that @guybedford's work on #15445 should be exposed through an API hook rather than just a flag. In the world of developer tools, it's often the case that a few transformations need to be performed in sequence, e.g., transpiling a source file and then instrumenting the output for coverage.
I don't hate the idea of using createObjectURL() to facilitate the transpilation step ... but now that I sit down and hammer out some pseudo code, I'm not immediately seeing how one could compose the multi-step transformations (described above) using the --loader flag.
In the land of require.extensions one is able to build up a stack of transformations, each wrapping the one installed before it, so a multistep transpilation can be applied without each actor knowing about the others (this is important, given the fractal nature of developer toolchains).
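To illustrate the pattern, this is roughly how CommonJS tools stack their transformations today; `transform()` is a stand-in for whatever step a given tool applies, and `module._compile` is the undocumented hook such tools conventionally wrap:

```js
// keep a reference to whatever handler the previously-registered tool installed
const previousHandler = require.extensions['.js']

require.extensions['.js'] = function (module, filename) {
  const originalCompile = module._compile
  module._compile = function (code, file) {
    // apply this tool's transformation, then hand off to the earlier pipeline
    return originalCompile.call(this, transform(code, file), file)
  }
  return previousHandler(module, filename)
}
```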
CC: @demurgos, @iarna
With the new worker API I'd like to get this all working, primarily to support `new Worker('blob:uuid')`.
@bmeck that should be enough reasoning to land mimes yea?
This work would be great to see.
@bcoe it's best not to try and see this as the final picture on the matter I think, but rather allow it to inform the discussions. The use case you describe is one very much understood by the modules group, that will be polished in due course.
Would also be interested to hear your thoughts on https://github.com/nodejs/node/pull/18914 as it is a goal of mine to get that going again, just not sure how much to prioritise it right now.
Don't really need FileReader now that there are new reading methods on Blobs:
- `blob.text()` (promise)
- `blob.arrayBuffer()` (promise)
- `blob.stream()` (WHATWG readable stream)
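For example (a quick sketch using the spec-defined reading methods):

```js
const blob = new Blob(['export const answer = 42'], { type: 'application/javascript' })

blob.text().then((src) => console.log(src))        // resolves to a string
blob.arrayBuffer().then((buf) => buf.byteLength)   // resolves to an ArrayBuffer
const readable = blob.stream()                     // WHATWG ReadableStream
```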