Modules: Unify exports implementation across CJS and ESM

Created on 23 Jul 2019  Â·  26Comments  Â·  Source: nodejs/modules

Right now we have two copies of the exports logic. One in CJS and one in ESM. We should try to reuse at least some of the implementation so we can spend less energy trying to sync the two.

pkg-exports

Most helpful comment

And a unified CJS / ESM resolver makes the mud slightly easier to work with.

I am unclear what this means, CJS/ESM currently have differing resolution algorithms and potentially timings (CJS enforces sync/blocking, ESM does not). With those continuing to differ I doubt there is actual utility in trying to share the same resolver functions.

I think a shared way of handling resources and cache-ing results would be enough. I think porting ESM's resolver to JS would help and is more realistic now than when it was originally wrote thanks to primordials. I would still be -1 to intentionally allowing patching of fs to leak into ESM as it does not currently and I don't think that API is the correct way to do things. If we want to hold of on solving the API for fs style resource hooks, we shouldn't be doubling down on hacks using non-robust or guaranteed contracts.

All 26 comments

Step 1: Unify the test suite so that the entire behavior (and the differences) are covered.

@guybedford @bmeck While working on the test PR (https://github.com/nodejs/node/pull/28831) I ran into the ERR_MODULE_NOT_FOUND vs. MODULE_NOT_FOUND inconsistency between ESM and CJS. Was that done on purpose or is that just an artifact of the error macros and could (should?) be fixed?

It was not intentional on my behalf

Created a dedicated issue in case somebody else beats me to it: https://github.com/nodejs/modules/issues/360

I'm starting to lean towards closing this issue as wont-fix. I haven't found a good way to do this and as long as the tests are shared, it feels like we can live with two implementations of exports, just like we live with two implementations of resolution in general.

Wasn’t there an issue for unifying resolution, at least as much as is reasonable? Can unifying exports be part of that?

Maybe..? Don't recall such an issue right now. But I'm not sure if it would realistically happen before unflagging (read: ever?) since there seem to be more urgent gaps that still need closing. I'm not sure if unifying the resolution code would actually get us enough value to be worth it.

One interesting argument for merging is that this question of virtualizing the file system keeps coming up for tools like Yarn.

Currently they can achieve this by hacking fs, but the C++ implementation of modules doesn't permit that.

If we were to move the module resolver back into JS in a shared way, we'd have the ability to easily share both the cache and fs virtualization.

I don't think allowing "hacking the fs" is a good reason to move something. That sounds like opening cans of worms.

The can of worms is not allowing virtualization, because the result would be resource hooks within the C++ resolver and JS codebase.

Resolution is a hot path... the more hooks the slower it gets, the slower Node gets.

That is, unless there's a clever way to virtualize the lib_uv fs implementation itself.

@guybedford fs is tied to OS related things like child processes and shared libraries, that cannot be virtualized. It isn't a simple question of letting people replace fs. You remove the ability to assert things about files if fs is virtualized.

One interesting argument for merging is that this question of virtualizing the file system keeps coming up for tools like Yarn.

Currently they can achieve this by hacking fs, but the C++ implementation of modules doesn't permit that.

If we were to move the module resolver back into JS in a shared way, we'd have the ability to easily share both the cache and fs virtualization.

Fwiw, that’s an issue I’ve had with the modules implementation the entire time it existed – it should not be concerned with file system implementation details at all, that’s just a sign of a design that lacks orthogonality.

This also affects embedders like Electron, who do support loading modules from files that do not exist on-disk, and would affect us if we start supporting e.g. loading code from tarballs.

@addaleax some things like dlopen don't work on virtual files. I don't see how they can be orthogonal.

@addaleax some things like dlopen don't work on virtual files. I don't see how they can be orthogonal.

One way to work around that is to write shared libraries to a temporary path on disk and open that file instead.

@addaleax yes, this was the design I used back in 2014 for Noda but cleanup is unreliable and a lot of extra copies of libraries were written to disk.

It is common for a resolver implementation to be based on a virtualized file interface, which would be easier to do in JS.

The standard model is to expose a readFile, stat and realpath (/ readlink) implementation. Most JS resolver implementations converge on this model, including the ones I've written personally, the ones I've written for Zeit and the TypeScript hooks.

dlopen is not needed within the resolver implementation.

@guybedford dlopen is required during other loading operations and the conversation turned towards allowing alternative fs for loading operations and did not seem limited to resolvers given:

both the cache and fs virtualization.

What cache and general fs virtualization was being talked about?

What cache and general fs virtualization was being talked about?

The cache is that we don't share the package.json cache between CJS and ESM currently to avoid crossing boundaries. The fs virtualization is the standard Yarn pnp / Tink model, which Yarn and others will ask for. And with no ESM solution it's not good for us to stifle innovation.

Loaders cannot solve virtualization because import.meta.url must be expected to work through fs.readFile, just like __filename needs to apply into fs.readFile for CommonJS being the reason for fs-level interception as opposed to resolver-level interception via readdressing.

I don't think we should be working out the perfect hooks model for them here, that is their job. The dlopen exception etc is theirs to sort out.

But I'm just noting that a JS resolver unification would avoid these problems, which should not be so easily dismissed. Perfect is the enemy of good and all.

Loaders cannot solve virtualization because import.meta.url must be expected to work through fs.readFile

I don't think this is true? What requires this? I know on the web import.meta.url != the location of the response and I can imagine things like https/data/etc. URLs not working with fs.

I don't think we should be working out the perfect hooks model for them here, that is their job. The dlopen exception etc is theirs to sort out.

I think thats fine, but I don't think this should be hooking fs I would rather these specific APIs and the expected constraints for them be separate from fs.

I don't think this is true? What requires this? I know on the web import.meta.url != the location of the response and I can imagine things like https/data/etc. URLs not working with fs.

We cannot guarantee that packages on npm will not use fs calls against their own package relatively.

Consider for example a library that loads custom templates from a local folder. It's very common. And users will write code that uses fs, and thus assumes import.meta.url is a relative file:/// scheme.

Tink and Yarn pnp have to be backwards compatible, and if they try to use a different scheme, then fs.readFile(fileURLToPath(import.meta.url)) breaks. They have to avoid these common breaks however they can, using techniques such as hooking fs.

Schemes might work if they support those new schemes in fs itself as well. But ensuring path.resolve etc all play out there is tough. Again these are problems and decisions for these tools to solve, and not ones that I think should dictate our designs.

I am not arguing for anything strong here, just really noting two things:

  1. By not allowing easy fs hooks in the ESM resolver, we are inadvertently restricting their solution space. They have to either implement their own resolver now, or implement their custom scheme for fs calls as well.
  2. A resource hooks model that intercepts fs at a C++ and JS layer before lib_uv is something I am strongly against for performance and complexity reasons.

I hope we can bear the above in mind as these virtualization arguments play out.

Again these are problems and decisions for these tools to solve, and not ones that I think should dictate our designs.

I feel that way about most of the last comment, lots of these problems seem to be scoped to getting the resolver working. If you look from a different angle, things like spawning child processes, dlopen, passing/saving paths between processes, things like using path.resolve, etc. also have problems. Neither side is going to be a universal win. I would prefer we pick a more general side that does allow the possibility of working with things like alternative URL schemes which are talked about with loader hooks and even supported in other runtimes.

By not allowing easy fs hooks in the ESM resolver, we are inadvertently restricting their solution space. They have to either implement their own resolver now, or implement their custom scheme for fs calls as well.

I agree we need to find a solution to hooking CJS/ESM both. I don't think we have to be as restrictive as we are, but using fs seems problematic especially since it has issues with virtualization even if there is some backwards compat with existing hacks. Using existing hacks to model use cases seems good, but using them to enforce API design and limit forwards compatibility seems bad.

A resource hooks model that intercepts fs at a C++ and JS layer before lib_uv is something I am strongly against for performance and complexity reasons.

I don't see how an alternative would be able to avoid the same kind of interception. How would userland solve performance or reduce complexity for this that the runtime cannot? Hooking the entire fs would mean implementing far more than the hooks you describe above.

I agree our role shouldn't be one to solve, and for that reason we should avoid prematurely trying to push APIs and abstractions that solve these problems before we know what the correct solution is.

Rather, we should be aware of our role in the muddy problem space.

And a unified CJS / ESM resolver makes the mud slightly easier to work with.

And a unified CJS / ESM resolver makes the mud slightly easier to work with.

I am unclear what this means, CJS/ESM currently have differing resolution algorithms and potentially timings (CJS enforces sync/blocking, ESM does not). With those continuing to differ I doubt there is actual utility in trying to share the same resolver functions.

I think a shared way of handling resources and cache-ing results would be enough. I think porting ESM's resolver to JS would help and is more realistic now than when it was originally wrote thanks to primordials. I would still be -1 to intentionally allowing patching of fs to leak into ESM as it does not currently and I don't think that API is the correct way to do things. If we want to hold of on solving the API for fs style resource hooks, we shouldn't be doubling down on hacks using non-robust or guaranteed contracts.

I think this is effectively a larger discussion around sharing logic between the two resolvers and goes beyond exports. Going to close this issue since I'm not seeing this happen and I think sharing the test suite gets us most of what we would've wanted from a shared implementation (reassurance that they both do the same thing).

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mhdawson picture mhdawson  Â·  4Comments

MylesBorins picture MylesBorins  Â·  5Comments

MylesBorins picture MylesBorins  Â·  4Comments

Jamesernator picture Jamesernator  Â·  4Comments

WebReflection picture WebReflection  Â·  5Comments