Modules: Rediscussion for out-of-order CJS exec?

Created on 4 May 2020  ·  62 Comments  ·  Source: nodejs/modules

I'd like to reopen the discussion of out-of-order CJS exec one last time, given how much criticism there has been over the lack of named exports for CommonJS in ESM.

I can revive a former PR for this, which is still sitting around somewhere, no problem.

The two "problematic semantics" as far as this group is concerned are:

  1. CJS executing during the instantiation phase instead of during the execute phase.
  2. Because of the getter triggers, I would also prefer to move the "wrapper creation" to be done lazily on the first import, instead of for every CJS module. That is, taking the exports snapshot at import time, even if the module was required earlier. Otherwise, an app that loads hundreds of CommonJS modules would snapshot all of them on startup, which would both be slower and could trigger lots of unnecessary warning and error side effects.

I have no problem with either of these semantics and never have.

Will other members of this group finally reconsider their positions in the name of practicality for users?

This scenario is definitely one of the biggest catches for users that we can still change. It could be worth re-hashing this old issue even if the chance of change is small.

All 62 comments

I seem to remember a PR where you implemented a "golden middle": during the binding phase, Node assumes that the exports are as the imports defined them, leaves the execution to the execution phase, and if during the execution it finds that the imports were _not_ the same as the exports, throws an exception.

That would sound to me like an (almost) perfect solution.
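A toy model of that idea might look like the following sketch (all names and shapes here are illustrative, not the actual PR's code): assume the consumer's requested names at link time, run the CJS module during the execute phase, then verify the assumption and throw if it does not hold.

```javascript
// Toy model of the "golden middle": optimistic link, deferred validation.
function linkAndExecute(requestedNames, executeCjs) {
  // Link phase: optimistically create one binding per requested name.
  const bindings = Object.fromEntries(requestedNames.map((n) => [n, undefined]));

  // Execute phase: run the CJS module, then validate what it really exported.
  const exportsObj = executeCjs();
  for (const name of requestedNames) {
    if (!(name in exportsObj)) {
      throw new SyntaxError(`Named export '${name}' not found`);
    }
    bindings[name] = exportsObj[name];
  }
  return bindings;
}

const ok = linkAndExecute(['readFile'], () => ({ readFile: () => 'data' }));
console.log(ok.readFile()); // data

try {
  linkAndExecute(['missing'], () => ({}));
} catch (e) {
  console.log(e.message); // Named export 'missing' not found
}
```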

@giltayar yes this was the dynamic modules proposal that we put so much time into at TC39 (https://github.com/nodejs/dynamic-modules). The solution, implemented pretty much exactly as you describe, hits on some quite deep complexities:

  1. export * from 'a'; export * from 'b' is supposed to throw when a and b both export the same named export. Allowing the names to be user/consumer-defined means this validation phase needs to be post-handled / ignored somehow.
  2. import * as ns in a cycle can expose a module namespace object for a CJS module with the exports being displayed before the execution of that CJS module has completed. Since the exposed exports are then the user definitions this exposes non-determinism despite any later execution error.

The dynamic modules solution was to ban export star of commonjs modules for (1) and to patch the namespace for (2), but this approach was shot down at TC39 after multiple attempts at consensus.
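To make complexity (1) concrete, here is a toy sketch (illustrative, not spec code) of why export star needs every module's export list up front: names exported by more than one star-re-exported module become ambiguous and must be detected before any CJS module could have executed to reveal its names.

```javascript
// Toy version of the export-star ambiguity check: given the export-name
// lists of the modules behind `export * from ...`, keep only the names
// that appear exactly once; duplicated names are ambiguous.
function starExportNames(moduleExportLists) {
  const counts = new Map();
  for (const names of moduleExportLists) {
    for (const n of names) counts.set(n, (counts.get(n) ?? 0) + 1);
  }
  // Names seen more than once must not be re-exported.
  return [...counts].filter(([, c]) => c === 1).map(([n]) => n);
}

console.log(starExportNames([['a', 'shared'], ['b', 'shared']]));
// → [ 'a', 'b' ]  ('shared' is dropped as ambiguous)
```

The point of the sketch: this check runs against static export lists during linking, which is exactly the information a CJS module cannot supply before it executes.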

The simple solution being discussed here can be implemented in a PR tomorrow, and doesn't come with anything close to these complexities.

yes this was the dynamic modules proposal

So this proposal determined the names of the CommonJS exports by looking at what was _imported,_ e.g. in import statements; did we ever consider determining the names of the CommonJS exports by _parsing_ the CommonJS?

I remember there was opposition to evaluating the CommonJS in the instantiation phase, and strong opposition to evaluating CommonJS code twice; but what about just _parsing_ it in one of these pre-evaluation phases? We would miss any CommonJS exports that are dynamically named, but perhaps that's okay? That could be a caveat we explain in the docs as a limitation of importing CommonJS into ESM; hopefully that edge case is rare enough that it shouldn't come up much.

Is there a way to reuse the results of the parse to minimize the perf hit of that?

I'm not a fan of parsing CJS modules, because it's heuristic, and there would be no rule we could write in the documentation that says when it will work and when it won't. Worse: due to a change in the parsing code (which _will_ happen), a module that worked in one version of Node will stop working in another.

Too non-deterministic for my taste.

but this approach was shot down at TC39 after multiple attempts at consensus

True, and from TC39's perspective, they may be right. But if you're deeming CJS to be "out of TC39 scope" anyway (which I'm OK with!), then you can definitely go the "dynamic modules" route, which has the added benefit of making the execution order intuitive to the developer: I believe execution order should be "serial" and not an interleaving (which is what we'd get if we execute CJS in the binding phase, whereby all the CJS in the tree is executed first and then the ESM).

then you can definitely go the "dynamic modules" route

The problem is that dynamic modules is the route that does need TC39 support, because it requires actual support in the VM (correct me if I'm wrong, Guy). Early execution or something like a top-level parse would both be possible if we say "CJS exists outside of the standardized module system and its execution isn't considered module execution".

Afaik the current state is:

  • Early Execution: Technical side works fine, allows full named exports as long as they exist after initial execution. Downside is that CJS in the tree always runs before any ESM executes. ESM can never do bootstrap logic as long as there's any CJS in the tree. Some race conditions (potentially, likely rare) if we don't snapshot the exports of every CJS after first execution. Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
  • (Partial) Parse: Technical side works fine, allows named exports but only heuristically if they can be determined from a parse (top level parse?). Every imported CJS file has to be parsed twice (once by module loader, once by V8). Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
  • Dynamic Module: Technical side requires buy-in from V8 which requires buy-in from TC39. The weirdness around * exports and namespace objects which may prevent it from ever finding acceptance (see Guy's comment above). Outside of *, "perfect" bindings. CJS executes when the respective ESM node should execute. Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
  • Default Only: Technical side works fine but only allows module.exports as default. No collision between module.exports.default and module.exports. CJS executes when the respective ESM node should execute. Relatively easy to simulate/recreate/modify in a userland loader.
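For contrast, here is a minimal simulation (not Node's actual implementation; names are illustrative) of the "Default Only" behavior from the last option, showing why it sidesteps the naming problems: even dynamically built exports stay reachable, just never as named bindings.

```javascript
// "Default Only" interop in miniature: the loader exposes a CJS module's
// module.exports object as the single `default` binding of a synthetic
// module namespace.
function defaultOnlyNamespace(cjsExports) {
  return Object.freeze({ __proto__: null, default: cjsExports });
}

// A CJS module that builds its exports dynamically:
const cjsExports = {};
for (const op of ['add', 'sub']) {
  cjsExports[op] = (a, b) => (op === 'add' ? a + b : a - b);
}

const ns = defaultOnlyNamespace(cjsExports);
const { add, sub } = ns.default; // destructure at runtime, after import
console.log(add(2, 3));   // 5
console.log(sub(2, 3));   // -1
console.log('add' in ns); // false (no named binding is surfaced)
```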

My personal take: I think partial parse would be a reasonable compromise and JDD's esm seemed to show that the performance impact was _okay_, especially since it would only affect the boundaries between ESM and CJS so the majority of modules wouldn't be double-parsed.

The bad news is that interop-require conflated module.exports.default and module.exports in some places. We would have to pick module.exports for backwards compat afaict and that would create a potentially even uncannier valley than what we have today. If we're okay with a breaking change there, the parser could look out for that magic es module property and switch to module.exports.default based on it.
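For reference, the interop conflation mentioned here follows the pattern transpiler helpers use: a marker property on the exports object decides whether module.exports itself or module.exports.default is treated as the default export. A hedged sketch, assuming the Babel-style __esModule marker:

```javascript
// Sketch of interop-require's default selection: faux modules are tagged
// with `__esModule`, and only then is `module.exports.default` treated as
// the default export; plain CJS uses `module.exports` itself.
function interopDefault(cjsExports) {
  return cjsExports && cjsExports.__esModule ? cjsExports.default : cjsExports;
}

const plainCjs = { hello: 'world' };
const fauxModule = { __esModule: true, default: { hello: 'faux' } };

console.log(interopDefault(plainCjs).hello);   // world
console.log(interopDefault(fauxModule).hello); // faux
```

The breaking-change question above is whether Node's own translation should honor that marker or always pick module.exports.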

Relatively easy to simulate/recreate/modify in a userland loader.

To elaborate on this point: All the more advanced solutions have issues once somebody wants to write a userland loader. If I want to serve CJS from HTTP/archive/compiled-on-the-fly, I may have to reproduce the core behavior. Assuming the core behavior is "parse the source and then apply a specific heuristic to determine likely named exports", that quickly becomes a challenge. The same is true for any kind of static analysis tool (compiler, linter, bundler, ...). It's not a blocker but I wanted to call it out.

We have discussed this many times, at length. I do not see a way forward and do not want to take this trade off given previous discussions.

Downside is that CJS in the tree always runs before any ESM executes. ESM can never do bootstrap logic as long as there's any CJS in the tree.

I think this might be a bit downplayed. You import not just the CJS shallowly but all of its dependencies as well. This means that:

  • Transitive dependencies of CJS also get hoisted. It is hard to understand how things would be re-arranged, both for humans and for tools. I do not see this as making the user experience better.
  • Workflows like console-based debugging will start to get very confusing. I do not see this as making the user experience better either.
  • It doesn't match how faux modules (transpiled CJS trying to emulate parts of ESM, not real ESM) work. If the goal is to achieve the same thing as faux modules, this route does not seem to be well understood, nor does it have a history of being tested in the wild. If tooling could provide such a historical benchmark to show breakage and/or UX, it would be helpful.
  • It doesn't match what is now shipping in Node unflagged. Re-ordering things that people are depending on in a stable version is not something I feel comfortable with.
  • It cements that APMs/polyfills/security hardening/global error handlers/etc. need to stay CJS forever (and will want an import condition to load CJS in particular). Side effects in modules are a natural part of a runtime with so many features exposed as first-class dynamic JS APIs. If the goal is to make CJS a legacy feature, this would instead permanently force those workflows to remain CJS in order to safeguard the timing of their evaluation. That works against one of the goals of ESM: allowing a path toward codebases that are purely ESM and/or work in environments like the web. Breaking such a goal needs to be weighed heavily.

I agree that being able to pull named imports from CJS is valuable, but I do not see the research into the effects this would cause that justifies jumping on it quickly. Additionally, this trade-off doesn't seem to focus on anything except the ability to have named imports from faux modules as CJS. I would like to understand why none of the above concerns outweigh that benefit.

It would also be helpful to understand why the consumer needs this while the author is in a situation where updating would be too painful. Currently, transpiled CJS can be loaded into our ESM implementation, and you can use the full interface it provides, albeit by manually reproducing what a transpiler would do to interface with it. Things like https://github.com/addaleax/gen-esm-wrapper provide facades at build time. Understanding why transpiled workflows are unable to maintain build-time tooling for purposes like this would also be helpful.

Right now I do see that people using transpilers have problems when they try to stop using them; but I don't think they need to stop using them. Other problems with dropping transpilers exist too (async loading, mocking, magic variables, etc.), so I think the best approach, at least in the short term, is to continue transpiling, and that any trade-off we take should match transpiler behavior more closely than this one would. It would be a breaking change from transpiler behavior as well.

This trade-off does not necessarily have a way out in the future, since it would be another breaking change to undo this oddity once we introduce it. If we land such a change, I would also never want to revert it. We would forever match neither historical transpiler semantics nor ESM's order of evaluation. Currently, the semantics are somewhat easy to understand, even if they do not provide all of the features of faux modules. I value the ability to understand and teach how my program works, and introducing implicit, transitive preemption is far from desirable to me. Additional performance mitigations like late binding would make this even worse for console-based debugging and potentially match transpilers even less.

Providing named imports from CJS does not solve a variety of other issues with ESM adoption, and I am unclear on how a variety of other upcoming and existing problems wouldn't be made even worse by taking this trade-off. Standardized JSON modules, for example, are in talks at both TC39 and WHATWG and are being looked at as providing only a default export. Providing named imports from CJS but not from JSON would just add more confusion, rather than giving us at least a semi-uniform approach for integrating values that were not designed for ESM.

I do not want to re-discuss all of these issues unless we can discuss the trade-offs in both directions.

I forgot that this also may violate things like tick ordering of queued tasks, but I assume that is understood by the nature of this hoisting.

@bmeck Does your take above apply to any of the options from @jkrems’ comment, or the overall idea of executing CommonJS out of order? I'm not clear on which proposal you're pointing out problems with.

Looking at Jan's options, it sounds like the last one (Default Only) is what we have now; Early Execution is probably disqualified for all the reasons you listed in your comment about it cementing CommonJS primacy; Dynamic Modules is a no-go without TC39 buy-in; and that leaves (Partial) Parse. I don't _think_ the parse option has all the timing and other issues you're discussing, does it? Aside from the parse possibly missing some names, and the issue with default, are there any other problems with that option?

@GeoffreyBooth parsing does not have the issues of timing, but does have heuristic concerns. See the output of transpilers like Babel for export *, which does require runtime info. I also had a stripped-down pragma PR ("use exports ...list...") a long while back, but people didn't like the perf problems it had. The complexity of a full JS parse is non-trivial, and we would need to document our exact heuristic so that transpilers could target it and it could continue to be supported.

So others who have thought about this more can propose better solutions, but what I had in mind for a parsing algorithm would be to try to keep it simple enough that it's straightforward to understand. Like:

  1. Find all references to a non-declared exports object (i.e. an exports variable that has no var/const/let/function parameter declaration) or a likewise non-declared module object with an exports property.
  2. For each such reference, parse references to statically-defined properties of that object (such as exports.foo or exports['foo'], or module.exports.foo and module.exports['foo'], etc.).
  3. Also parse assignment of an object to that reference (module.exports = { foo: 1 }, etc.).

And . . . that's it? If there are any other really common ways of declaring exports, we could extend our logic to include them, but the idea would be that the docs describe this algorithm in plain English and say that CommonJS exports defined such that they're discoverable via this algorithm will be importable into ESM; and others will not be.
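A rough sketch of that algorithm follows (regex-based for brevity; a real implementation would want a proper parser, and everything here is illustrative rather than proposed code):

```javascript
// Finds statically written `exports.name = ...`, `module.exports.name = ...`,
// `exports['name'] = ...`, and literal keys of `module.exports = { ... }`.
// Dynamically computed export names are missed by design.
function detectNamedExports(source) {
  const names = new Set();

  // Step 2: property assignments on exports / module.exports.
  const propAssign =
    /(?:^|[^.\w])(?:module\.)?exports(?:\.([A-Za-z_$][\w$]*)|\[['"]([^'"]+)['"]\])\s*=/g;
  let m;
  while ((m = propAssign.exec(source)) !== null) {
    names.add(m[1] ?? m[2]);
  }

  // Step 3: module.exports = { foo: ..., bar: ... } object literals.
  const objAssign = /(?:^|[^.\w])module\.exports\s*=\s*\{([^}]*)\}/g;
  while ((m = objAssign.exec(source)) !== null) {
    for (const entry of m[1].split(',')) {
      const key = entry.split(':')[0].trim();
      if (/^[A-Za-z_$][\w$]*$/.test(key)) names.add(key);
    }
  }
  return [...names];
}

console.log(detectNamedExports(`
  exports.foo = 1;
  module.exports.bar = function () {};
  module.exports = { baz: 2, qux: 3 };
  const dynamic = 'nope';
  exports[dynamic] = 4; // not statically analyzable (missed by design)
`));
// → [ 'foo', 'bar', 'baz', 'qux' ]
```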

So yes, the transpiler output of export * from would fall into the unsupported category; but the transpiler output of export const foo = would be parseable. Ditto export function foo(), export class Foo, and export { foo }. The main exceptions I can find in some cursory tests are that export * from and export default would be unsupported, and we would call those out in the docs (as well as a traditional CommonJS export named default).

Prior art: https://github.com/rollup/plugins/blob/master/packages/commonjs/src/transform.js
That module also has to handle imports, but I think it's an interesting case because users of rollup already rely on it to support named imports from CommonJS dependencies.

Regarding perf hit of a double parse.

It is my understanding that the fetch / parse / link phase can be done asynchronously, as CJS modules will all be terminal nodes in the graph. While I don't think we are doing this yet, we could move the parsing / validation off the main thread. I also think, but could be wrong, that there might be wiggle room in the spec to begin the execution phase before the validation is finished here... though this might be a bit more controversial. I'm also curious if we could potentially code-cache the result of the parsing to reuse later... if V8 does not yet have an API for something like that, it might be something we could ask about.

So, with that said, I think there is likely room to explore the double parse option and ways to architect it so that the performance hit is not significant. Is there a reason it is a non-starter from a spec perspective?

I do not have any significant objections to parsing but would like to see more numbers around it. To my knowledge, parsing to form the export list does not violate any spec expectations (indeed, this is what ESM does, and across modules no less!), but doing evaluation would be a change. If parsing is done up front, I am unsure we would even need the execution to be hoisted. For a variety of runtime-dependent output like the Babel output I linked, if the heuristic is set up to work across modules, you might be able to fully form the heuristic graph even for export * from.

I have previously always been against parsing for the extra overhead but can possibly get behind it if we do it lazily only on the ESM/CJS boundary.

I think it would be important that the parsing is done lazily because if we apply it to all CJS modules loaded that means even Node.js users who don't use ES modules will pay this cost - that is we should remove the injection into the ESM registry to do lazy snapshotting.

@bmeck I know you have had a problem with the idea of lazy snapshotting before, would you be able to compromise on that?

If evaluation order is preserved I can live with it. Might still be surprising, but our current CJS cache in the translator also has a similar gotcha if you try hard enough.

Great, that sounds like it could well be a possibility then to me, with analysis as @GeoffreyBooth describes in https://github.com/nodejs/modules/issues/509#issuecomment-623626012.

I know we use Acorn for parsing here and there in our internals; is it possible to use V8 itself to parse JavaScript source code and return an abstract syntax tree back to Node?

On top of that, could we then submit that AST back into V8 when it’s time to evaluate that code? Rather than asking V8 to parse the source again.

If both of these are possible, we could avoid a double parse. Even if only the first is possible, it would be preferable to using Acorn both for speed and to ensure that the parsed AST we work with matches what V8 evaluates.

V8 does not expose any AST like structure after it parses. The AST is internal to V8 and not a public API.

/cc @nodejs/v8 who might have opinions

I'm against the cjs parsing idea. I know of a lot of cjs code that dynamically builds exports which this would fail on. Even if we do go forward with it, we will need to make sure we parse using the exact behaviour of the parser in the engine we use, which acorn doesn't fulfill. Since people also embed node within other engines (for example graaljs), I don't think we can ever reach that.

Could this be a "perfect is the enemy of good" situation?

If we can statically analyze some exports with minimal performance overhead, is there a reason not to do it?

@MylesBorins since we know the pattern exists, we are creating a breaking change in cjs where there previously was none (dynamic assignment vs static assignment), so I would not go so far as to say this is "good"

I wouldn’t call it “breaking” when those values are still reachable from the default export. And I would assume that the entire module.exports object would be on the default export in those cases.

since we know the pattern exists, we are creating a breaking change in cjs where there previously was none (dynamic assignment vs static assignment), so I would not go so far as to say this is "good"

Is it worth mentioning that JS autocomplete/autoimport in vscode, VS, webstorm, and a bunch of other editors, is essentially predicated on the idea that "most JS cjs module exports are statically analyzable"? And that all of those go out the window when you use more dynamic export patterns, too? Don't get me wrong, I'd still rather see the dynamic modules solution happen via some means, but I wouldn't say cjs static analysis is a _wholly_ untested concept.

@jkrems if you restructure your cjs module to use dynamically created properties and someone using esm depends on named exports it is by definition a breaking change. the graph will not validate. the application will exit with an error.

Like, there's nobody _advocating_ for people to use more dynamic cjs export names, I hope, since they cause a ton of tooling headaches elsewhere in the ecosystem, like in tree shaking or editor intellisense. You _can_ dynamically build your exports, but it's really only edge-case packages that actually _do_, in practice, because of all the other ecosystem bits it already impedes.

@weswigham

  • works most of the time

I don't think node core should do things that only work most of the time. That's somewhat of a philosophical argument about correctness though, so I'm not sure if that's the thing we should discuss.

  • advocating for migrating to dynamic exports

Regardless of what is being advocated, people do all sorts of odd things with cjs. If we enable this behaviour, we create a refactoring hazard around otherwise invisible changes within a module, which I would classify as node being broken.

I don't think node core should do things that only work most of the time.

This is not one of those cases. It will work 100% of the time when the author follows the rules we specify, e.g. that the author writes CommonJS that is easily statically analyzable.

@GeoffreyBooth "it will work 100% of the time ignoring the times when it doesn't work" i can't tell if this is satire or not.

aside from that, if you're updating your package you can just add esm wrappers. The entire point of worrying about cjs is that there are oodles of packages that won't be updated to support esm.

My concern with parsing is not that it won't work 100% of the time, but rather that any change in the algorithm (and there are always changes/bugs in heuristic algorithms) will break previously running code, because a named export that was there will suddenly _not_ be. That sounds dangerous to me, and it will lead to fear of changing the heuristics, even to fix a bug.

Yes, VSCode and others do that, but I've seen time and time again when working with them, that modules I have that weren't analyzed correctly, suddenly are, and (a bit less I have to admit) vice-versa. But what is OK in an IDE, I believe is not OK in a runtime.

Dynamic Modules is a no-go without TC39 buy-in

Are we sure about that? Have we asked the v8 team?

But what is OK in an IDE, I believe is not OK in a runtime.

I would second that.

If there's a perfectly valid, syntactically correct way of writing a CJS module, I don't see why someone somewhere won't be using it, and adding behavior that only works in some cases but not the others sounds like a dangerous way to make things confusing.

@giltayar

Have we asked the v8 team?

It would require a willful violation of the spec which is something we try to avoid.

It would require a willful violation of the spec which is something we try to avoid.

Too bad. I hope the end result won't be that Node.js ends up executing CJS out of order, which is even worse. But I understand the hesitation.

"it will work 100% of the time ignoring the times when it doesn't work" i can't tell if this is satire or not.

I'm serious. If you set the bar so high that it's impossible to meet, by definition that's making the perfect the enemy of the good. If your bar is 'named exports from CommonJS, no matter what, even if they're dynamically set' and you refuse to accept anything less, then you'll get nothing.

Whereas 'named exports from CommonJS when they're defined in a way that's easily statically analyzable' _is_ a standard that we can meet, without a too-complicated heuristic. We don't need code as complex as Rollup's, since we're not trying to be as comprehensive.

We can also test our algorithm. We can run our 'get statically-defined CommonJS named exports' function against a whole bunch of npm packages and save the names returned; and then actually require those packages and inspect the exports and compare them against our earlier results. Yes, I would expect that some of them will have exports named default or dynamically-named exports or exports that appear only because they've overridden require or some other ridiculous thing; but we can in our test determine the real-world percentage of such things for the top 100 or 1000 or however many npm packages. If that percentage of correctly parsed named exports is very high, say 99%, then we have to weigh the value of denying named exports under any circumstances against the value of providing them under defined circumstances. But first someone needs to write a PR that makes the attempt.

My concern with parsing is not that it won't 100% work, but rather that any change in the algorithm (and there are always changes/bugs in heuristic algorithms) will break previously running code because a named export that was there will suddenly _not_ be.

To properly evaluate this concern we need to actually write the code. If we can write a function that's simple and looks like it won't change once it's unflagged (and I agree, that's probably going to be the case) and we can document its algorithm along the lines of https://nodejs.org/api/esm.html#esm_resolution_algorithm, then I think it's shippable.

@GeoffreyBooth as I've said, that is indeed my bar, and I would rather not ship something that uses guesswork in node core.

and repeating myself again, if we have to document how to write a cjs package we've probably failed, because people writing packages can/should include esm wrappers.

@devsnek it is not guesswork, but it is not 100% coverage. Assigning to a property of a newly allocated exports object is a fairly clear sign that you want to add something to it. I'm not sure what guesswork is being done in the algorithm mentioned above; it lists a heuristic that does not seek to change over time, nor is it affected by things like locality or order of operations. It might be helpful to figure out what is guesswork so that we could remove any guessing. I think there is a firm grasp that such a heuristic might not cover all cases in which code may be written. There are similar issues with things like the name inference of anonymous functions, where the specification itself does not adopt names in all source positions, such as when a function is provided as an argument to a function call. Can you explain what qualifications are necessary for something to be considered not guesswork?

Perhaps it would also be helpful to separate the ability to cover 100% of CJS runtime effects from static analysis that must be done ahead of time, since the proposed algorithm sits only in the static-analysis timeline. If the stated constraint is that we must not use static analysis, then that would be helpful to state directly. However, given that static analysis can alleviate some use cases, it would also be enlightening to see what harm using it might cause for things like forwards compatibility. As @GeoffreyBooth states above, if the algorithm is rigidly documented, any failure to automatically generate bindings becomes an exercise in reading the documentation to understand why something failed, not an attempt to infer guesswork.

As a slight aside, I think we should avoid calling things/people the "enemy" or implying that something is inferior by nature (good vs perfect). I'm not clear on how those statements actually make any claims about the arguments at hand, and it is a bit confusing to read. If such a catchphrase is used, can we explain why one position is a better trade-off more clearly, without implying that other trade-offs are in some way evil?

@bmeck it actually matches the definition of guess:

estimate or suppose (something) without sufficient information to be sure of being correct.

The suggestion is that we guess the properties of module.exports, using some sort of syntactic analysis.

Can you explain what qualifications are necessary for something to be considered not guesswork?

For any CJS module, getExportsStatically(module) is equal to Object.keys(require(module))

@devsnek

The suggestion is that we guess the properties of module.exports, using some sort of syntactic analysis.

It does not claim to do so, nor to do so completely; it only claims to create bindings when assignments in the listed special forms are found. In particular it was stated:

parse the references of statically-defined properties of that object

I assume you are claiming this to be guesswork based upon "without sufficient information to be sure of being correct"; however, if we are only concerned with static analysis, it seems we do not fulfill your definition of guesswork. Can you clarify why obtaining only the static values might not be considered correct, or would still be considered guesswork, if our stated goal is only to add support for those special forms?


For any CJS module, getExportsStatically(module) is equal to Object.keys(require(module))

So to be absolutely clear: since require(module) generates some properties at runtime that cannot be statically analyzed, it is my assumption that you are against any static-analysis-based solution. Can you clarify in which cases generating from static information would be a hindrance rather than an additive feature? If the claim is that it is better to have only behavior guaranteed to match runtime, because adding these bindings has some ill effect, that would make your concerns easier to understand.

@bmeck

Can you clarify why obtaining only the static values might not be considered correct, or would still be considered guesswork, if our stated goal is only to add support for those special forms?

Because I can still import the other forms, even if the modules group tries to pretend they don't exist.

Can you clarify in which cases generating from static information would be a hindrance rather than an additive feature?

  • As I've said a few times now, it creates breaking changes to esm consumers from otherwise invisible changes to cjs module source.
  • People consuming modules have to either read the source or try to use a named import and see if it works to use this feature.

Because I can still import the other forms, even if the modules group tries to pretend they don't exist.

Can you clarify this? I'm unclear on how hoisting the special forms, so that there are two ways to access them, makes the non-special form still being available a problem. Are you stating that if we did not allow access to non-statically-analyzable members of CJS modules it would not be a problem, since we would be limited to only one form of access and there would be no superset/subset problem between the special form and the normal form? We could take the alternate stance that, since we cannot identify runtime-based values, we ban anything non-static. This is actually an interesting alternative trade-off we could pursue, since export * and dynamic assignments to exports are semi-rare. Would you have the same objections if we limited CJS exports to purely static forms so that we do not have a mismatch?

As I've said a few times now, it creates breaking changes to ESM consumers from otherwise invisible changes to CJS module source.

Yes, but I am unclear on how it creates the breaking change; can you elaborate? Right now it seems that this would not be a breaking change in any way except adding bindings that did not previously exist.

People consuming modules have to either read the source or try to use a named import and see if it works to use this feature.

I do not believe so: they could use the REPL to inspect by importing a namespace and using reflection, they could always use the normal form, or they could use the docs. More pointedly, the same problem is true for all exported ESM bindings, and for CJS itself to some extent; coders need to know how they can use dependencies by some means of understanding the module being loaded, in both CJS and ESM, and this is not changed by any algorithm. Do you have an example where you need to know whether something is available as an export in a way that is specific to this algorithm, but where you are unable to reproduce a similar issue if you never knew the format of the dependency?

It's worth noting that since module.exports can have properties that are not IdentifierNames, as well as that are Symbols, import * as ns from 'cjs'; Reflect.ownKeys(ns) can never match Reflect.ownKeys(module.exports), so I'm not sure that's actually a meaningful bar to set.

If you narrow it down to only worrying about own string data properties of module.exports that are IdentifierNames, then the only possible discrepancy left would be dynamically added IdentifierName properties, something which absolutely could happen, but in practice is likely to be somewhere between "exceedingly rare" and "never".

@bmeck

Can you clarify this? I'm unclear on how hoisting the special forms having 2 ways to access them makes the non-special form still being available a problem.

Because when someone requires/imports a module they aren't thinking "ok now i'm going to import a module that is written in commonjs where the commonjs source uses static assignments to module.exports". They're thinking "ok now i'm going to import express".

I am unclear on how it creates the breaking change, can you elaborate?

// module 1.0.0
module.exports = {
  a: require('./x/a'),
  b: require('./x/b'),
  etc
};
// module 1.0.1
for (const name of readdirSync('./x')) {
  module.exports[name] = require(`./x/${name}`);
}

Do you have an example of where you need to know if this is available as an export in a way that is true for this algorithm but are unable to reproduce a similar issue if you never knew the format of the dependency?

If we assume there is documentation, because without documentation you have to guess no matter what you are importing: The documentation for the above module says module.a and module.b and etc are available for consumption. The documentation for available apis doesn't change during the version bump, because the available apis don't change. Yet in version 1.0.0, they are named exports and in 1.0.1 they are not. If that contradiction is not enough, you can compare it to an ES module's documentation (regardless of if the module is wasm or source text or whatever), which would obviously say what the exports are.

@ljharb that problem is bigger than cjs (wasm exports, for example) so i'm not sure its worth worrying about that much in this context.

@devsnek

Because when someone requires/imports a module they aren't thinking "ok now i'm going to import a module that is written in commonjs where the commonjs source uses static assignments to module.exports". They're thinking "ok now i'm going to import express".

If they are importing an ES module, don't they have the same issue of needing to know what bindings are exported? I'm confused about why needing to know the exported bindings of a CJS module is problematic when ESM requires the same.

for (const name of readdirSync('./x')) {
 module.exports[name] = require(`./x/${name}`);
}

Thanks for the example. So your concern is that people may accidentally create breakage by moving to more dynamic forms of code in some non-major version. We have seen similar issues with how package.json#exports can often create breaking changes when the understanding of how the feature works is poor. Perhaps, since this was acceptable for package.json#exports, we could try to better understand what the difference was there and whether we can make a similar trade-off here, if that is amenable.

If we assume there is documentation, because without documentation you have to guess no matter what you are importing: The documentation for the above module says module.a and module.b and etc are available for consumption. The documentation for available apis doesn't change during the version bump, because the available apis don't change. Yet in version 1.0.0, they are named exports and in 1.0.1 they are not.

I don't understand the point about it being a contradiction. If Node asserts a clear definition of when something will be flagged as an exported binding in CJS and code goes against it, there is nothing we can do. Creating exports using a static form that flags binding names is quite similar to how the export statement also flags binding names. It might be good to have a lint rule for this, and for module authors to test their changes, but it doesn't seem like something that is unable to be understood, tested, or even handled through feature detection.

If that contradiction is not enough, you can compare it to an ES module's documentation (regardless of if the module is wasm or source text or whatever), which would obviously say what the exports are.

I don't understand this comparison; this algorithm is defining a special form of static syntax, and it defines what the named exports for ESM are by using that form. As @weswigham points out, there is precedent and tooling relying on similar behavior. By comparison, WASM and Source Text Module Records also define their exports for ESM using static syntax. How is using a static syntax in CJS different from the fact that ESM also uses static syntax?

How is using a static syntax in CJS different from the fact that ESM also uses static syntax?

if Node asserts a clear definition about when something will be flagged as an exported binding in CJS and code goes against it there is nothing we can do.

Isn't the entire point of this issue to figure out what to do about all the cjs that won't be updated, or will be updated but won't be transitioned to support esm? If someone is updating their package and wants to support esm (reads our documentation, for example) we should tell them to use an esm wrapper file. @addaleax even created a tool to do it for you: https://github.com/addaleax/gen-esm-wrapper. We shouldn't tell them to write their cjs code in a way such that node can tease details from the ast.

Given that, it seems like the real concern is existing packages, not new ones, since the new ones should be using an esm wrapper.

Would it be worth partially surveying the npm registry itself, to do a comparison between the runtime "named exports" and those static analysis derived?

FWIW I've opened a PR to improve error messages for the lack of named exports from CJS modules as a way to improve experience in the mean time.

https://github.com/nodejs/node/pull/33256

Isn't the entire point of this issue to figure out what to do about all the cjs that won't be updated, or will be updated but won't be transitioned to support esm?

Yes, and no. From my point of view it is more importantly about the feasibility of being able to port to ESM from faux modules. If you must port an entire dependency graph, that is much more painful than porting parts of your subgraph.

For example, if there is a module graph as follows:

Name | Author | Notes
---- | ---- | ----
A | Avery | Top level app. CJS: Faux Modules, named imports from B and C
B | Dakota | Dependency with a dependency. CJS: Faux Modules, named imports from C
C | Pat | Leaf Dependency. CJS

The path to removing transpilation is to get A and B off of a transpiler. Since A and B and C are authored by different people this introduces some ordering concerns.

C porting to ESM

  1. If C updates to real ESM, faux modules are unable to obtain named imports any more. Therefore an implementation in CJS must remain available (perhaps even routed by package.json#exports on the same deep specifier)
  2. If C duplicates their implementation in both ESM and CJS that means that various things like instanceof fail to function as seen in the React/GraphQL issues a while back.
  3. If C is effectively bound by consumers to keep a CJS implementation around then their ESM implementation must remain a wrapper around it in some form if they support loading as CJS.

All of this summarizes as updating C to be real ESM is quite breaking for consumers of C. Therefore the only safe path for C is to keep implementation in CJS and wrap CJS in ESM.
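The package.json#exports routing mentioned above could look roughly like this in C's package.json (paths hypothetical), keeping the CJS implementation canonical while the ESM entry is the thin wrapper:

```json
{
  "name": "c",
  "exports": {
    ".": {
      "import": "./wrapper.mjs",
      "require": "./impl.cjs"
    }
  }
}
```

With this, require('c') and import 'c' resolve to the same underlying implementation, avoiding the instanceof-style dual-package hazards mentioned in point 2.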

B porting to ESM

  1. If B is unable to use the named import form to pull bindings from C, it must update all import sites; this can affect existing tooling and lead to a loss of static analysis and degradation of features like tree-shaking in bundlers. Since tree-shaking is very important to a variety of libraries, this might be taken as a sign not to move to ESM.
  2. B suffers the same problems as C where consumer breakage occurs.
  3. If C updates to ESM after B, you get to revert all the non-named import forms; so it is best to wait for C to update to ESM first. This encourages leaf-first, deep-dependency migration, which is a much larger timeline than changing B (which must have a CJS-backed implementation anyway).

A porting to ESM

  1. Similar issues to B
  2. This at least can port all the named imports to be default imports for shallow dependencies. It gets to revert them though as they update.

By alleviating some of the back and forth of porting named imports we can at the least allow for fewer steps in porting implementation code over to ESM for top level apps. Additionally, we can avoid dependencies with their own dependencies from needing to wait on leaf dependencies to upgrade to keep tree shaking as well as reducing code churn.

@bmeck The consumer breakage concern means that named imports don't actually matter; anyone who wants to avoid breaking people will keep their implementation in CJS, with an ESM wrapper (or, if it's stateless, duplicated, with or without a build tool) for the foreseeable future. The thing that would impact what you're talking about is require('esm'). Additionally, tree-shaking either can't work on CJS regardless, or, is advanced enough that it would work the same on a destructured default import, so I don't think that's a particularly compelling argument.

So to summarize, the ultimate ask here is to enable named exports from CommonJS into ESM, e.g. import { someName } from 'commonjs-package'. The original post discussed achieving this via out-of-order CommonJS execution, but it seems like there are several folks with strong objections to that because of spec concerns and CommonJS primacy concerns (especially as related to instrumentation).

One option that seems to still potentially be viable is to attempt a static analysis of the CommonJS package to determine its exports that way, with the understanding that dynamically-named exports and unconventionally defined exports will be missed. There are concerns to this approach around that it doesn't necessarily catch all CommonJS exports, even if we document that it isn't trying to support _all_ possible CommonJS exports; and that what exports are discovered might change over time if we change the static analysis algorithm.

The one other option I remember ever being discussed was _double_ execution of CommonJS modules, with the initial execution during the linking (?) phase being only for the purpose of determining the exports. This was dismissed because of the possibility of side effects, e.g. a file being created or deleted twice or text being printed to the console twice, etc. However it occurs to me that there might be a way around that problem: what if there was a way to evaluate CommonJS code _without_ causing side effects? Like if you could evaluate a CommonJS package but sandbox its access to the file system or network, and throw away its STDOUT/STDERR, that would be a way to evaluate it without causing side effects. I don't know how or if this could be possible within Node, but I remember discussing the possibility of package managers adding an ESM wrapper to CommonJS packages via something like evaluating the package within a Docker container. Is there a way to do something similar in Node?

Even that has a caveat in that some packages might behave differently in such an environment versus the unrestricted access-to-everything environment that Node normally provides, so there would still be the possibility of inconsistency between the exports available in an all-CommonJS environment and the ones discovered for use in ESM. Ultimately I think the way to settle this is for someone to write the code for either of these approaches, and try it out on the top 1000 or so npm packages, comparing the detected named exports via this new method against what's found via Reflect.ownKeys(require('commonjs-package')). If there are lots of packages where the two approaches differ, then the new method is too unreliable to use (or more precisely, nonstandard ways of defining CommonJS exports are too widely used for that method to be considered reliable enough for common use). But hopefully that won't be the case.

Ultimately the question isn't whether a detection method is good enough, or if we're willing to accept less than 100% accuracy; it's a question of which user experience is the least bad of our options. The _current_ user experience, where only the default export is supported from CommonJS packages, is pretty bad and people are complaining about it. Would a different user experience that supplied named exports but with exceptions be worse? (Obviously, it depends on the exceptions and how many there are and how common each exception is.) I think the only way to reasonably answer this question is to write the code and run it on as wide a dataset as possible, so that we can make an informed decision. Having the data would also provide us with a good answer to users when they come asking why they can only get a default export, or why dynamic exports aren't supported, or why ESM has whatever incompatibility they find as compared with pure CommonJS or a transpiler environment.

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

I mean, _arguably_ they already can, as they may have been cached by another module graph which executed prior to the currently resolving one.

Right, that's (one of the reasons) why I ask.

It should be noted that a few years ago while google was trying to ship ESM and testing the viability of it in real world productions workflows in the browser, they wanted to do out of order evaluation of purely ESM. They have a document at https://docs.google.com/document/d/1MJK0zigKbH4WFCKcHsWwFAzpU_DZppEAOpYJlIW7M7E/view ; however, as the document explains they chose not to do so for a variety of reasons. This also came up again this year with a similar result.

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

I think the problem is less that CJS may execute before ESM but that it _always_ executes before any ESM file may execute in the graph. That means that certain kinds of code (bootstrapping, polyfills) can only ever be written as ESM if there's not a single CJS file involved in the entire program. Which is... a tall order.

Removing from module agenda for right now, feel free to re add

@guybedford are we ok closing this?
