This is an issue to discuss a proposed, but still vague, normative change to module evaluation. This would be done in collaboration with TC39 (although it isn't strictly dependent on any changes to ES262).
If you want extra background, please read JS Modules: Determinism vs. ASAP. In terms of that document, the status quo is the super-deterministic strategy. This thread is to explore moving to (some form of) a deterministic-with-yielding strategy. We're not considering the ASAP strategy.
See also some discussion related to top-level await: https://github.com/tc39/proposal-top-level-await/issues/47#issuecomment-467898555
If you don't want to read those background documents, here's a summary. Currently module evaluation for a single module graph (i.e. a single import() or <script type=module>) happens as one big block of synchronous code, bound by prepare to run script and clean up after running script. The event loop does not intervene in between, but instead resumes after the entire graph has evaluated. This includes immediately running a microtask checkpoint, if the stack is empty.
Side note: for module scripts, the stack is always empty when evaluating them, because we don't allow sync module scripts like we do classic scripts.
This sync-chunk-of-script approach is simple to spec, but it has two problems:
Arguably, violating developer expectations. If you have two classic scripts, <script></script><script></script>, a microtask checkpoint will run between them, if the stack is empty. (And maybe sometimes more event loop tasks, since the parser can pause any time?) Example. But if you have two module scripts, import "./1"; import "./2";, right now we do not run a microtask checkpoint between them.
Stated another way, right now we try to sell developers the intuitive story "microtask checkpoints will run when the stack is empty." But, between evaluating two module scripts, the stack is definitely empty---and we don't run any microtask checkpoints. That's confusing.
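To make the asymmetry concrete, here is a minimal sketch (hypothetical page, invented log strings) of the classic-script behavior being described:

```html
<!-- Two classic scripts: a microtask checkpoint runs between them,
     because the stack is empty at that point. -->
<script>
  Promise.resolve().then(() => console.log("microtask"));
  console.log("script 1");
</script>
<script>
  console.log("script 2");
</script>
<!-- Logs: "script 1", "microtask", "script 2".
     With `import "./1"; import "./2";` in a single module script,
     no equivalent checkpoint runs between the two modules today. -->
```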
The proposal is to allow yielding to the event loop between individual module scripts in the graph. This will help developer expectations, and will potentially decrease jank during page loading---although it will _not_ help total-time-to-execute-script, it just smears it across multiple event loop turns.
What form of yielding does this look like, exactly? A few options.
Note that with top-level await, we'll likely end up yielding to the full event loop anyway at some point, e.g. if you do await fetch(x) at top level, a certain subgraph of the overall graph will not execute until the networking task comes back and fulfills the promise. (This also means a module can always cause a yield to the event loop with await new Promise(r => setTimeout(r)) or similar.)
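As a sketch of that last point: with top-level await, a module can yield past the microtask queue all the way to the event loop. The module body is modeled below as an async function (a hypothetical stand-in, so the snippet runs anywhere, not only in an ES-module context):

```javascript
// Sketch: a module body that yields to the full event loop mid-evaluation.
const order = [];
async function moduleBody() {
  queueMicrotask(() => order.push("microtask"));
  // Awaiting a timer parks this "module" until an event loop task fires;
  // pending microtasks run as soon as the stack empties:
  await new Promise(resolve => setTimeout(resolve, 0));
  order.push("rest of module");
}
moduleBody().then(() => console.log(order.join(" -> ")));
// logs: microtask -> rest of module
```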
I think the biggest open questions are:
/cc @whatwg/modules, @yoavweiss
Overall, this idea sounds interesting to me, just wanted to verify some things about practicality.
Performance-wise, will it be scalable to yield in these cases? If we may eventually have pages with hundreds of small modules (made practical by low networking costs due to improvements in HTTP and/or bundled exchange), would it cause significant overhead on startup to perform extra microtask checkpoints or event loop turns? (See also: https://github.com/tc39/ecma262/pull/1250 , where reducing the number of microtasks improved throughput significantly across engines, but that was many more microtasks than we are talking about here.)
How high do we expect the compatibility risk to be for this change? It's observable across browsers that no one yields in these cases today, but I don't know if people are depending on this property.
If we assume that most modules are fast to evaluate (just a few function definitions and exports), I'd want to be careful about introducing "we must yield" steps, since the task runner / microtask machinery in Chrome has non-trivial scheduler overhead.
Casual Q: Would it be feasible to have it so that the UA can choose when to yield? or do you think this is a bad idea due to non-determinism?
Note that top-level-await (where this discussion originated) effectively does exactly provide the user with a way to choose when to yield. This does seem more flexible to me than doing it for all modules.
I must admit I am not sure I understand all aspects of it, but at first glance this looks to me like a really subtle way of breaking existing code bases (perhaps not the web at large, but people are using bundlers today that would hide these issues) that rely on a predictable, synchronous execution order between modules. Example:
```js
// file A.js, entry point
import {valueB} from './B.js';
console.log(valueB);
export const valueA = 'A';
```

```js
// file B.js, this will be executed first
import {valueA} from './A.js';
// this defers to the microtask queue, which today does not run
// until A has also finished executing
Promise.resolve().then(() => console.log(valueA));
export const valueB = 'B';
```
As I understand this proposal, it will break this example and similar ones where either circular dependencies are involved or a module depends on something happening in the synchronous initialisation code of its dependent modules.
Of course, being the current maintainer of RollupJS, the bigger issue I see is either breaking parity between bundled and unbundled ES modules, or breaking core optimisations such as scope-hoisting in interesting and subtle ways. IMO these issues should rather be solved explicitly by using things like dynamic imports.
Or of course top level await.
@nyaxt If we want to let the UA choose whether/when to yield, I think we'd probably want to adopt some form of https://github.com/tc39/proposal-top-level-await/pull/49 for top-level await, and not depend on yielding as a mechanism to enforce ordering as previously proposed.
Perhaps unsurprisingly, since I'm coming from the same tooling perspective, I share @guybedford and @lukastaegert's concerns. Unless bundlers change their behaviour in ways that I think users would bristle at (and find highly surprising), we risk creating a situation where the behaviour of your app differs between development and production in unexpected ways.
I think I'd also push back a little on the claims about developer expectations. I suspect that if you asked devs 'do microtasks run between <script> blocks? a) yes b) no c) I never thought about it', the majority would answer c), with the remaining votes split between right and wrong answers.
There is widespread awareness nowadays that we should avoid parsing and evaluating large chunks of JS, but between TLA and dynamic import there are already ways to solve this problem that are explicit, easily understood and tooling-friendly.
@littledan Would it be feasible (and useful) to adopt the ASAP policy for WebAssembly modules?
I think people who want to use tools to get large-blocks-of-script can continue to do so; they actually free us up to make progress with different and better strategies on the web, and we should not be tied down to their large-block-of-script semantics.
@bergus Given that WebAssembly module instantiation is highly observable (through the start function), I think it's important to give it the same kinds of guarantees as JavaScript execution when modules start.
It's possible that the start function might not be used very much in practice, instead with a convention to use exported init() functions, so maybe I'm overestimating this concern. But @rossberg has raised the importance of the atomicity of the start function recently in the WebAssembly CG, so I'm thinking that we should keep making sure it has nice properties.
@domenic Would you encourage people to use tooling long term to get large blocks of script, to avoid the performance issues @nyaxt mentions, if they want to use small modules?
I wouldn't want to make a recommendation until we have a sense of the performance impact. Indeed, I think that's really what's needed to move this discussion forward, is finding some real-world websites using modules, and prototyping a quick change that at least runs a microtask checkpoint and benchmarking before/after.
My hope is that this doesn't become a performance issue, so that it becomes purely a semantic choice. People who want the semantics of large blocks of script (i.e., runs top to bottom with no yielding, potentially janking the page) can use tools to create those semantics. Whereas people who want the semantics of small modules can leave them uncompiled.
> I wouldn't want to make a recommendation until we have a sense of the performance impact. Indeed, I think that's really what's needed to move this discussion forward, is finding some real-world websites using modules, and prototyping a quick change that at least runs a microtask checkpoint and benchmarking before/after.
In addition, we (as a whole) can't make a recommendation either if the introduction of this feature degrades the performance of bundled JavaScript.
I have no concern with attempts at making native modules faster than bundling is today. But the introduction of a feature that makes modules faster by making bundled code _slower_ is, in my opinion, unacceptable.
Therefore I expect before/after assessments of performance for bundled code (I'll be happy to provide you with multiple examples in the wild) to ensure that bundled code sees _zero performance regressions_ for this normative change to be remotely acceptable.
This fix is unrelated to bundled code.
Bundlers aim to implement at compile-time what the runtime semantics would have been without the bundling. Changing runtime semantics (should) change bundler semantics.
As you did mention, though, tooling can still choose to "sync-ify" things by inlining them.
I don't understand how the microtask checkpoint would help with interactivity, though. Control still isn't yielded from the JavaScript host back to the renderer.
Sure, bundlers can decide to follow JS semantics or follow bundled semantics; that's up to them and their users. But fixing this bug in the spec will not change how bundled code behaves. We can easily deploy this feature, and improve (? pending perf tests) websites which use native ES modules, without impacting websites that use bundled code.
> Casual Q: Would it be feasible to have it so that the UA can choose when to yield? or do you think this is a bad idea due to non-determinism?
This was my last option, or I guess maybe you're proposing even a variant of that, which makes the microtask checkpoint optional too?
I think it's not great, and especially I'd like to get microtask checkpoints fixed. But in the end we do need to spec something implementable. And we have some precedent with the HTML parser, as something which is allowed to yield at arbitrary times in an optional fashion. So I am open to this.
Do you have a sense of how hard it would be to prototype (a) just a microtask checkpoint, (b) a full yield to the event loop? Then we could try it out on some sites.
Empirically, the right answer doesn't seem to be either large-block-of-script or many-tiny-modules, but somewhere in between (which is what modern bundlers generally strive for). Getting the best performance is always going to involve tooling, so a proposal centered on performance ought to consider feedback from that constituency.
Using the term 'bugfix' implies that the current behaviour is a bug — I don't think that's been established.
Note that code-splitting as a bundling feature will introduce arbitrary transitions between the two semantics.
> Using the term 'bugfix' implies that the current behaviour is a bug — I don't think that's been established.
I mean, as the person who wrote the spec, I can definitely say it's a mistake and a bug that module scripts are not consistent with classic scripts. At least the microtask checkpoint work is a bugfix.
> Note that code-splitting as a bundling feature will introduce arbitrary transitions between the two semantics.
Yep, I think that is exactly @Rich-Harris's point that tooling will be useful to allow an app to choose between the two semantics as appropriate for their scenario.
That's not my point. I think it's unrealistic to expect developers to understand the distinction; we should strive for a situation in which browsers-doing-the-right-thing and bundlers-doing-the-right-thing leads to the same observable behaviour, so that developers don't face surprising bugs in production.
For clarity, I'm not saying that tooling should drive these considerations, just that a proposal designed to maximise performance should be mindful of the fact that performance maximisation is achieved via tooling. If bundlers change to match the proposed semantics, it will cause performance regressions in some apps.
@domenic Your benchmarking idea sounds like it'd help work through this. I heard @kinu had some pretty exciting benchmark results for bundled exchange, which might give a more forward-looking complement.
In general, I don't think new features should be limited to what can be implemented in tools, but I was looking forward to the future where small modules could be shipped directly to the web with webpackage.
(I am confused about why this is considered a bug when we discussed this possibility in 2016 before it shipped in browsers (in particular, I suggested in a comment in this document you linked to considering a microtask checkpoint in this exact situation); I thought the choice was deliberate.)
Edit: To clarify, I agree with what's said above that, when possible, it's optimal if we can work towards alignment between tools and browsers. Tools can help lead to faster adoption of the native implementation of new features in browsers, bridging the gap when some browsers take longer. I'm worried that bundlers and browsers will face similar performance constraints when it comes to working with small modules (and if browsers have a faster path to queueing a task, well, maybe that's a new web API to expose).
Sorry the "casual q" is without any data so +1 on judging based on measurement. The changes needed to measure this should be trivial.
We should absolutely measure this, but my guess is that a microtask queue flush may be doable, while yielding to the event loop is challenging (it will likely require years of Blink optimization efforts to enable).
tl;dr Would it make sense to yield to either the event loop or microtask queue after each JavaScript module load? Yielding to the event loop could help keep things running smoothly while code is setting itself up, but on the other hand, it could be expensive and break (possibly foolish) atomicity expectations.
cc @smaug---- @rniwa @dbaron
From the UA point of view, microtasks are synchronous. So microtasks wouldn't help with the jank issue at all.
Some truly asynchronous approach sounds like something which could help devs write less jank-y pages. But then, how should this work if, for example, some task modifies the contents of the currently executing script element?
Currently, the logic for microtask checkpoints seems simple and consistent: When returning from JavaScript to HTML, there's a microtask checkpoint. This checkpoint is pretty necessary: We're done running the JS, but there's a bunch of other JS to run that's queued up at higher priority than HTML task queues, so we'll run that first.
The checkpoint you're proposing is between various pieces of JavaScript code. I don't quite understand the intuition where consecutive import statements should each get their own microtask checkpoint--we'll surely have time later to run these queued items, when we're done running JavaScript code. Why should queued microtasks run at higher priority than the next module?
Is this a proposal we should be looking into at the HTML level or the JavaScript level? Currently, HTML defers module graph execution to the JavaScript specification; I'm not sure what problems that layering might be causing.
> When returning from JavaScript to HTML, there's a microtask checkpoint.
That's not accurate. We insert them whenever the stack is empty (with the exception discussed here). This includes between pieces of JS script. This is done via the Web IDL bindings.
As one concrete example, microtask checkpoints run between two event handlers, when those event handlers are invoked with an empty stack.
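A minimal sketch of that concrete example (hypothetical page, invented log strings):

```html
<button id="b">go</button>
<script>
  const b = document.getElementById("b");
  b.addEventListener("click", () => {
    Promise.resolve().then(() => console.log("microtask"));
    console.log("listener 1");
  });
  b.addEventListener("click", () => console.log("listener 2"));
  // A real user click (empty stack) logs:
  //   listener 1, microtask, listener 2
  // Calling b.click() from script (non-empty stack) instead logs:
  //   listener 1, listener 2, microtask
</script>
```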
> Why should queued microtasks run at higher priority than the next module?
The point of microtasks, as originally designed, is to run whenever the stack is empty. There are several ways to do this; separate script tags; separate Web IDL callbacks; etc. It seems like a clear omission that separate imports were not considered.
> Currently, HTML defers module graph execution to the JavaScript specification; I'm not sure what problems that layering might be causing.
I think it's causing exactly the problems identified here, which is that an invariant we've worked hard to preserve in the Web IDL and HTML levels was not accounted for by the JS spec.
Let's put the full event loop flush aside; no implementers seem to want that (besides maybe @yoavweiss who raised it to me a couple months ago and has not since responded to this thread).
Apologies for missing this thread when it was started...
To clarify, my questions were in the context of bundled exchanges, delivery of ES modules and the performance implications of that.
Theoretically, bundled exchanges have the benefit of executing each module separately and consecutively, where current bundling techniques execute the code as a single large blob. As such, the weakly deterministic strategy would be significantly better from a loading performance perspective, once we can use bundled exchanges for that.
It's not 100% clear to me how the "deterministic with yielding" proposal here relates to that, and if it would allow the browser to start executing the required modules that it fetched before downloading the entire graph. Is full event loop flush required for the weakly deterministic proposal?
FWIW, when microtasks were designed, executing code in …
WebKit team at Apple had some internal discussions. In summary, we don't think yielding during module loading as proposed is a good idea. That is, we oppose the current proposal.
First off, because WebKit doesn't even render the page until the first layout, which normally doesn't happen until all module scripts have executed (unless we hit the timeout during parsing of the main document, which is rare on most high-profile websites), there is basically no end-user benefit for us. Secondly, yielding back to the event loop even periodically during module loading would incur additional performance overhead. So in WebKit, the proposed behavior change would almost always result in performance regressions but almost never in performance improvements.
And most importantly, the proposed behavior seems to violate the expectations module script authors have more than it follows them. Web developers expect microtask checkpoints to happen between two different occurrences of `<script type="module">`, but not within a single script between `import "xxx"` and `import "yyy"`.
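A minimal sketch of that expectation (hypothetical page, invented log strings):

```html
<!-- Each module script executes in its own task, so a microtask
     checkpoint runs between these two: -->
<script type="module">
  Promise.resolve().then(() => console.log("microtask"));
  console.log("module 1");
</script>
<script type="module">
  console.log("module 2");
</script>
<!-- Logs: "module 1", "microtask", "module 2".
     But within one module script, `import "xxx"; import "yyy";`
     evaluates the whole graph with no checkpoint in between today. -->
```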