Hey All,
I'd like to explore support for importing JSON, similar to how Node.js currently supports require('./some.json').
Expected behavior:
Imported JSON would export an object or array with the content from the provided JSON file.
Why:
Currently the way to get JSON is via fetch + import.meta.url, which requires a bit of back and forth; the eventual result is a promise that resolves to the JSON object. Being able to statically import and parse JSON would allow us to import specific symbols, export symbols, and asynchronously get resources during the fetch phase.
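For illustration, a rough sketch of the two approaches (the static form assumes the default-export semantics discussed below, and the fetch form uses top-level await for brevity):

// Today: fetch relative to this module; the JSON arrives asynchronously.
const fetched = await fetch(new URL('./config.json', import.meta.url))
  .then((res) => res.json());

// Proposed: a static import, fetched and parsed along with the module graph.
import config from './config.json';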
What are the next steps?:
I'm assuming a spec change in the HTML spec, but unsure what else we would need to do. Thoughts?
So concretely, for those not familiar with Node.js's support here, this would mean that if you import something with a JSON MIME type, you would get back the result as if JSON.parse were applied to the file's contents (decoded as UTF-8). If it doesn't parse as JSON, then the module ends up in the error state, similar to JavaScript parse errors.
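Roughly, a sketch of that behavior (illustrative, not spec text):

// A rough model of JSON module evaluation; `bytes` is the fetched body.
function evaluateJsonModule(bytes) {
  const text = new TextDecoder('utf-8').decode(bytes); // decoded as UTF-8
  // The parsed value becomes the default export; a throw here puts the
  // module in the error state, like a JavaScript parse error would.
  return JSON.parse(text);
}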
As for next steps, the main one is implementer interest. /cc @whatwg/modules; does this seem like a reasonable addition?
I think it is reasonable, but would note that this is for static data, whereas fetch() would get new data each time it is used; import() will not get new modules for each use. As long as this is well understood, it seems good for a variety of use cases that we see in bundlers.
I'm happy with certain well-known and commonly-used content-types getting good module support, even if it's just a "trivial" wrapper around a constructor or function call. JSON, CSS, etc are all good.
This came up at the last W3C TPAC too; I raised https://github.com/w3c/webcomponents/issues/770 at the time. If the complexity is indeed branching on a JSON MIME type and then using "parse JSON from bytes", I think Mozilla would be supportive. If we also want to advertise support to servers in some manner, the discussion might become a little more involved. (It seems that apart from an object or array this could also return a primitive, by the way.)
I don't know if it matters to the standards process, but there is some discussion of the topic already. If you look at the answers on that SO question (and at linked/related questions), you'll see there is some confusion as well. Because many (most?) people writing in ES6 or later end up transpiling via Babel -- which means using a module loader provided by the transpiler, rather than the browser-native one -- they can write import { x } from "foo.json" today, and as far as they are concerned, it works. Presumably this is because the Babel runtime loader is actually more advanced than the browser-native implementation.
There are some complexities if we are wishing to exactly match tooling as it exists today.
To note, tools supporting JSON as an importable module type do allow both named and default exports. The JSON provided always assigns the result value to the default export and then "picks" the relevant object properties to expose as named exports, e.g.:
{"x": {}}
~=
export default {x: {}}
// export x = default.x //
This kind of aliasing where updating default.x updates x isn't available to Source Text Module Records, but that is probably fine.
Additionally, for non-Identifier properties, the only way to access values would be through the default export, since non-Identifier bindings cannot be imported.
import locales from './locales.json';
locales[navigator.language]; // e.g. 'en-US' is not a valid identifier binding
I think the simplest solution of not supporting named exports would be fine for now and encourage a more uniform usage.
Conceptually a JSON file is a single thing; that Babel allows named imports from it is merely a product of it conflating destructuring with named imports. I definitely agree JSON should be confined to default import.
What are the next steps for getting this off the ground? Spec text? Implementer interest?
I'm happy to help drive this but need some mentorship on the process
Implementer interest would be ideal; otherwise spec text might be wasted effort. But you could start spec and test work ahead of implementer interest, if you're OK taking the risk.
I'm happy to mentor on spec/test work when you're ready; let's discuss in a more high-bandwidth medium like IRC or offline.
I will be at TC39 next week and can work on getting some implementer interest.
Conceptually a JSON file is a single thing; that Babel allows named imports from it is merely a product of it conflating destructuring with named imports. I definitely agree JSON should be confined to default import.
As background information why this complexity was introduced: aliasing the top-level JSON keys to named exports makes the JSON file tree-shakable on the first level. So if only x is used in the application, the rest can be removed. This can make a notable difference if the JSON file is big enough.
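For example (assuming the Babel/webpack-style named-export behavior described above; file name hypothetical):

// strings.json has top-level keys "en", "fr", "de", each a large object.
import { en } from './strings.json';
// A bundler can prove "fr" and "de" are unused and drop them from the bundle.
console.log(en);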
Tree shaking isn't really a good argument for named exports anymore. Build tools can check statically analyzable member expressions and determine if an object binding escapes the static analysis.
Let's consider multiple modules that depend on the same remote source of static JSON. With today's approach of having each user fetch & JSON.parse the result, they get their own object which is pristine.
If the module loader is now holding a shared mutable copy of the data, how can a single user/callsite confidently access the original data?
Will there be a special way to request the untainted copy? Or should we consider JSON modules being deeply frozen by default?
If anybody can come up with a reasonable use case for mutable JSON imports, I'd like to hear it, but otherwise freezing seems like a simple solution.
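A sketch of the hazard, assuming the cached-singleton behavior discussed later in this thread (file names hypothetical):

// a.js
import data from './shared.json';
data.mutated = true; // mutates the one cached copy

// b.js
import data from './shared.json';
console.log(data.mutated); // true: b.js sees a.js's mutation, not pristine JSON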
Tree shaking isn't really a good argument for named exports anymore. Build tools can check statically analyzable member expressions and determine if an object binding escapes the static analysis.
These checks are very difficult and I think there are a lot of situations where the object binding would escape. Using named imports is certainly a lot easier to statically analyze. But I also see that allowing named imports of first-level JSON keys kind of abuses named imports for the sake of static analysis, so I'm not a strong advocate of this feature. Just wanted to give some background information.
Will there be a special way to request the untainted copy? Or should we consider JSON modules being deeply frozen by default?
I also thought about this. I would definitely prefer freezing it but I also don't see a strong reason why the host environment should enforce this. Maybe security reasons?
Freezing sounds very sensible. Note that freezing would also help static analysis because then even in the case of a reference escaping the analysis, member expressions can still be inlined.
An alternative to recursively freezing might be if the import were a thunk: a function that returns a new mutable object.
I don't think this is the place to introduce any kind of freezing or recursive freezing into the web platform. The platform is full of shared mutable objects, e.g. window, or JS modules' namespace objects. There's no reason to treat JSON modules' namespace objects specially.
As always, if you want to create a frozen copy of one of these shared mutable objects, your code needs to run first, and do the freezing itself.
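For example (a minimal sketch; deepFreeze is a hypothetical helper, not a platform API):

import config from './config.json';

// Hypothetical helper: recursively freeze an object graph.
function deepFreeze(value) {
  if (value && typeof value === 'object') {
    for (const key of Object.keys(value)) deepFreeze(value[key]);
    Object.freeze(value);
  }
  return value;
}

// The importer does the freezing itself, in its own code.
deepFreeze(config);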
@domenic your argument seems to be nothing more than to state the status quo as canon. I'd have hoped for a more convincing point here.
Consistency is valuable. JSON modules should not depart from JS modules.
Perhaps this pushes the argument back towards being for named exports then, since that would ensure the consistency guarantees of JS modules? Although it does mean introducing a valid identifier filter unfortunately.
No, JS modules that represent a single thing do use default exports, so the consistency is perfect there.
We are exploring an implementation in Node.js in https://github.com/nodejs/ecmascript-modules/pull/43
Some conversation from the Node.js proposal
Folks seem to want the following behavior
It seems like we have consensus in here on the first two points. What do people think about the cached singleton?
Considering that's how all ES modules operate (multiple imports in the same graph do not cause re-evaluation of the contents), I can't conceive of how we'd do it any differently.
Agreed. The parsed JS value gets stored in the module map.
Some conversation from the Node.js proposal
Folks seem to want the following behavior
- export default only
👋 Node Modules WG member and TC39 delegate here. FWIW I'm actually a fan of named exports support across the board (JSON/CJS/WASM/etc).
Named exports would give special semantics to JSON object values (and maybe array values?), which seems really weird, if I understand the idea correctly. Treating it as a single unit seems better.
which seems really weird
As covered above it isn't seamless (e.g. some properties will be ignored because they aren't valid identifiers). The same could be said for named exports of other module formats like CJS. However, it's dev reality through Babel, etc., which is why I'm a fan.
@jdalton what would you imagine should be the behavior for the following?
lol.json:

{
  "default": "lol I'm a monster",
  "data": "Why not moar data?"
}

import {default, data} from './lol.json'; // note: a bare `default` binding is not valid import syntax today
console.log(default);
console.log(default.data);
console.log(data);

import data from './lol.json';
console.log(data);         // the whole parsed object
console.log(data.default); // "lol I'm a monster"
console.log(data.data);    // "Why not moar data?"
The first example breaks my brain a bit and seems to be a reason to consider default only
Yeah it seems pretty clear that on the web at least we'll be doing single default export for JSON, CSS, etc. It's good that we've had experience with tools like Babel to guide us and show that named exports work poorly for these scenarios; let's learn from those mistakes, instead of repeating them.
@MylesBorins
what would you imagine should be the behavior for the following
I'd handle it like the proposed way of handling CJS. The default export is the full JSON parsed value and the named exports are from those properties of the parsed JSON value that qualify as identifiers.
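Under that approach, a sketch of the assumed desugaring (illustrative, not spec text):

// Hypothetical desugaring of {"a": 1, "not-an-identifier": 2}:
const parsed = JSON.parse('{"a": 1, "not-an-identifier": 2}');
export default parsed;
export const a = parsed.a; // "a" qualifies as an identifier
// "not-an-identifier" gets no named export; it stays reachable only
// through the default export, e.g. parsed['not-an-identifier'].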
@domenic
It's good that we've had experience with tools like Babel to guide us and show that named exports work poorly for these scenarios
I dunno about "worked poorly". From a recent study in the Modules WG, over 30% of crawled CJS modules used in ESM used named exports. To me that points to working well.
@jdalton was there data about using named exports from json specifically?
@ljharb
was there data about using named exports from _json_ specifically?
No, though I'm guessing it could be gotten. The sample set was also limited to packages containing a module field, so the sample could be increased as well.
I'd handle it like the proposed way of handling CJS. The default export is the full JSON parsed value and the named exports are from those properties of the parsed JSON value that qualify as identifiers.
The challenge with this is with a property named default. Current behavior in ESM would have import {default} from './module.mjs' throw with "SyntaxError: Unexpected reserved word". This would make default unable to be a named export, along with any other keys in the JSON that are not valid identifiers. This seems VERY likely to cause confusion among developers.
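For reference, the reserved word only blocks the bare form; renaming the binding is already valid JS:

// import { default } from './module.mjs';       // SyntaxError: reserved word
import { default as value } from './module.mjs'; // valid: renamed binding
import sameValue from './module.mjs';            // equivalent shorthand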
@jdalton I personally don't find the behavior that has been adopted by build tools a compelling reason to define behavior. Those tools can continue to work the way they do, or alternatively make breaking changes to align with upstream.
Those numbers re: named exports did not include specific data regarding what type of module the named exports were coming from; as such I don't think they're entirely relevant to this conversation.
@MylesBorins
The challenge with this is with a property named default. Current behavior in ESM would have
import {default} from './module.mjs'
There is no challenge, because it's handled the same way it would be with a CJS module. The default export is the raw export value, not a named property of the export value. It may help to think of it this way: in Node, JSON is handled like module.exports = JSON.parse(content).
@jdalton what I mean is that named exports would be limited to a subset of all potential names
That's already the case for CJS as well, to be fair. For me, it's because conceptually a JSON file is one single value, and its properties were never intended to be used independently.
what I mean is that named exports would be limited to a subset of all potential names
That's already the case for CJS as well, to be fair.
Yep. For me it's a net positive.
Question about an edge case much finer than named exports (aside: Anything besides a single default export seems very strange to me):
If there's a JSON parse error, should this surface when the module is parsed/linked, or during evaluation?
This question came up in https://github.com/tc39/proposal-javascript-standard-library/pull/44, where @domenic is drafting some of the underlying specification text.
For JSON, is there a reason you want partial evaluation of the graph, or is there a reason it should not pre-allocate/parse things as they stream in?
I don't think this question changes whether you could parse things as they stream in. Not sure what you mean by pre-allocate. I don't have any "use cases" in mind for why you would want one or the other error behavior; my thought was just that consistency with JS parse error handling might be good.
For JSON I would treat it as a JavaScript parse error. However, that doesn't necessarily translate to other module records, e.g., CSS and HTML.
With CSS and HTML being more error tolerant, that makes sense. I'm wondering, though, would we ever want to do parsing work, logically, during the "module evaluation" phase, or does it make more sense to consider it done up front, when constructing the module? I'm wondering whether the same sort of phase ordering would make sense for all four, or if there are differences between them (even if CSS and HTML don't actually throw lots of errors).
It seems useful to make linking as fast as possible; would a large JSON document make link-time parsing slow down the overall operation?
@ljharb I'm a bit unclear; the overall graph time would largely be unaffected if we do parsing as a sync operation. However, if we do incremental parsing prior to linking, it seems there could be the possibility of some parsing gains if there is idle time prior to evaluation. Either way, the graph would have to parse the whole document.
Sure; in the success cases the time would largely be the same. I'm more thinking about the failure cases, where it might be faster to parse in parallel instead of blocking at linking time.
I don't understand how this choice would affect performance either way. Parsing JSON happens entirely at startup either way, and you can parallelize it with other parsing work either way.
It seems like it could affect performance if we ever allowed deterministic-with-yielding evaluation of the module graph. I don't think you can chunk up instantiation (or can you?), but you can chunk up evaluation. Total time taken would be the same, but it would be spread out in a way that would cause fewer missed frames.
you can parallelize it with other parsing work either way.
I think this statement might be a bit misleading, because by "parsing JSON" we usually mean the whole string -> JS object algorithm (like JSON.parse), and JS object creation is notoriously single-threaded. So, at least the object creation part of it would need to be done in a non-parallelized fashion, as part of the main instantiation or evaluation work.
@littledan
My thinking is that with something like:
import 'largeAndSlow.json';
import 'hasParseError.json';
import 'does not exist';
if JSON parsing is done as part of linking, then you'd have a delay and then get an error on hasParseError, but if parsing is done as part of evaluation, you'd get an error on 'does not exist' largely immediately.
(please correct me if i'm misunderstanding how any of this works)
@domenic Thanks for the correction; I guess it'd be a bit unfortunate to do two passes over the JSON to separate out syntax error checking (at ParseModule-time) from creating the JS objects (at Evaluation time).
If we want to switch to the deterministic-with-yielding strategy, now's probably the time, as part of the top-level await proposal, and in particular https://github.com/tc39/proposal-top-level-await/issues/47 . It seems unfortunate to go through two different separate changes in how module evaluation works (if we first add a microtask queue checkpoint, and later yield to the event loop).
@ljharb That does sound like an observable effect. Do you think performance is important in this sort of error case?
In general we haven't been treating performance in error cases as important; I don't remember the specific design decisions that led to that, but in the great module evaluation rewrite of 2016-ish it was taken as a given.
if JSON parsing is done as part of linking, then you'd have a delay and then get an error on hasParseError, but if parsing is done as part of evaluation, you'd get an error on 'does not exist' largely immediately.
Both seem to require fetching, and you can still error on the first error in hasParseError.json's body. Even if largeAndSlow.json is 50MB of nested arrays [[[..]]] and hasParseError.json is empty with no body, you can do incremental parsing as follows and error very close to immediately.
If you get 4kb of content at a time, let's say we get 4kb of largeAndSlow.json consisting purely of [ characters. You can queue allocating that many arrays (or you can wait for the full body to allocate).
If you then get the content of hasParseError.json, you can see the error immediately, since the empty body can be fully parsed and errors out. This can happen prior to largeAndSlow.json finishing downloading.
If import 'does not exist'; errored even earlier because the module does not exist, the graph could bail out after 4kb of largeAndSlow.json. The tradeoffs here are not super beneficial one way or the other, though, as incremental parsing of JSON tailors to doing more compute as things come over the network and requires more coordination. I think either approach is fine, but the trade-offs are not simple to state.
To get some clarification: is there any need to slow down linking if we don't provide named exports, since linking shouldn't have observable side effects? We would know the shape of the module to be {default} and that it has no dependencies. We would just need to ensure that error propagation happens prior to the graph starting evaluation.
I haven't heard any proposed semantics which would allow the fetching to be lazy; I assumed we'd fetch the JSON when fetching the module graph, and we were just talking about when the parsing would happen.
I suppose fetching could be lazy, as if it's a top-level await in the execution phase to wait for the fetch to finish, but I don't understand why this case specifically deserves special treatment. Would we want to do this for all leaf modules which don't have named exports, just because we can?
Instead of giving this sort of idiosyncratic treatment to JSON, I'd suggest that developers use dynamic import for cases where they want more laziness in loading.
I do think we are talking about when parsing errors propagate, not necessarily when parsing happens (to my knowledge JSON parsing does not produce user observable side effects). I agree that both do require fetching eagerly still.
Instead of giving this sort of idiosyncratic treatment to JSON, I'd suggest that developers use dynamic import for cases where they want more laziness in loading.
I think if you want to error out during execution at the top level of a module this would also require top level await.
@bmeck I don't understand what you're getting at. Do you have semantics in mind here that you'd prefer?
@littledan I think parsing errors should propagate during link time, since link time could mean faster erroring out / no partial graph eval / matching JS; but either semantics is fine to move forward with if we cannot do that, since they both have slightly different tradeoffs.
FWIW, a new feature of webpack@4 was support for JSON-as-modules.
When the JSON document is an object, properties are plucked (when they are valid identifiers) and made available as independent exports, with the default export being guaranteed to be the entire unmodified JSON.parse result (which may not even be an object). If you do actually use only named imports (other than default), it is capable of deleting the provably-unused parts of the object from the bundle.
I realize this feature never seems to have gotten concrete signs of implementer interest. Blink is interested in implementing; any thoughts from Gecko or WebKit? @annevk @jonco3 / @rniwa @Constellation.
Concretely, #4407 has the pull request.
Mozilla is interested.
Microsoft is on board for implementing this in Blink.
Should this issue be reopened since https://github.com/whatwg/html/pull/4943 reverted #4407 which closed this issue?
Once someone takes the time to resolve https://github.com/w3c/webcomponents/issues/839 I'm sure the conversation will find its way back to this repository somehow. Not sure we need a new tracking issue or reopen this one.
Seems like treating JSON file content as a default export is the only way to have consistency across all JSON features.
With JSON files containing content other than objects, for example:
true
null
"foo"
123
[1, "2", true]
named imports don't work (obviously).
The only way to keep usage consistent with JSON features is a default export; then in their code people can choose to detect whether the import is a boolean, an object, or something else, or choose which properties to pluck from the default import in case it is an object.
import pkg from './package.json';
import child from 'child_process';
child.exec(pkg.scripts['build:dev'], ...); // 'build:dev' is not a valid identifier, so only the default import can reach it
If we don't mind being less consistent across JSON features, then the next best option is for default to work as above, but with named imports also available from JSON files that have object values (the same as @Jessidhia last mentioned).
But the question is, is consistency across the JSON features important? _Most_ JSON values are objects, and _most_ keys in those objects are valid identifiers.