Currently, the experimental ESM loader supports only file: URLs; it does not allow other schemes. It would be good to review what we should be concerned about in the Node.js loading mechanisms and to discuss support for other protocols, in particular data: and https:.
I've gathered some slides on this, related to the experimental policies implementation.
These slides can be framed under a few hypotheses:
Given that mindset, I believe that https: and data: based loading should be doable from a security perspective, but I would like some review. There is, however, a slight difference from CJS loading, because ESM loads modules deterministically. Once loaded, a module stays permanently in the module map, so ESM has a more reliable way to ensure that a module reference is a singleton. CJS has a mutable require.cache, which raises the concern that a module reference can be recreated before another module uses it. Note that this is similar to what happened in event-stream, and it usually applies to CJS as well.
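To make the require.cache concern concrete, here is a minimal sketch of how a CJS module reference can be recreated by evicting its cache entry. The temp directory prefix and file name are made up for illustration; only standard Node.js APIs are used.

```javascript
// Sketch: CJS module identity can be replaced through the mutable require.cache.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Write a throwaway module into a temp directory (hypothetical file name).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-cache-'));
const file = path.join(dir, 'singleton.js');
fs.writeFileSync(file, 'module.exports = { createdAt: Date.now() };');

const first = require(file);
const again = require(file); // served from require.cache, so the same object

// Any code running in the process can evict the cache entry...
delete require.cache[require.resolve(file)];
// ...after which the module is evaluated again and yields a new object.
const recreated = require(file);

const sameBeforeEviction = first === again;
const sameAfterEviction = first === recreated;
console.log({ sameBeforeEviction, sameAfterEviction });
```

The ESM module map exposes no equivalent user-facing handle for removing an entry, which is the difference described above.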
With all that in mind, I want to gather any problems there might be in supporting https: and data: loading from the ESM loader. If we can agree on solutions or concessions, I would like to PR core with https: functionality at least.
This has come up a couple of times on the TC39 Realms calls, with no major pushback there either, apart from some of the concerns listed in the slides, which were not serious enough to be considered objections. If nothing appears to be a problem security-wise, we should move on to other interested areas of Node.
CC: @nodejs/modules
data: I am ok with but I don't think we should _ever_ have _default_ unchecked networked module loading functionality.
Edit: i.e. if you _had_ to provide a shasum, maybe that would be ok.
@Fishrock123 can you explain relative to the slides what is different about networked access here vs generated by code? e.g. Why is using a trusted transport (including the CA checks that node does by default) not sufficient but using runtime codegen without integrity checks is?
@bmeck can only speak for myself: for the same reason subresource integrity is useful - I might trust my own code more than the CDN.
Imagine the following scenario:
@benjamingr that is an argument for policies to allow these URLs, but doesn't seem to necessitate that these URLs be unusable without being in the policy? Without integrity policies code in general isn't safe from mutation as we saw in event-stream where one package mutated a different one purely using file based modules.
@bmeck you are right, package-lock files mitigate this for modules where you don't accidentally update a module. An SRI like mechanism for URL imports would essentially be solving the same problem.
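As a rough illustration of what an SRI-like check for URL imports could look like, the sketch below computes an integrity string in the `<algorithm>-<base64 digest>` form used by subresource integrity and compares it against an expected value before the source would be evaluated. The function names computeIntegrity and verifyIntegrity are hypothetical and are not the API of the experimental policies implementation.

```javascript
const crypto = require('node:crypto');

// Hash algorithms this sketch accepts (an assumption, mirroring SRI's set).
const ALLOWED = ['sha256', 'sha384', 'sha512'];

// Compute an SRI-style integrity string for a module's source text.
function computeIntegrity(source, algorithm = 'sha384') {
  const digest = crypto.createHash(algorithm).update(source).digest('base64');
  return `${algorithm}-${digest}`;
}

// Hypothetical check a loader could run on fetched source before evaluating it.
function verifyIntegrity(source, expected) {
  const dash = expected.indexOf('-');
  if (dash < 0) return false;
  const algorithm = expected.slice(0, dash);
  if (!ALLOWED.includes(algorithm)) return false;
  const actual = Buffer.from(computeIntegrity(source, algorithm));
  const wanted = Buffer.from(expected);
  // timingSafeEqual requires equal lengths, so compare lengths first.
  return actual.length === wanted.length && crypto.timingSafeEqual(actual, wanted);
}

// Usage: pin the integrity of known-good source, then reject anything that differs.
const source = 'export default 42;';
const pinned = computeIntegrity(source);
console.log(verifyIntegrity(source, pinned));
console.log(verifyIntegrity(source + '// tampered', pinned));
```

With a check like this, trusting the transport and trusting the content become separate decisions: the CA checks vouch for the server, while the pinned digest vouches for the bytes.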
@benjamingr this integrity check already exists behind an experimental flag via a file manifest (see a blog post), but it isn't using package-level data, as that would require calculating the integrity of all files within a directory on startup and might not be valid if there are compile steps on the local machine, as for C++ modules. If anything needs to be added to that, it should probably be done against core as a separate issue.