See https://github.com/nodejs/node/issues/33460.
tl;dr: it turns out that there's a number of tools that need to be able to locate a package's root directory, and with "exports", they can no longer rely on package/package.json working, nor package/.
If we could provide this API, then CJS could path.join it, and ESM could use URL, to get whatever file the tool needs to fs.readFile.
Adding this new API would bypass questions about "should package.json be an implicit part of a package's API", as well as avoid reliance on ESM-only or CJS-only mechanisms. By providing the package root dir rather than its package.json, we would not be encouraging or discouraging any patterns of "where to store metadata" - instead, we'd be correctly leaving that choice up to userland.
Example solution: module.packageDir(specifier) and module.packageDirSync(specifier) (sync access is critical for many use cases).
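To illustrate how a package root would be consumed from both module systems, here is a minimal sketch. The `packageRoot` value is a stand-in for whatever the proposed API would return (that API does not exist yet); only `path.join`, `pathToFileURL`, `fileURLToPath`, and `URL` are real.

```javascript
const path = require('node:path');
const { pathToFileURL, fileURLToPath } = require('node:url');

// Assumption: stands in for the value the proposed API would return.
const packageRoot = path.resolve('node_modules', 'tape');

// CJS style: plain path joining.
const manifestPath = path.join(packageRoot, 'package.json');

// ESM style: URL resolution. The trailing separator matters; without it,
// the last path segment is replaced instead of extended.
const rootUrl = pathToFileURL(packageRoot + path.sep);
const manifestUrl = new URL('package.json', rootUrl);

console.log(manifestPath);
console.log(manifestUrl.href);
```

Both forms should point at the same file, which can then be passed to `fs.readFile`.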
How would the resolver source be determined? Maybe require.packageDir would be a better place for the API, so it could have access to the caller's __filename? For example, you have the following packages installed:
node_modules/pkg1
node_modules/pkg1/node_modules/pkg2
node_modules/pkg2
node_modules/pkg1/index.js asks for packageDir of pkg2, it needs to find the nested one.
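The layout above can be reproduced with existing APIs to show why the caller's location matters. This is only a sketch: it builds the three packages in a temporary directory and uses `module.createRequire` to resolve `pkg2/package.json` from two different starting points. The temporary directory and the `writePackage` helper are made up for the illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { createRequire } = require('node:module');

// Build the example layout in a throwaway directory (realpath'd because
// require.resolve returns real paths).
const tmp = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'pkgdir-')));

// Illustration-only helper: writes a minimal package into `dir`.
function writePackage(dir, name) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name }));
  fs.writeFileSync(path.join(dir, 'index.js'), 'module.exports = {};');
}

const pkg1 = path.join(tmp, 'node_modules', 'pkg1');
const nestedPkg2 = path.join(pkg1, 'node_modules', 'pkg2');
const topPkg2 = path.join(tmp, 'node_modules', 'pkg2');
writePackage(pkg1, 'pkg1');
writePackage(nestedPkg2, 'pkg2');
writePackage(topPkg2, 'pkg2');

// Resolve "pkg2" as seen from two different callers.
const fromPkg1 = createRequire(path.join(pkg1, 'index.js'));
const fromApp = createRequire(path.join(tmp, 'main.js'));

const seenByPkg1 = path.dirname(fromPkg1.resolve('pkg2/package.json'));
const seenByApp = path.dirname(fromApp.resolve('pkg2/package.json'));

console.log(seenByPkg1 === nestedPkg2); // caller inside pkg1 gets the nested copy
console.log(seenByApp === topPkg2);     // caller at the app level gets the top-level copy
```

This works here only because neither package declares "exports"; once it does, the `pkg2/package.json` subpath may no longer be resolvable.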
Yes, that is true. It would need to do the searching contextually, just like require.resolve, so require.packageDir would be a better place to put it (and thus perhaps, for ESM, import.meta.packageDir).
As paths are URLs in ESM, "dir" is perhaps not the best name. But aside from naming, yes, this would be a good method to expose. In https://github.com/nodejs/node/issues/33460#issuecomment-630968792 I had suggested resolvePackageRoot.
I'm removing the modules agenda label here, as the discussion around this seems to be the same topic as #33460 which is still on the agenda as well.
Any plans to resolve this?
I had completely forgotten about this, but am still very much +1. Here's a way of going about doing this that may be worth considering. The constructor takes the specifier, resolves it, and populates the object with everything it knows about the package.
import { readdirSync } from 'fs';
// Package is the class proposed here, not an existing API
const tapePkg = new Package('tape'); // resolves the specifier
const pkgDir = tapePkg.packageDir; // access the directory
console.log(readdirSync(pkgDir)); // log the resolved directory contents
Some research on other language approaches: https://docs.microsoft.com/en-us/dotnet/api/system.io.packaging.package?view=netcore-3.1
That seems a bit overkill.
All that I think is needed is Module.getPackageDir(packageSpecifier), and then anything needed from there can be done with fs APIs.
I haven't yet had time to make a PR, but anyone is welcome to beat me to it.
Module.getPackageDir(packageSpecifier) would need a second specifier (the referrer) to know where it is starting resolution.
I also am unclear on how any API should integrate with redirection mechanisms like policies and import maps. I'd personally be fine ignoring redirection mechanisms but that could lead to problems where importing doesn't match the API.
We could add a method to the require function so it knows where to start resolution.
Or a function that takes two parameters. getPackageDir(__filename, packageSpecifier) in CJS and getPackageDir(import.meta.url, packageSpecifier) in ESM.
I’d indeed expect a second argument to the static method that took __dirname/import.meta.url. Dangling it off require wouldn’t work in ESM.
I’m not sure how it would interact with policies, but I don’t think there are any security issues from retrieving a path: if filesystem access is restricted, then you’d be unable to use it, which policies could dictate.
@ljharb I am unclear / don't see an issue at a glance, but the problem can be explained by example:
// redirected to file:///app/node_modules/apm/interceptions/sql.js instead of file:///app/node_modules/sql/index.js
import.meta.resolve('sql');
// likely want file:///app/node_modules/sql/ (trailing slash, yes) ? but that now doesn't match where it is actually getting loaded from
module.package('sql');
This would affect pretty much any custom resolver the more I think about it, but we tend to state for policies that it is the job of the policy creator to manually enforce things like fs access per callsite (by redirecting as desired per scope) and similarly we can just state they need to customize module as well. This is a problem with a single argument form if it isn't an absolute location (path or URL) since it cannot be customized per scope. Similarly, without an absolute location, it wouldn't know what to resolve against. It could use process.cwd(), but then things in node_modules can get the wrong values.
Makes perfect sense. Theoretically, this api in node could use the same resolver that would respect policies, and could return the directory that contains that package's package.json - then it wouldn't require extra work on behalf of the policy creator.
The double argument form, with source location, was always the plan; i just failed to mention the second argument in my above comment.
@ljharb if it respects policies, are you saying it should return file:///app/node_modules/apm/? since that is the package for the resolved "sql"? Policies are not realistically human writable (same as any non-trivial import map) so I wouldn't worry too much about work on creating since it can be left to tooling. Policies could always add data for this method to use rather than trying to make it work off the actual resolved location.
It should return whatever path foo would make fs.readFileSync(path.join(foo, 'package.json')) return the expected package.json contents (presuming fs access is available).
@ljharb I'm asking which package.json is expected
That of the sql package, in this case. If the apm wants to provide a fake package dir with a custom package.json it is certainly free to construct one.
@ljharb ah, file:///app/node_modules/sql/ in the example above then; by default it wouldn't resolve using the result of the current data in policies and a new data point would be needed to intercept that (same for loading hooks).
Are there any objections to the following API?
module.getPackageDir(specifier, baseDir)
precise naming aside, that's the only API we've ever really discussed or expected.
> module.getPackageDir(specifier, baseDir)
My preference would be to call it getPackageRoot, but otherwise it's fine. In general most docs these days prefer “folder” to “directory,” and also “root” covers you in case this returns a URL (as I assume it would for ESM).
Is the base necessary? Can it be optional, and if omitted it's relative to the current file/module?
I'm fine with "getPackageRoot".
Because the API isn't context-aware (since it's on module and has to be imported, and isn't provided freshly for each module), it has no way of knowing the current file/module - so a base is necessary.
From CJS, you'd pass __dirname, and from ESM, import.meta.url.
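A userland approximation of the two-argument form might look like the sketch below. `getPackageRoot` here is a hypothetical helper, not the API under discussion: it only walks up from the base looking for `node_modules/<name>/package.json`, and it ignores policies, loaders, NODE_PATH, symlinks, and self-referencing, all of which a real implementation would need to consider.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { fileURLToPath, pathToFileURL } = require('node:url');

// Hypothetical helper; a sketch only, not an existing Node.js API.
function getPackageRoot(specifier, base) {
  // Accept a path (CJS: __dirname) or a file: URL (ESM: import.meta.url).
  const baseText = String(base);
  const basePath = baseText.startsWith('file:') ? fileURLToPath(baseText) : baseText;

  // "@scope/name/sub" -> "@scope/name", "name/sub" -> "name".
  const parts = specifier.split('/');
  const name = specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];

  let dir = path.resolve(basePath);
  for (;;) {
    // Skip "node_modules/node_modules", as the CJS resolver does.
    if (path.basename(dir) !== 'node_modules') {
      const candidate = path.join(dir, 'node_modules', name);
      if (fs.existsSync(path.join(candidate, 'package.json'))) return candidate;
    }
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Usage against a throwaway fixture.
const app = fs.mkdtempSync(path.join(os.tmpdir(), 'pkgroot-'));
const lodashDir = path.join(app, 'node_modules', 'lodash');
fs.mkdirSync(lodashDir, { recursive: true });
fs.writeFileSync(path.join(lodashDir, 'package.json'), '{"name":"lodash"}');

const fromCjsBase = getPackageRoot('lodash/fp', app);
const fromEsmBase = getPackageRoot('lodash', pathToFileURL(path.join(app, 'main.js')).href);
console.log(fromCjsBase, fromEsmBase);
```

Because the base is an explicit argument, the same function can be imported once and called from any module, which is the reason given above for requiring it.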
Question: How would such a function know what the _real_ package directory is? Theoretically, _and_ practically, there could be a "mini" package.json that is there just for a type: module usage.
@giltayar node's algorithm knows that, because it only looks at the real package directory's package.json for "exports" - and that's the precise knowledge that this API will expose.
@ljharb I do not believe it recursively searches for a package containing "exports"; I believe, given our documentation, it stops on the first package boundary, not for a module located in node_modules/(@.../)?...
Maybe I misunderstood the last comment since the resolver wouldn't be recursing up to find the source of a deep import's package boundary with this API but instead resolving the deep import and returning the package.json which performed the last step of the resolve? Maybe we could name this API more clearly.
It’s very explicitly intended to recurse up and find the package boundary; it’s not particularly useful to know the package.json for a given file but it’s necessary to be able to find the package boundary itself.
A different name is fine; it’s the semantic that’s important.
@ljharb the nearest package boundary stops at the first package.json (recursing parent directories) but "exports" check is only during node_modules traversal (only recurses to find the module by a name). I'm now confused on the desired semantics.
The problem is that when a package has “exports” (ie, only when it’s in node_modules) there’s no reliable way to get to the package.json file that has “exports” in it. This api is intended only to solve that problem.
it would be fine, for example, if it rejected when the specifier passed to it wasn’t a package name that’s in node_modules (even on a deep file path) since the only important goal is to get the “exports” field’s package.json’s directory - the package root.
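The problem described here can be reproduced with a small fixture. This sketch creates a package named `sealed` (a made-up name) whose "exports" only exposes its entry point, then tries to resolve its package.json the traditional way.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { createRequire } = require('node:module');

// Fixture: a package whose "exports" exposes only ".".
const app = fs.mkdtempSync(path.join(os.tmpdir(), 'exports-'));
const pkgDir = path.join(app, 'node_modules', 'sealed');
fs.mkdirSync(pkgDir, { recursive: true });
fs.writeFileSync(path.join(pkgDir, 'index.js'), 'module.exports = 1;');
fs.writeFileSync(
  path.join(pkgDir, 'package.json'),
  JSON.stringify({ name: 'sealed', exports: { '.': './index.js' } })
);

const requireFromApp = createRequire(path.join(app, 'main.js'));

// The entry point still resolves...
const entry = requireFromApp.resolve('sealed');

// ...but the package.json subpath is not exported, so this throws.
let errorCode = null;
try {
  requireFromApp.resolve('sealed/package.json');
} catch (err) {
  errorCode = err.code;
}
console.log(path.basename(entry), errorCode);
```

On Node.js versions that enforce "exports", the second lookup fails with `ERR_PACKAGE_PATH_NOT_EXPORTED`, which is why tools can no longer rely on `package/package.json`.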
that seems fine, my complaint is we don't define a "package root" currently. as @giltayar points out you can have multiple package boundaries:
/app
/app/node_modules/logger/package.json - package boundary at /app/node_modules/logger
/app/node_modules/logger/esm/package.json - package boundary at /app/node_modules/logger/esm , in theory can contain "exports"
/app/node_modules/logger/cjs/package.json - package boundary at /app/node_modules/logger/cjs , in theory can contain "exports"
if /app/main.js wishes to know which package boundary performed the "exports" resolution of logger/logger.js we get into the confusion.
It sounds like you are trying to define "package root" to be unrelated to the actual filesystem from a traversal after resolution point of view. You want to get /app/node_modules/logger/package.json. This is a short circuiting instead of post resolution recursion. I am not really sure "package root" is a clear term so I think the "exporting package" might be better. For example if we add a redirect:
/app/node_modules/cjs-logger/ -> /app/node_modules/logger/cjs/
Would we expect using the API on cjs-logger/logger.js to return /app/node_modules/logger/cjs/package.json? It seems so from what I'm reading in the comment above. Stating the "right" one isn't really clear and it seems "the one with exports" doesn't make sense if multiple may contain exports. The one which had its "exports" used however is unique and non-reentrant.
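To make the distinction concrete, the sketch below implements the "post resolution recursion" reading: walking up from an already-resolved file to the nearest package.json. `nearestPackageBoundary` is a hypothetical helper written for this illustration, and the fixture mirrors the logger layout above.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: walk up from a file to the closest directory
// that contains a package.json.
function nearestPackageBoundary(file) {
  let dir = path.dirname(path.resolve(file));
  for (;;) {
    if (fs.existsSync(path.join(dir, 'package.json'))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Recreate the logger layout with three package boundaries.
const app = fs.mkdtempSync(path.join(os.tmpdir(), 'boundary-'));
const logger = path.join(app, 'node_modules', 'logger');
for (const dir of [logger, path.join(logger, 'esm'), path.join(logger, 'cjs')]) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'package.json'), '{}');
}

const viaCjs = nearestPackageBoundary(path.join(logger, 'cjs', 'logger.js'));
const viaTop = nearestPackageBoundary(path.join(logger, 'logger.js'));
console.log(viaCjs, viaTop);
```

For a file under `logger/cjs`, this walk stops at `logger/cjs`, not at `logger`, so it answers a different question from "which package.json had its exports used", which depends on the specifier and the base.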
@ljharb what if there is no exports field? It's still a standard package, no? (Node.js will look for index.js)
What I was trying to point out is that there is no _real_ and _official_ way to figure out the package root, only a set of heuristics that will work.
Might I suggest an alternative API that is (IMHO) much better _and_ guaranteed to be correct?
import module from 'module'
module.findPackageRoot(resolvingPackageFilePath, specifier)
So, given a file path and a _specifier_, it will find the package root. This is guaranteed to work because Node.js uses the same algorithm to find the entry point file (when doing import.meta.resolve(...)), but in this case it returns not the entry point file of the specifier but the package.json that was used to figure out that entry point file.
So, as an example, module.findPackageRoot(import.meta.url, 'foobar/abc') is equivalent to finding the root directory of the package foobar, while searching from the current file. It is equivalent to doing module.createRequireFromPath(resolvingPackageFilePath).resolve(specifier) but not returning the resolution of the specifier, but rather the directory where the package.json of that package was found.
The package root is already defined as “the package.json file exports would be respected in, if the key is present” - that is real and official and long since shipped. This API is designed simply to expose that existing definition.
@ljharb but what if the key is _not_ present? Which package.json would you choose? The first one you find when you go up the directory tree? The last one?
@ljharb "package root" is not defined as that to my knowledge in the docs anywhere as how a specifier is imported affects if "exports" is used per the example above with nested package.json files.
@ljharb Wait, I may be an idiot. If the API is as proposed by @DerekNonGeneric (module.getPackageDir(specifier, baseDir)), then that's equivalent to my suggestion. I think.
@giltayar i believe the desired package.json directory would be the one within node_modules that was traversed into (not up to using '..') for resolution purposes.
@giltayar the same file that _if_ it were present, it’d be respected in. Its actual presence shouldn’t be relevant.
I’m fine with the api taking a full specifier and not just a package name, of course, that seems strictly more useful.
@bmeck whatever it’s called, there is a concept in node itself which I’m calling “package root”, which is “the folder that could contain a package.json that could contain an “exports” field that _would be respected_, if all of those things are present”.
@ljharb my point is that this concept isn't static and requires a resolution referrer and specifier to know if they would be used. So a "package root" isn't something that is trivially statically resolved without such a resolution. There is no real "package root" in your definition with a corresponding resolve operation. And situations like cjs-logger above can show this a bit more concretely.
@bmeck i totally agree, which is why the “base dir” would be a necessary part of the api - ie, a “resolve from” location. Is there something I’m missing that wouldn’t solve?
Not mechanically, just the term "package root" seems incorrect though we only mention it in the docs once in this sort of context and a bit vaguely.
I would be very surprised if a user asked “where does the exports field work”, and they were told “in a package.json in any package root”, and they did not immediately understand what that meant - so i can’t think of any better term.
If anyone has a better one, that’s fine too - the only thing that’s really critical here is the semantics.
@ljharb for things without redirects and without nesting I don't think it is so problematic. It is nesting and redirects which make the package root term really unclear. I think my concern is the framing of "package root" seems to imply that there isn't that relation with resolution and you can't actually determine it. I did state "exporting package" above mostly because it implies an operation is being done to perform exports. No strong preference, just something that implies the operation is going on seems sufficient. ${action}Package(Root|Boundary)? seems fine though I'd be somewhat unclear on difference between boundary and root still. Root is not clearly defined anywhere.
Here is what I gather at the moment…
A “package _boundary_” is what is at the outer edge of a “package _scope_”. A “package _scope_” is defined by a package.json, but it may not necessarily have an "exports" field (it may, however, contain a "type" field). The correct term is actually “package _root_”, which is defined by a package.json that contains the entrypoint(s). I was able to find the term being used in the NPM v7 docs (but nowhere was this mentioned by any of our core docs). Therefore, it seems like the following is what we are after.
module.getPackageRoot(specifier, baseDir)
It doesn't need a clear definition if it's something most people already understand - either way, if we think that's the right term for this API, then this API _will define it_, and that will constitute the clear definition.
> Here is what I gather at the moment…
Agree with all of this. One last thought is that we could consider resolve instead of get, i.e. resolvePackageRoot, to parallel require.resolve. I’m not sure I prefer one over the other, as get is pretty straightforward and maybe we don’t _want_ to reference require.resolve; but I thought it would be worth bringing up.
We should also probably rename baseDir to just base, as it would be a URL in the case of ESM (e.g. getPackageRoot('lodash', import.meta.url)).
I like both of those naming suggestions.
> It doesn't need a clear definition if it's something most people already understand
I think they have an intuition, but if I were to ask someone what the package root of /app/node_modules/logger/cjs/logger.js in the example above is, I don't think they would state "it depends", which it does. I think the intuition assumes it can be resolved w/o a base from which resolution starts.
it does depend - on the base. If the base is inside app and not inside app/node_modules, then the package root is unambiguously /app/node_modules/logger.
Well in the case of cjs-logger above, the base is /app/main.js but the package which it resolves using is /app/node_modules/logger/cjs/package.json not just logger/package.json. This is why it isn't really clear, we can define it as whatever we want, I'm keen on defining it, but we just have to be very clear that what we define it as likely won't be simple to intuit about.
Yes but the package root is "where 'exports' would work", which is only the dir that contains logger/package.json. One can navigate from there if finding logger/cjs/package.json is needed.
I agree clearly documenting what this is doing is important, but I don't think "what it's doing" is actually ambiguous.
@ljharb in my example above the cjs-logger/package.json#exports works. I'm stating that there is clearly not a single package.json that this makes sense for.
I'm confused. "exports" doesn't work in a nested package.json. so in /app/node_modules/logger/cjs/logger.js, /app/node_modules/logger/cjs/package.json can't have an "exports" field that makes a difference. The only impact that package.json file can have is on "main" or on "type", as far as I'm aware.
See my example above, the resolution through cjs-logger uses a different package.json
I'm still not clear on what you mean. is app/node_modules/logger/cjs a symlink or something?
per my comment above app/node_modules/cjs-logger/ points to app/node_modules/logger/cjs/; that means that the "exports" from app/node_modules/cjs-logger/package.json is used, which is actually app/node_modules/logger/cjs/package.json.
ah, gotcha. In that edge case which i don't expect anyone to ever have an intuition about, yes, app/node_modules/logger/cjs/ would be the package root for the cjs-logger package specifier.
I'm not stating they should have an intuition, just that the intuition of having a single one w/o the resolve operation is likely misleading in various cases like that or using absolute paths (doesn't go through exports, want boundary or null?), relative paths (same), packages without "exports" (same), etc. We need to clearly define the meaning of w/e we want to call this so that we can set this up properly and not try to define in retrospect.
Agreed - sounds like we're in agreement that the API is fine, but must be very clearly documented, and that whatever term is implied by the name we pick is also clearly defined.
@bmeck, I created a repo similar to the app one you describe above to see if we could turn it into a test fixture of sorts. I was having a bit of trouble understanding exactly how to make it all fit together logically, but would you mind taking a look to see how close it is to being able to test your case? If you could PR the repo with any corrections that would be great too.
I just published a pretty naive solution for this implemented in user-land here: https://www.npmjs.com/package/package-resolver
@MylesBorins suggested that we could use this as an implementation for this issue.
Doing this implies that we don't respect the redirects set by policies. Do you all think this could be a good idea?
It’s a good basis to start, but i think that to land it needs to precisely match what require does.
> to land it needs to precisely match what require does
That would be my guess as well. @MylesBorins what do you think?
It seems like it won't exactly match require if it is doing the ignoring of policies in the comments above.
@MylesBorins what is your opinion here, is it okay to ship an API that does not respect policies?
I would not personally block on policies support but if @bmeck or others would then it would be a requirement.
@bmeck does require.resolve respect policies?
> @bmeck does require.resolve respect policies?
Currently, no. Either way is fine, but I'd prefer we not make it align with require() in that sense. Interestingly import.meta.resolve does match import though so this likely should be made to align either way. Slight preference on making both resolve and loading the same as it makes the implementation simpler.
I think module features should always align with require and import, and when require/import aligns with policies, this should too. If those are inconsistent with alignment to policies, that seems like a bug.
@ljharb I can make them align. If we do though, the situation described above doesn't seem to be what you want?
It seems like what we should do is kick off a PR to core and work these out there. I think if we mark this as an experimental API we could always land something with a warning and gather feedback
@bmeck basically the entire purpose of this API is to provide a reliable replacement for require.resolve(path.join(packageName, 'package.json')) even in the presence of “exports”; if policies hijack a module and point it somewhere else, I’d assume either all of, or none of, requires, imports, and this API would follow suit.
@ljharb ok, can do
I would add... if we add this API as experimental we could align everything (import / require / resolve / this api) as a single push... I don't think we should block an initial implementation on a larger platform inconsistency