I'm trying to diagnose some extreme build differences between tsc and ts-loader in my monorepo. I'm using Process Monitor on Windows to track file reads. This image shows a typical repeated, back-to-back read of a file that appears with webpack 4 & ts-loader 8 but not with tsc --build. Whereas tsc performs 8,000 reads, ts-loader performs roughly 80,000 and then spends ungodly resources processing it all for 2+ hours. I was expecting to see ts-loader spelunking into other projects because of my improper configuration, but that's not what I found. Is this a known behavior? I'm going to keep digging too, but I figured I'd share this in case it's already known. Thanks!
As I spot-check the whole list, I often (but not always) see _double_ sequential listings of .d.ts files, .js files and .npmrc. .npmrc is consulted many times throughout; perhaps other files are also read repeatedly. package.json files stand out as the ones read more than twice in a row. Perhaps this behavior and the 80,000 reads are due to webpack's bundling activities, but then why isn't the problem anywhere near as extreme with transpileOnly: true?

For the .ts files that are read twice, maybe each is read once for an existence check and a second time for its content?

package.json files are the major offenders: for example, 6,500 reads of \node_modules\.pnpm\react@…\node_modules\react\package.json, with a long tail down to single reads (including some package.json files). So it's 70k reads spread over only 2,500 non-.ts/.tsx file paths. With transpileOnly: true the maximum is about 350 repeat reads, versus tsc --build, whose maximum is 19 repeat reads (and it does type checking, of course).
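Per-path numbers like these can be reproduced from a Process Monitor capture with a small tally. A minimal sketch, assuming the capture has been exported to CSV and already parsed (not shown) into records with `operation` and `path` fields, and assuming reads are logged under the `ReadFile` operation; `FileEvent`, `summariseReads` and the field names are illustrative, not part of any tool's API:

```typescript
// Tally file reads from a Process Monitor capture.
// Assumption: the CSV export has already been parsed into FileEvent records
// (ProcMon's export includes "Operation" and "Path" columns; parsing not shown).
interface FileEvent {
  operation: string; // e.g. "ReadFile", "CreateFile", "QueryOpen"
  path: string;
}

interface ReadSummary {
  totalReads: number;
  distinctPaths: number;
  maxReadsPerPath: number;
  mostReadPath: string | undefined;
  // Longest streak of consecutive reads of one path, ignoring non-read events.
  longestSequentialRun: number;
}

function summariseReads(events: FileEvent[], readOperation = "ReadFile"): ReadSummary {
  const counts = new Map<string, number>();
  let totalReads = 0;
  let longestRun = 0;
  let currentRun = 0;
  let previousKey: string | undefined = undefined;

  for (const event of events) {
    // Assumption: reads are logged under readOperation ("ReadFile" by default).
    if (event.operation !== readOperation) continue;
    // Windows paths are case-insensitive, so normalise case before counting.
    const key = event.path.toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
    totalReads++;
    currentRun = key === previousKey ? currentRun + 1 : 1;
    longestRun = Math.max(longestRun, currentRun);
    previousKey = key;
  }

  // Find the single most-read path.
  let maxReadsPerPath = 0;
  let mostReadPath: string | undefined = undefined;
  counts.forEach((count, path) => {
    if (count > maxReadsPerPath) {
      maxReadsPerPath = count;
      mostReadPath = path;
    }
  });

  return {
    totalReads,
    distinctPaths: counts.size,
    maxReadsPerPath,
    mostReadPath,
    longestSequentialRun: longestRun,
  };
}
```

Running it once on the ts-loader capture and once on the tsc --build capture gives directly comparable figures: total reads, distinct paths, worst per-path count and longest back-to-back streak.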
Note: the documented default value for experimentalFileCaching is true, but I'm pretty sure it actually defaults to false?
Note 2: setting experimentalFileCaching: true doesn't seem to prevent the repeated reads of package.json files, nor the double reads of other files. Possibly related to https://github.com/TypeStrong/ts-loader/issues/825#issuecomment-416592240 ?
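To take the default out of the question when comparing runs, both options can be set explicitly on the loader rule. A sketch of the relevant part of a webpack 4 config, assuming ts-loader is applied to .ts/.tsx files through a single rule; entry, output and resolve settings are omitted:

```typescript
// webpack.config.ts (sketch): only the ts-loader rule is shown.
const config = {
  mode: "development",
  module: {
    rules: [
      {
        test: /\.tsx?$/,
        loader: "ts-loader",
        options: {
          // Set explicitly so the run doesn't depend on what the default really is.
          experimentalFileCaching: true,
          // Flip to true for the comparison run: type checking is skipped.
          transpileOnly: false,
        },
      },
    ],
  },
};

export default config;
```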
I would be happy to make a PR, but let's start with diff screenshots:
This change seems fairly safe (as long as the return values are immutable) and saves as many as _hundreds_ of reads of a single module in my project's build. It avoids repeated resolutions of the same package.json, especially from .d.ts stacks within dependencies:
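The idea behind this change can be sketched as a path-keyed memo in front of whatever function reads files. A minimal sketch, assuming a compiler-host-style `readFile(fileName)` that returns the contents or `undefined`; `createCachedReadFile` and `isPackageJson` are hypothetical helpers, not ts-loader APIs. Caching the raw string sidesteps the immutability caveat, since JavaScript strings are immutable; caching parsed package.json objects would additionally require them to be frozen or otherwise never mutated:

```typescript
// Hypothetical helper: memoise file reads by path for files matching a predicate
// (package.json by default). Not a ts-loader API.
type ReadFile = (fileName: string) => string | undefined;

function isPackageJson(fileName: string): boolean {
  // Matches both "/" and "\" separators, case-insensitively (Windows).
  return /[\\/]package\.json$/i.test(fileName);
}

function createCachedReadFile(
  read: ReadFile,
  shouldCache: (fileName: string) => boolean = isPackageJson,
): ReadFile {
  // Cached values are strings (immutable) or undefined for missing files.
  const cache = new Map<string, string | undefined>();
  return (fileName: string): string | undefined => {
    if (!shouldCache(fileName)) return read(fileName);
    if (cache.has(fileName)) return cache.get(fileName);
    const contents = read(fileName);
    cache.set(fileName, contents);
    return contents;
  };
}
```

Because missing files are cached too (as `undefined`), this also collapses repeated failed lookups. It assumes files don't change during the build; in watch mode the cache would need to be invalidated whenever a package.json changes.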

But there's also the issue of repeated resolutions of imports across files. Here's an attempt at reducing such reads by caching. I know it is flawed, _at least_ because imports of an external module may need to resolve to different versions in different packages, and this flattens all imports of a package to the first version the compilation encounters.

If there were a way to know from memory "what is the package root of the module currently being evaluated?", then it seems to me that it would be safe to cache the resolution results.
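The package-root idea can be sketched as a resolution cache keyed by (package root, module name) rather than by module name alone, so two packages that depend on different versions of the same dependency keep separate results. A minimal sketch under these assumptions: `resolve` stands in for the real, expensive resolver; `hasPackageJson(dir)` stands in for a (preferably cached) existence check; the package root is taken to be the nearest ancestor directory containing a package.json. All names are hypothetical. Relative and absolute specifiers are keyed by the containing directory instead, because `./util` means a different file in every directory:

```typescript
// Hypothetical resolver wrapper: caches resolutions per (scope, module name).
type Resolve = (moduleName: string, containingFile: string) => string | undefined;

// Relative ("./x", "../x") or absolute ("/x", "C:\x") specifiers depend on the exact directory.
function isPathLike(moduleName: string): boolean {
  return /^\.\.?(?:[\\/]|$)/.test(moduleName) || /^(?:[a-zA-Z]:)?[\\/]/.test(moduleName);
}

// Directory of a file, normalised to "/" separators.
function parentDir(file: string): string {
  return file.split(/[\\/]/).slice(0, -1).join("/");
}

// Nearest ancestor directory containing a package.json (assumed to be the package root).
function findPackageRoot(
  containingFile: string,
  hasPackageJson: (dir: string) => boolean,
): string | undefined {
  const parts = containingFile.split(/[\\/]/).slice(0, -1);
  while (parts.length > 0) {
    const dir = parts.join("/");
    if (hasPackageJson(dir)) return dir;
    parts.pop();
  }
  return undefined;
}

function createPackageScopedResolver(
  resolve: Resolve,
  hasPackageJson: (dir: string) => boolean,
): Resolve {
  const cache = new Map<string, string | undefined>();
  return (moduleName, containingFile) => {
    // Bare specifiers share a cache entry across the whole package root;
    // path-like specifiers only within one directory.
    const scope = isPathLike(moduleName)
      ? parentDir(containingFile)
      : findPackageRoot(containingFile, hasPackageJson) ?? parentDir(containingFile);
    const key = `${scope}\u0000${moduleName}`;
    if (cache.has(key)) return cache.get(key);
    const result = resolve(moduleName, containingFile);
    cache.set(key, result);
    return result;
  };
}
```

This is still only correct if bare-specifier resolution really is uniform within a package root; nested node_modules directories below the root, or `paths`/`baseUrl` settings that differ per project, would break that assumption. TypeScript's own `ts.createModuleResolutionCache`, which caches by containing directory, may be worth comparing against.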
But I'm not sure yet if these changes really make a difference - they might be totally misguided, but I'm trying.
Thoughts?
Feel free to experiment and see what you find! It's worth bearing in mind that webpack is a factor here and there may be limits to what optimisations can be achieved. But I encourage the investigation!
Thanks! So these two areas have or have not been evaluated recently for optimization potential? Or am I on a naive path?
Can you please elaborate on the webpack factor in limiting optimization potential? Is there a relevant fundamental difference between how tsc vs webpack & tsloader operate?
> So these two areas have or have not been evaluated recently for optimization potential
Don't think so - crack on!
> Is there a relevant fundamental difference between how tsc vs webpack & tsloader operate?
Yes. Webpack triggers the calls to the TypeScript compiler: dependency walking is webpack's concern, and it calls ts-loader file by file. tsc, by contrast, operates directly on the files.
You may also find this interesting: https://github.com/TypeStrong/ts-loader/pull/1251#issuecomment-773965925
That is helpful, thanks.
So because imports can chain like .js -> import .ts -> import .js -> import .ts ..., we need webpack and the loaders registered for each extension to walk them.
Say we could restrict ourselves to projects that tsc can build just fine (apart from assets that don't re-enter TypeScript, like .less or .svg), and let ts-loader walk the dependencies itself, including files it doesn't know how to load. Once TypeScript transpilation is finished, webpack is unblocked to start its full traversal. The loader then delivers the pre-transpiled output and dependency list for each path requested of it. If webpack requests a path from this loader that wasn't transpiled in advance, that's an error.
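The proposed mode can be sketched as a loader that only looks results up. A minimal sketch, assuming some earlier step (hypothetical, not shown) has already run the TypeScript build and produced, per source path, the emitted JavaScript and the list of files it depends on; only `resourcePath` and `addDependency` from webpack's loader context are modelled, and `createPrecompiledLoader`, `PrecompiledModule` and `MinimalLoaderContext` are illustrative names:

```typescript
// Output of a hypothetical ahead-of-time TypeScript build, per source file.
interface PrecompiledModule {
  outputText: string; // emitted JavaScript
  dependencies: string[]; // files this output depends on (e.g. .d.ts files)
}

// The two members of webpack's loader context this sketch relies on.
interface MinimalLoaderContext {
  resourcePath: string;
  addDependency(file: string): void;
}

function createPrecompiledLoader(precompiled: ReadonlyMap<string, PrecompiledModule>) {
  return function precompiledLoader(this: MinimalLoaderContext, _source: string): string {
    const entry = precompiled.get(this.resourcePath);
    if (entry === undefined) {
      // Per the proposal: a request for anything not transpiled in advance is an error.
      throw new Error(`Not transpiled ahead of time: ${this.resourcePath}`);
    }
    // Register files for watch invalidation; webpack still finds import edges
    // by parsing the returned JavaScript.
    for (const dep of entry.dependencies) this.addDependency(dep);
    return entry.outputText;
  };
}
```

Webpack discovers the module graph by parsing the returned JavaScript; `addDependency` only registers files for watch invalidation, which matters for type-only dependencies such as .d.ts files that vanish from the emitted output.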
Would that basic idea work? Is it already how some loader, or some mode of ts-loader, works? And finally, if this loader were in charge of its own dependency walk, why might that be fundamentally faster than working on demand for webpack?
Thanks for your insights.
That's a lot of questions!
I'm not sure it could work: a primary mode of webpack is watch mode, and I'm not sure how that would fit with what you're describing. I'm also not sure webpack's APIs would necessarily support the general idea.
I could be wrong though; I encourage you to experiment. It's possible this is a really great idea and that's always worth pursuing!
@johnnyreilly I created a self-contained repro for demonstrating the repeated reads behavior here: https://github.com/JasonKleban/practices-2021
@johnnyreilly I made an observation in https://github.com/microsoft/TypeScript/issues/42670#issuecomment-781514114 about a ~7x increase in traceResolution output. Does ts-loader have any say in whether an import is considered a "module" or a "type reference directive"?
If I understand the question correctly, I don't think so. Everything that goes from webpack to ts-loader will be a module. ts-loader does not load type reference directives directly; the TypeScript compiler does that, and ts-loader consumes the results.