Tslint: Run linting logic in spawned threads

Created on 11 Mar 2017 · 11Comments · Source: palantir/tslint

Followup to #389 & #1131. I'm taking a look at this for work because we have a few thousand files to lint. Running them in serial takes a _long_ time. Our options are:

Parallelize TSLint.MSBuild _(users of other build systems wouldn't get the benefits, so probably not)_
Have TSLint itself run across multiple threads / cores.

Within option 2 I see there being two possibilities:

Refactor the internals of Runner and Linter to be asynchronous
- I think this would be a nice but making Linter in particular would be a breaking API change
- Do you think you'd want to start to migrate to asynchronous file reading anyway?
Create a new ProcessSplitter class between tslint-cli.ts and Runner
- Don't use it unless --parallel is provided and > 1
- Have it spawn processes equal to --parallel
  - It would do its own file globbing to get the full list of files to lint
  - Each process would get 1/<--parallel> of those files
- After the processes are done it would collect and sort the results and report them as if they were synchronous

Thoughts?

Hard In Discussion Needs Proposal Enhancement

Source

JoshuaKGoldberg

👍11

Most helpful comment

Hi there,

I have one suggestion for this: what if we'll move all rules, which doesn't require a type checking into separate processes and split all files between them, but all rules, which require a type checking we'll leave in the main thread (to avoid compilation of the project multiple times)? 🤔

I just played with that and have some results.

Linted files count - 1894
Enabled rules requires type checking - 9
Enabled rules doesn't require type checking - 123
CPU count - 4

I've created 3 separate processes (os.cpus().length - 1) to linting by non-typed rules and leave linting by typed rules in the main thread. So, each separate process lints ~630 files by 123 rules, the main thread lints all 1894 files by 9 rules.

In my runs each process works ~70-85 seconds, the main thread lints all files ~70 seconds. Time to create ts.Program is 15-16 seconds. So, the total time of linting the whole project is ~100-105 seconds. tslint from npm does it in ~170 seconds (parallel version is ~40% faster).

To summarize the results:

Parallel linter takes ~100-105 seconds
Single-thread linter (from npm) takes ~170 seconds

So, the parallel version is ~40% faster in my case.

What do you think about that?

For me (I'm not familiar with tslint's code a lot) there is still some open questions:

We can't send/receive the code/nodejs state between processes, so we can't transport the whole LintResult object to the main thread. In my case I just use errorCount and output from that object - it seems that it's enough to print the error(s).
(derives from 1) We can't apply fixes in this case (or we need to transfer serialized state of the fix to the main process and then apply them).
(derives from 1) For now a sorting of the output is lost, but I'm not sure that this is a big problem.

/cc @JoshuaKGoldberg @ajafff

timocov on 12 Jan 2019

👍3

All 11 comments

Some random thoughts:

async processing:

does async I/O work with --type-check or do we get race conditions from that?
+1 to make I/O in Runner async
- could use glob-stream for globbing to start linting as fast as possible (although that's not the bottleneck)
- use a concurrent transform stream to read the file
- use a write stream to do the linting
- Runner should not throw away the ts.Program (created when --project is passed) when not type checking. We could then avoid the globbing and reuse the SourceFiles instead of reading the files again and creating new SourceFiles
there should also be an async version of Configuration and RuleLoader
making Linter async does not seem to be necessary?
- the only I/O made here is used to apply fixes
- unrelated: after applying fixes, use ts.updateSourceFile instead of this.getSourceFile

multiple threads:

each thread would need to set up typescript, require all rules, cache the paths to the rule, ...
- don't know if that would slow things down
- requires much more memory

result cache:

cache the linting result and only lint that file again if it changed in the meantime
eslint has an option for this, too

ajafff on 11 Mar 2017

Should probably do #2282 first.
Also, #2235 would address the problem of reading files multiple times.

andy-hanson on 11 Mar 2017

👍2

Glad the reaction has been favorable so far. :) I'll start work on some of the low hanging fruit.

Re the downsides of multiple threads: agreed, and this should definitely not be used for "normal" (most) projects. The scenario I'm approaching this for is overpowered 16-core monstrosities (dev machines, build lab machines, obscene Azure VMs, and the like) made to deal with huge code bases. They're fantastic at MSBuild/gcc/etc. but amusingly running TSLint tasks slows them down something terrible. The argument could (and should) be made that projects big enough to slow down TSLint should be split up into sub-projects, and those sub-projects' TSLint tasks should be parallelized... but that's not always possible.

JoshuaKGoldberg on 29 Mar 2017

maybe merge-able with https://github.com/palantir/tslint/issues/943

beenotung on 12 Oct 2018

parallel linting is not supported so far, my workaround is

find src -name '*.ts' | parallel node_modules/.bin/tslint -p . --fix

sometimes also use git status to skip some files.

beenotung on 12 Oct 2018

Hi there,

I just played with that and have some results.

Linted files count - 1894
Enabled rules requires type checking - 9
Enabled rules doesn't require type checking - 123
CPU count - 4

To summarize the results:

Parallel linter takes ~100-105 seconds
Single-thread linter (from npm) takes ~170 seconds

So, the parallel version is ~40% faster in my case.

What do you think about that?

For me (I'm not familiar with tslint's code a lot) there is still some open questions:

We can't send/receive the code/nodejs state between processes, so we can't transport the whole LintResult object to the main thread. In my case I just use errorCount and output from that object - it seems that it's enough to print the error(s).
(derives from 1) We can't apply fixes in this case (or we need to transfer serialized state of the fix to the main process and then apply them).
(derives from 1) For now a sorting of the output is lost, but I'm not sure that this is a big problem.

/cc @JoshuaKGoldberg @ajafff

timocov on 12 Jan 2019

👍3

This looks like a great start @timocov! Super exciting that you got this to work 👏

We can't send/receive the code/nodejs state between processes

Why not? Can you elaborate? https://nodejs.org/api/child_process.html#child_process_options_stdio implies there are some built-in options there.

Applying fixes is done by a formatter AFAICS. For both 2/fixes and 3/sorting, from my perspective it should be possible to run something like:

Set up linting options, etc. (same as usual)
Spawn off to different threads as you've specified, collecting all their output from the full JSON formatter
Combine output from the threads into one big output
Deal with that output as if it came from a single thread (same as usual)

Thoughts?

JoshuaKGoldberg on 19 Jan 2019

Why not? Can you elaborate? https://nodejs.org/api/child_process.html#child_process_options_stdio implies there are some built-in options there.

I meant that we can't just send the whole instance of some class from one process to another process without implementing (de)serialization code. Or we can? If so, I can't realize how nodejs does it 🙂
Of course, we can send serialized state/JSON between processes to share result (and this is what I did in my the first implementation).

If you wish I can prepare the code (actually you already can see rough solution here, but I need to make it more readable and remove duplications) of the first implementation and we can discuss on it.

timocov on 19 Jan 2019

we can send serialized state/JSON between processes to share result (and this is what I did in my the first implementation)

Awesome, that's exactly what we'd need. +1 to preparing the code!

JoshuaKGoldberg on 19 Jan 2019

@JoshuaKGoldberg @adidahiya I just created #4483 with a little bit prepared code of parallel running the linter.

As I said before - I'm not familiar with tslint's code base a lot and can made an epic fails in the PR - keep this in mind 🙂.

timocov on 27 Jan 2019

🚀2

☠️ TSLint's time has come! ☠️

TSLint is no longer accepting most feature requests per #4534. See typescript-eslint.io for the new, shiny way to lint your TypeScript code with ESLint. ✨

It was a pleasure open sourcing with you all!

JoshuaKGoldberg on 3 Nov 2019

❤1

Was this page helpful?

0 / 5 - 0 ratings