Stryker: Discussion: TypeScript compile errors with mutation switching

Created on 28 Aug 2020 · 17Comments · Source: stryker-mutator/stryker

See #1514 .

In a mutation switching world, we will create (typescript/flow) type errors. This is by design; it would be almost impossible to prevent it (we would need a team and community at least as large as https://github.com/Microsoft/TypeScript).

I don't think flow will be an issue because flow code is transpiled without type checking (using a babel plugin). However, TypeScript is a different beast all together..

There are a couple of approaches I've tried:

Attempt 1: Transpile only

Running the typescript transpiler with configuration similar to ts-node's --transpile-only.

However, after a little investigation, I learned that transpileOnly only works with --isolatedModules. This means that only a subset of typescript will be supported (although a best practice IMHO). We can't rely on this for all our users.

The same goes for skipping type checking by using babel; it won't support all use cases.

Attempt 2: Skip Typechecking

With a little fudging in the TypeScript compiler api you can get a long way. It would allow us to basically transpile all typescript projects while skipping type checks. See https://github.com/microsoft/TypeScript/issues/29651#issuecomment-616113744 for an example.

However, this approach falls short when you consider more typescript use cases. Calling the TypeScript compiler directly is just _one_ of the ways typescript compilation might take place. A non-extensive list of ways to compile typescript:

ts-node
angular-cli
vue-cli
react-scripts
karma-typescript
webpack with ts-loader
webpack with awseome-ts-loader
jest-ts
rollup with typescript plugin
Probably 100s more of tools/plugins

Solution: // @ts-nocheck

The solution I ended up with is to use the // @ts-nocheck directive on top of each file (js or ts, since js files can also result in type errors). It works from TypeScript 3.7 onward and will work with all use cases (as long as you're using a fairly recent TypeScript version).

_Note that errors can also be produced by test files, even though they're not mutated. Mutating production code can change the return type, for example function foo() { return 'bar'; } is mutated to function foo() { activeMutant === 1? {} : return 'bar'; }, which changes it's return type from string to string | undefined, which probably results in a type error somewhere in your test files_

Problem 1: Still type checking...

However, this presented me with another issue. Even with @ts-nocheck, a file can _still provide type errors_, this is because other directives can result in errors, for example @ts-expect-error, but @ts-check has the same result.

That's why I decided to remove all comments from JS and TS files. We're using the strip-comments package for that, which uses regular expressions to identify the comments.

You can configure this with 2 settings:

  "sandbox": {
    "stripComments": "**/*+(.js|.ts|.cjs|.mjs)?(x)",
    "fileHeaders": {
      "**/*+(.js|.ts|.cjs|.mjs)?(x)": "/* eslint-disable */\n// @ts-nocheck\n"
    }
  }

(see readme for specifics)

Problem 2: significant comments

I _assumed_ stripping comments wouldn't cause a problem, because comments should be comments. However, this is JavaScript we're talking about. This is a list of comments that are significant and we should actually keep if possible:

@jest-environment
@flow
... probably more?

Personally, I feel like removing all comments from all files is a shotgun approach. Simply removing only @ts-xxx comments would be better, but how would we do that? I don't want to be parsing all input files (seriously, it is already a pain to parse only the files to be mutated) and strip-comments doesn't support comment filtering.

Problem 3: `strip-comments` has known issues

We've recently found an issue with the strip-comments package where it leaves a file in an invalid state. If this would happen to someone than the only thing he/she can do is to disable the removal of comments altogether (with "sandbox.stripComments": false) or only for that file specifically.

Problem 4: incompatible scenario with jest.

Even though users can customize to fit their use case with stripComments and fileHeaders, we still don't support all scenarios. For example, running @stryker-mutator/jest-runner together with the jest-ts jest plugin might result in a situation where you don't want to strip comments (because of @jest-environment), but need to strip comments because you're using @ts-expect-error somewhere in your tests.

Any input would be very helpful 😇

@hugo-vrijswijk @brodybits @Lakitna

help wanted

Source

nicojs

👀1

All 17 comments

What if we would just consider it a killed mutation if it fails with a TypeScript compile error?

brodybits on 28 Aug 2020

__updated:__

I suspect that we could avoid the TypeScript compile errors by injecting @ts-ignore for each mutation. We may have to be smart to get @ts-ignore on the line right before the mutation.

Alternative from my previous comment is to consider a compile error the same as a killed mutation, but I am thinking it would be nicer if Stryker would help identify these untested mutations as well.

Adding this reference to help people like myself who are still not super experts with the TypeScript compiler: https://www.typescriptlang.org/docs/handbook/release-notes/typescript-2-6.html#suppress-errors-in-ts-files-using--ts-ignore-comments

brodybits on 28 Aug 2020

Thanks for the response 🤗

What if we would just consider it a killed mutation if it fails with a TypeScript compile error?

That's basically how it used to work (a Compile error to be precise). However, this cannot be done anymore with mutation switching, since all mutants are placed in the code at once. That would mark all mutants as a compile error 🤷‍♂️.

_Note, the @stryker-mutator/typescript-checker can be enabled to validate each mutant for compile errors, but that's after mutation switching takes place_

Or could we inject @ts-ignore when applying each mutation?

That would not solve "Problem 1", tests can still fail to compile.
EDIT: also type errors can still happen in the mutated file as well. @ts-ignore only ignores errors in the direct line below. If a mutant creates an error on a different line, it will not be ignored.

Seriously, @ts-nocheck is your friend. It works almost in all scenario's (except for the problems I've mensioned of course 😢)

nicojs on 28 Aug 2020

At this point, I'm contemplating implementing a simple parser that can distinguish between code and comments and use that to strip out _only_ the @ts-xxx comments.

I don't see any other way.

nicojs on 29 Aug 2020

Why not just use Babel, TypeScript, or @typescript-eslint/typescript-estree to parse the code and look for the comments to strip out?

brodybits on 30 Aug 2020

Ok, I've spent a couple of hours trying to write a custom parser to remove the @ts-xxx directives in comments, but I eventually gave up. There is no easy way of detecting code vs comments without actually parsing (most) nodes in the AST.

A couple of examples to illustrate why:

// @ts-check <-- this is a comment that needs to be removed
foo.bar();

foo.bar("foo \"/* @ts-expect-error */"); // <--No remove

const regex = / \/*@ts-check */; // <-- No remove, regex literal

const a = b / c; /*@ts-check */ // < -- yes remove, first `/` didn't start a regex literal

const b = <HelloJsx>
  // @ts-check {/* <-- no remove, jsx element */}
</HelloJsx>

I think we can still try this approach, but we should just rely on the typescript parser itself.

The idea I now have is that it would be disabled by default, and needed to be enabled using the sandbox.removeTSDirectives glob expression.

Parser code using the TypeScript compiler api

function removeTSDirectives2(file: File): string {
  const directives: TSDirective[] = [];

  const sourceFile = ts.createSourceFile(file.name, file.textContent, ts.ScriptTarget.Latest);

  const leadingCommentRanges = new Set<number>();
  const trailingCommentRanges = new Set<number>();

  function collectDirectives(node: ts.Node) {
    if (!leadingCommentRanges.has(node.pos)) {
      leadingCommentRanges.add(node.pos);
      ts.getLeadingCommentRanges(file.textContent, node.pos)?.forEach(({ pos, end }) => tryParseTSDirective(pos, end));
    }
    if (!trailingCommentRanges.has(node.end)) {
      trailingCommentRanges.add(node.end);
      ts.getTrailingCommentRanges(file.textContent, node.end)?.forEach(({ pos, end }) => tryParseTSDirective(pos, end));
    }
    ts.forEachChild(node, collectDirectives);
  }

  sourceFile.forEachChild(collectDirectives);

  let currentIndex = 0;
  let textWithoutTSDirectives = '';
  for (const directive of directives) {
    textWithoutTSDirectives += file.textContent.substring(currentIndex, directive.range[0]);
    currentIndex = directive.range[1];
  }
  textWithoutTSDirectives += file.textContent.substr(currentIndex);
  return textWithoutTSDirectives;

  function tryParseTSDirective(commentStartPos: number, commentEndPos: number): void {
    const commentDirectiveRegEx = /^(..\s*)@(ts-[a-z-]+).*$/;
    const commentText = file.textContent.substring(commentStartPos, commentEndPos);
    const match = commentDirectiveRegEx.exec(commentText);
    if (match) {
      const directiveStartPos = commentStartPos + match[1].length;
      directives.push({
        range: [directiveStartPos, directiveStartPos + match[2].length + 1],
      });
    }
    return undefined;
  }
}

Custom parser starter code (for reference)


enum CommentKind {
  SingleLine,
  MultiLine,
}

export function removeTSDirectives(text: string): string {
  const directives = findTSComments(text);
  let currentIndex = 0;
  let textWithoutTSDirectives = '';
  for (const directive of directives) {
    textWithoutTSDirectives += text.substring(currentIndex, directive.range[0]);
    currentIndex = directive.range[1];
  }
  textWithoutTSDirectives += text.substr(currentIndex);
  return textWithoutTSDirectives;
}

interface TSDirective {
  range: Range;
}

function findTSComments(text: string): TSDirective[] {
  let pos = 0;
  const directives: TSDirective[] = [];
  while (pos < text.length) {
    switch (currentCharCode()) {
      case CharacterCodes.slash:
        const nextChar = text.charCodeAt(pos + 1);
        if (nextChar === CharacterCodes.slash || nextChar === CharacterCodes.asterisk) {
          scanComment();
        } else {
          scanRegex();
        }
        break;
      case CharacterCodes.doubleQuote:
      case CharacterCodes.singleQuote:
        scanString();
        break;
      case CharacterCodes.backtick:
        scanTemplate();
        break;
      default:
        pos++;
    }
  }
  return directives;

  function currentCharCode() {
    return text.charCodeAt(pos);
  }

  function scanTemplate(): number {
    pos++;
    while (pos < text.length) {
      if (currentCharCode() === CharacterCodes.backtick) {
        pos++;
        break;
      }

      // '${'
      if (currentCharCode() === CharacterCodes.$ && pos + 1 < text.length && text.charCodeAt(pos + 1) === CharacterCodes.openBrace) {
        scanTemplateSpan();
        break;
      }

      pos++;
    }
    return pos;
  }

  function scanTemplateSpan(): void {
    pos += 2;
    while (pos < text.length) {
      const ch = currentCharCode();
      if (ch === CharacterCodes.closeBracket) {
        pos++;
        break;
      }
      if (ch === CharacterCodes.doubleQuote || ch === CharacterCodes.singleQuote) {
        scanString();
      } else if (ch === CharacterCodes.backtick) {
        scanTemplate();
      } else if (ch === CharacterCodes.slash) {
        scanRegex();
      } else {
        pos++;
      }
    }
  }

  function scanRegex(): void {
    pos++; // skip `/`
    let inCharacterClass = false;
    while (pos < text.length) {
      const ch = currentCharCode();
      if (isLineBreak(ch)) {
        // Premature ending, syntax error.
        pos++;
        break;
      } else if (ch === CharacterCodes.slash && !inCharacterClass) {
        // A slash within a character class is permissible,
        // but in general it signals the end of the regexp literal.
        pos++;
        break;
      } else if (ch === CharacterCodes.backslash) {
        // Escape next character
        pos++;
      } else if (ch === CharacterCodes.openBracket) {
        // Start a character class, for example `[a-Z]`
        inCharacterClass = true;
      } else if (ch === CharacterCodes.closeBracket) {
        // Close character class
        inCharacterClass = false;
      }
      pos++;
    }
  }

  function scanString(): void {
    const quote = currentCharCode();
    pos++;
    while (pos < text.length) {
      const ch = currentCharCode();
      if (ch === quote) {
        pos++;
        break;
      }
      if (ch === CharacterCodes.backslash) {
        pos = pos + 2;
      }
      if (isLineBreak(ch)) {
        pos++;
        break;
      }
      pos++;
    }
  }

  function scanComment(): void {
    const startPos = pos;
    const kind = text.charCodeAt(pos + 1) === CharacterCodes.slash ? CommentKind.SingleLine : CommentKind.MultiLine;
    pos += 2;
    switch (kind) {
      case CommentKind.MultiLine:
        while (pos < text.length) {
          if (currentCharCode() === CharacterCodes.asterisk && text.charCodeAt(pos + 1) === CharacterCodes.slash) {
            pos += 2;
            break;
          }
          pos++;
        }
        return;
      case CommentKind.SingleLine:
        while (pos < text.length) {
          if (isLineBreak(currentCharCode())) {
            break;
          }
          pos++;
        }
    }
    // Eureka, we've got a comment. Now find that @ts-xxx directive
    tryParseTSDirective(startPos, pos);
  }

  function tryParseTSDirective(commentStartPos: number, commentEndPos: number): void {
    const commentDirectiveRegEx = /^(..\s*)@(ts-[a-z-]+).*$/;
    const commentText = text.substring(commentStartPos, commentEndPos);
    const match = commentDirectiveRegEx.exec(commentText);
    if (match) {
      const directiveStartPos = commentStartPos + match[1].length;
      directives.push({
        range: [directiveStartPos, directiveStartPos + match[2].length + 1],
      });
    }
    return undefined;
  }
}

nicojs on 30 Aug 2020

Why not just use Babel, TypeScript, or @typescript-eslint/typescript-estree to parse the code and look for the comments to strip out?

Yeah, I guess there is no other way. Will implement something this week. It should replace the sandbox.stripComments option.

nicojs on 30 Aug 2020

I am thinking there should be a pretty simple way to remove the unwanted comments using Babel. Babel supports a comments: false option, and it should be very easy to use @babel/traverse to remove the comments (all comments) like this (not tested):

using @babel/traverse to remove all comments

import { traverse } from '@babel/traverse'
import types from '@babel/types'

traverse(ast, {
  enter(path) {
    types.removeComments(path.node)
  }
})

ref:

While this may not be as refined as using tsc to only strip comments matching the pattern, I would favor using Babel over using multiple parsers in the Stryker Mutator.

I can think of a couple possible approaches:

Babel approach 1: use Babel twice: first for preprocessing and second in instrumentor for mutating
Babel approach 2: do the parsing on each of the files, in a separate step in packages/core/src/process, save the parsed ASTs in the context, and use the parsed AST to both remove the comments and apply the mutators

I think Babel approach 2 would be more elegant but would have a couple of major caveats:

The AST type, which would be HtmlAst | JSAst | TSAst, would affect the Stryker context;
More memory would be needed to store the AST for each source file. It should be possible to drop the AST context for each source from memory upon computation of the mutations, but this may not be so trivial.

I hope to give this a try sometime, no promise when due to some other priorities.

brodybits on 30 Aug 2020

@brodybits great minds think alike 😎. I reached the same conclusion.

I'm opting for approach 1 for now. I'm adding the functionality to the instrumenter package. That way, we can keep any direct dependency on a specific parser out of Stryker core. It is called directly from the preprocessor.

I think the total performance impact might not be too bad because I'm thinking only to parse the file if a @ts-xxx is found somewhere in the code. Most files won't contain TS code, so those files are never parsed 🚅.

I also think of naming the option sandbox.disableTypeChecking. It will combine the removal of // @ts-xxx directives with the inserting of the // @ts-nocheck comment. I think this will allow for much better user experience.

An additional advantage is that this approach will also work for vue files that contain type checking (using the vue cli and some webpack magic) because the instrumenter can also parse HTML and vue files.

I think '**/*+(.js|.ts|.html|.vue)?(x)' is a valid default for most use cases.

I hope this solution is a good compromise. It is unfortunate, we have to implement it, but this will allow for the seamless use of Stryker in most projects.

nicojs on 30 Aug 2020

@nicojs I think that makes the most sense in general. More simple and probably less messy than what I had proposed.

I would also favor that the CLI adds the sandbox option when generating the config, and tells the user what is going on.

'*/+(.js|.ts|.html|.vue)?(x)'

How about this: **/*.{js,ts,html,vue}?(x)

It should have the same meaning, but simplified a little bit.

brodybits on 30 Aug 2020

Follow the work in progress here: #2446

I've tested it on stryker core and, even without the regex I proposed, I don't see a negative performance impact. And more importantly, it works like a charm! I'm really happy with this result. 😀

I think that makes the most sense in general

Glad you agree. 🙏 And thanks for the help figuring it out.

How about this: **/*.{js,ts,html,vue}?(x)

Sure. Maybe it's even better to use **/*.{js,jsx,ts,tsx,html,vue}, since htmlx and vuex don't make sense and listing them all might be even more clear.

nicojs on 30 Aug 2020

👍1

I think the total performance impact might not be too bad because I'm thinking only to parse the file if a @ts-xxx is found somewhere in the code. Most files won't contain TS code, so those files are never parsed 🚅.

I am not sure if I understand that comment. It looks to me like the changes in PR #2446 would happen in all files that match the pattern. Am I missing something here?

brodybits on 30 Aug 2020

Yeah, that's why it's a draft PR 🙈

Hang on. Will ask you to review when it is done.

nicojs on 31 Aug 2020

Just to get my head around the details:

Currently, you are first instrumenting for Stryker, and then instrumenting for Typescript. This is why you need to work with the @ts-xxx stuff.

The solution you have been discussing, while I was having a quiet weekend :), will: 1. Instrument for Stryker; 2. Preprocess for Typescript (@ts-xxx comment wrangling); 3. Instrument for Typescript

I was thinking about instrumenting the other way around. First, you instrument for Typescript and then for Stryker. This would remove the need for comment fiddling altogether. I guess you don't do this because of mutations inserted by the TS instrumentation? Source maps might be able to help you a bit here, but I'm not sure.

Lakitna on 31 Aug 2020

Almost correct @Lakitna :)

This is the process:

https://github.com/stryker-mutator/stryker/blob/9c846bf618af487815074e3b49d674cf1786cdcd/packages/core/src/process/2-MutantInstrumenterExecutor.ts#L30-L38

Instrument the code (works for JS, TS files as well as HTML/Vue files (script tags are instrumented)).
"Preprocess" the files. Disable type checking is one of the preprocessors

We might open up the "preprocess" API in the future, but for now, we're keeping it private. Public API's are a hassle to maintain.

To disable type checking, we will need to parse the file, but performance impact is kept to a minimum because it will only parse when "@ts-" can be found anywhere in the file 😎

nicojs on 31 Aug 2020

👍1

THIS ISSUE IS FINALLY CLOSED AND I'M VERY HAPPY WITH THAT!!!!

😎

nicojs on 2 Sep 2020

@nicojs right now: