flow-remove-types produces invalid JS after non-latin characters

Created on 3 Jun 2019  Â·  13Comments  Â·  Source: facebook/flow

Flow version: 0.100.0

Expected behavior

Given the following input:

// @flow
// п
function foo(bar: number) {}

flow-remove-types should produce:

//      
// п
function foo(bar        ) {}

Actual behavior

It produces invalid JS:

//      
// п
function foo(bar:         {}

The non-latin character is the trigger. Without it, it works correctly. This seems like quite a serious bug considering how common it is to use non-latin characters in comments (e.g. comment in other languages, draw arrows etc.).

cc @mroch @samwgoldman

bug parsing

All 13 comments

This seems to be a bug in flow-parser which produces incorrect range values in AST for non-latin characters. E.g. parsing // п and // n produces [0, 5] and [0, 4] ranges respectively, although both should be the same.

same root cause as #5793, I believe. the lexer reports its position in unicode codepoints, of which these are 1, but they are multibyte.

@mroch I believe it's a recent regression, because https://astexplorer.net/ (which uses Flow v0.94.0) shows identical AST for both cases ([0, 4]).

Screen Shot 2019-06-03 at 9 27 32 AM

this is 0.94, when hovering over the FunctionDeclaration in the AST. so it's been there a while, but flow-remove-types regressed because it moved to flow-parser.

this appears to only be an issue with the range. the loc matches babylon.

it's a bit easier to see if you do it all on one line: https://flow.org/try/#0PQKk-CAsBmD2MNxA (see the AST tab). since it's all on line 1, the range should match the columns in loc.

@mroch ah, yeah, you're right. The link I commented earlier triggers the bug if you add a new line after the comment.

hrm i think we intentionally made range work like this. it's the byte offset in the file, which is what you'd want if you wanted to seek to read that part of the file from disk. but it's not what you'd pass to substring in JS, which is what tools like flow-remove-types do.

cc @nmote I think we should change range to match esprima and babylon, and potentially add (an option for) another field like byte_range or something. will need to figure out which tools need to be updated to use that.

Thanks for looking into this @mroch! Agreed about bringing Flow AST more in line with other parsers.

In the meantime, to fix the breaking bug with flow-remove-types sooner, should we just rewrite the logic so that it only depends on line/column, not range? I could take a stab at a PR.

alternatively, if it could work with UTF8 bytes instead of JS strings, that would make the existing ranges correct.

for example, perhaps changing flow-remove-types to use Buffers? I think the offsets of a Buffer are in bytes, not codepoints. modifying a buffer in place instead of lots of string concatenation could be a perf win too.
https://github.com/facebook/flow/blob/a05cbb202197d57e5e909e11dcbb859f11c48641/packages/flow-remove-types/index.js#L107

@mroch yep, will do!

We are running into the same problem, in our case with a â„¢ in a comment.

// @flow
import React from 'react';

/*
  This should Just Workâ„¢
*/
export const SomeComponent = (props: Object) => (
  <div {...props}>
    This breaks
  </div>
);

becomes

import React from 'react';

/*
  This should Just Workâ„¢
*/
export const SomeComponent = (props:=> (
  <div {...props}>
    This breaks
  </div>
);

Looks like it's been a few months now, and it shows up in all v2 releases of flow-remove-types -- any updates on a fix?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

pelotom picture pelotom  Â·  3Comments

cubika picture cubika  Â·  3Comments

jamiebuilds picture jamiebuilds  Â·  3Comments

doberkofler picture doberkofler  Â·  3Comments

funtaps picture funtaps  Â·  3Comments