*.ts files should be reported as TypeScript, not JavaScript.
Is it possible to support TypeScript/.ts files with #! /usr/bin/env node shebangs? These files should be reported as TypeScript, not JavaScript, right?
I have a project that is being identified as JavaScript even though it mainly consists of a TypeScript file:
env line: https://github.com/google/clasp/blob/master/index.ts#L1
I don't want to add a .gitattributes to the repo or remove the shebang line.
URL of the affected repository: https://github.com/google/clasp
Last modified on: 2018-04-23
Expected language: TypeScript
Detected language: JavaScript
CC: @arfon @pchaigno @larsbrinkhoff @Alhadis
Previous Issue: https://github.com/github/linguist/issues/3067
This shouldn't be too hard to fix. Thanks to https://github.com/github/linguist/pull/4099, the shebang strategy should pass on the results to the subsequent strategies, and the extension strategy just so happens to be the next one in the list: https://github.com/github/linguist/blob/14a7cb2d1b3d6f822701bccf36666303509c7621/lib/linguist.rb#L60-L66
Without testing this, I suspect adding:
interpreters:
- node
... to the TypeScript section at https://github.com/github/linguist/blob/14a7cb2d1b3d6f822701bccf36666303509c7621/lib/linguist/languages.yml#L4770-L4782
... should do the trick.
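To make the reasoning above concrete, here is a minimal TypeScript sketch of chained strategies, where the shebang step narrows the candidates and the extension step settles the result. It is a simplified model under stated assumptions, not Linguist's actual Ruby code: `FileInfo`, `LANGUAGES`, `interpreterOf`, `shebangStrategy`, `extensionStrategy`, and `detect` are all hypothetical names, and the two-entry language table stands in for languages.yml.

```typescript
// Simplified model of chained detection strategies. All names and data here
// are illustrative assumptions, not Linguist's real (Ruby) implementation.
interface Language {
  name: string;
  extensions: string[];
  interpreters: string[];
}

interface FileInfo {
  name: string;      // file name, e.g. "index.ts"
  firstLine: string; // first line of the file, where a shebang would be
}

type Strategy = (file: FileInfo, candidates: Language[]) => Language[];

const LANGUAGES: Language[] = [
  { name: "JavaScript", extensions: [".js"], interpreters: ["node"] },
  // The suggested change: TypeScript also declares "node" as an interpreter.
  { name: "TypeScript", extensions: [".ts"], interpreters: ["node"] },
];

// Extract the interpreter name from a shebang line, looking through "env".
function interpreterOf(firstLine: string): string | null {
  const m = /^#!\s*(\S+)(?:\s+(\S+))?/.exec(firstLine);
  if (!m) return null;
  const base = m[1].split("/").pop() || "";
  if (base === "env") return m[2] || null;
  return base || null;
}

// Keep only the candidates that declare the file's interpreter.
const shebangStrategy: Strategy = (file, candidates) => {
  const interpreter = interpreterOf(file.firstLine);
  if (interpreter === null) return [];
  return candidates.filter((l) => l.interpreters.indexOf(interpreter) !== -1);
};

// Keep only the candidates that declare the file's extension.
const extensionStrategy: Strategy = (file, candidates) =>
  candidates.filter((l) =>
    l.extensions.some((ext) => file.name.slice(-ext.length) === ext)
  );

function detect(file: FileInfo, languages: Language[] = LANGUAGES): Language[] {
  let candidates = languages;
  for (const strategy of [shebangStrategy, extensionStrategy]) {
    const result = strategy(file, candidates);
    if (result.length === 1) return result;     // unambiguous: stop here
    if (result.length > 1) candidates = result; // narrowed: pass it on
    // Empty result: this strategy had no opinion, keep the previous candidates.
  }
  return candidates;
}
```

In this model, a file named `index.ts` with a `#! /usr/bin/env node` first line resolves to TypeScript only when TypeScript lists `node` as an interpreter. If it does not, the shebang step alone returns JavaScript and the extension is never consulted. An extension-less file with the same shebang stays ambiguous between the two languages.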
@lildude Yep, that was my thought too. We may need to extend the heuristic strategy to handle files without an extension (but with a node interpreter) though. Otherwise, it'll fall back to the Classifier, and I'm afraid it won't do a very good job at distinguishing JavaScript from TypeScript.
@grant Do you have an example of a TypeScript file with a node shebang and no file extension? I'd like to add a test for that case, but I'm having a hard time finding such a sample.
My case was a .ts file with a shebang that was detected as JavaScript.
I don't have an example of a TypeScript file with the shebang and no file extension. Like this?
/test
#! /usr/bin/env node
console.log('hi');
I ended up adding a .ts TypeScript file as a fixture after removing its extension.
@grant One other thing: any ideas of keywords/constructs we could use to distinguish TypeScript files from JavaScript files? (E.g., keywords/constructs that are invalid in TypeScript/JavaScript.)
E.g., keywords/constructs that are invalid in TypeScript/JavaScript.
This is the part which concerns me. Even if it's invalid JavaScript today, it might not be tomorrow. Several constructs have been added to the ECMAScript specification over the last 3 years, many of which would have been considered invalid syntax in 2015 and earlier.
The language moves fast. I really don't think the risk of clashes with future JS revisions justifies the ability to classify TypeScript executables which lack a modeline or file extension. I reiterate, again, that this is a problem which is human in nature:
I don't want to add a .gitattributes to the repo or remove the shebang line.
I have suggested a modeline be used as an alternative to using a .gitattributes file or removing the interpreter directive. I can't see any potential problems that would be caused by adding -*- TypeScript -*- to the second line, or something similar.
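As a rough illustration of the modeline suggestion, the sketch below scans the top of a file for an Emacs-style `-*- ... -*-` marker and returns the language it names. It is a deliberately simplified matcher, not the parser Linguist uses. The function name `modelineLanguage`, the five-line search window, and the regular expression are all assumptions made for this example.

```typescript
// Simplified Emacs-style modeline matcher (hypothetical helper).
// Accepts "-*- TypeScript -*-" and "-*- mode: typescript; ... -*-".
// Real modeline parsers accept more forms than this sketch does.
function modelineLanguage(source: string): string | null {
  // Assumption: only look at the first five lines of the file.
  const head = source.split("\n").slice(0, 5);
  for (const line of head) {
    const m = /-\*-\s*(?:mode:\s*)?([^\s:;]+)\s*(?:;.*)?-\*-/i.exec(line);
    if (m) return m[1].toLowerCase();
  }
  return null;
}

// An executable script that keeps its shebang and declares its language
// on the second line, as proposed in the comment above.
const exampleScript = [
  "#! /usr/bin/env node",
  "// -*- TypeScript -*-",
  "console.log('hi');",
].join("\n");

const declaredLanguage = modelineLanguage(exampleScript);
```

Here the modeline sits inside an ordinary `//` comment, so the script runs unchanged while still stating its language explicitly.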
I have suggested a modeline be used as an alternative to using a .gitattributes file or removing the interpreter directive. I can't see any potential problems that would be caused by adding -*- TypeScript -*- to the second line, or something similar.
I can understand that someone may not want to change their committed files to accommodate a GUI they're using (i.e., GitHub). In addition, it may not always be possible; try asking the Linux maintainers to add a .gitattributes file to the root directory because "it doesn't display the right language on my GitHub fork".
In any case, as I said in #3067, Linguist detection is best effort; we'll never reach 100% accuracy. I'm not trying to change that. If Linguist fails to classify a file and the user has to use overrides, so be it.
Now, regarding the heuristics, if TypeScript and JavaScript are that difficult to distinguish, we could simply default to JavaScript if the file doesn't have an extension. If it does, the Extension strategy can handle it. Given that I'm having a hard time finding a single extension-less TypeScript file with a shebang, I'd be inclined to think that it's a viable approach. What do you think?
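The fallback proposed above could be sketched roughly as follows. This is an illustration of the idea only, not Linguist's heuristics code. The function `resolveExtensionless` and its string-based candidate list are hypothetical, and it assumes an earlier step has already narrowed the candidates using the `node` shebang.

```typescript
// Hypothetical tie-breaker for files already matched by a node shebang.
// If the file has no extension and both JavaScript and TypeScript remain,
// default to JavaScript; otherwise leave the candidates for later steps
// (for example, an extension-based strategy).
function resolveExtensionless(baseName: string, candidates: string[]): string[] {
  const dot = baseName.lastIndexOf(".");
  // A leading dot (".bashrc") or a trailing dot ("file.") is not an extension.
  const hasExtension = dot > 0 && dot < baseName.length - 1;
  const ambiguous =
    candidates.indexOf("JavaScript") !== -1 &&
    candidates.indexOf("TypeScript") !== -1;
  if (!hasExtension && ambiguous) return ["JavaScript"];
  return candidates;
}
```

With this rule, `resolveExtensionless("test", ["JavaScript", "TypeScript"])` returns `["JavaScript"]`, while `resolveExtensionless("index.ts", ["JavaScript", "TypeScript"])` returns both candidates unchanged so that the extension can decide.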