Linguist: TypeScript file (.ts) misidentified as JavaScript

Created on 23 Apr 2018  ·  7 Comments  ·  Source: github/linguist

*.ts files should be reported as TypeScript, not JavaScript.

Is it possible to support TypeScript/.ts files with #! /usr/bin/env node shebangs? These files should be reported as TypeScript, not JavaScript, right?

Problem Description

I have a project that is being identified as JavaScript even though it consists mainly of a TypeScript file.

I don't want to add a .gitattributes to the repo or remove the shebang line.

URL of the affected repository:

https://github.com/google/clasp

Last modified on:

2018-04-23

Expected language:

TypeScript

Detected language:

JavaScript


CC: @arfon @pchaigno @larsbrinkhoff @Alhadis
Previous Issue: https://github.com/github/linguist/issues/3067

All 7 comments

This shouldn't be too hard to fix. Thanks to https://github.com/github/linguist/pull/4099, the shebang strategy should pass on the results to the subsequent strategies, and the extension strategy just so happens to be the next one in the list: https://github.com/github/linguist/blob/14a7cb2d1b3d6f822701bccf36666303509c7621/lib/linguist.rb#L60-L66
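The hand-off between strategies described here can be sketched roughly as follows. This is a simplified model in TypeScript, not Linguist's Ruby code: the names `SourceFile`, `Strategy`, `detect`, `byShebang` and `byExtension`, and the two-entry language table, are assumptions made for illustration. The table also assumes that TypeScript lists `node` as an interpreter.

```typescript
// Simplified model of a detection strategy chain. All names are hypothetical.
type Language = { name: string; extensions: string[]; interpreters: string[] };
type SourceFile = { name: string; interpreter: string | null };
type Strategy = (file: SourceFile, candidates: Language[]) => Language[];

const LANGUAGES: Language[] = [
  { name: "JavaScript", extensions: [".js"], interpreters: ["node"] },
  // Assumption: TypeScript also declares the node interpreter.
  { name: "TypeScript", extensions: [".ts"], interpreters: ["node"] },
];

// Keep the candidates whose interpreter list contains the file's interpreter.
const byShebang: Strategy = (file, candidates) => {
  const interpreter = file.interpreter;
  if (interpreter === null) return [];
  return candidates.filter((lang) => lang.interpreters.indexOf(interpreter) !== -1);
};

// Keep the candidates that own the file's extension.
const byExtension: Strategy = (file, candidates) =>
  candidates.filter((lang) =>
    lang.extensions.some((ext) => file.name.slice(-ext.length) === ext)
  );

function detect(file: SourceFile, strategies: Strategy[]): Language | null {
  let candidates = LANGUAGES;
  for (const strategy of strategies) {
    const result = strategy(file, candidates);
    // Exactly one match: detection is finished.
    if (result.length === 1) return result[0] ?? null;
    // Several matches: narrow the pool for the next strategy.
    // No match: leave the pool untouched and move on.
    if (result.length > 1) candidates = result;
  }
  return null;
}
```

In this model, `detect({ name: "index.ts", interpreter: "node" }, [byShebang, byExtension])` resolves to the TypeScript entry, because the shebang step leaves two candidates and the extension step picks between them. A file with a node shebang and no extension comes back as `null`, since neither step can settle it.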

Without testing this, I suspect adding:

interpreters:
- node

... to the TypeScript section at https://github.com/github/linguist/blob/14a7cb2d1b3d6f822701bccf36666303509c7621/lib/linguist/languages.yml#L4770-L4782

... should do the trick.

@lildude Yep, that was my thought too. We may need to extend the heuristic strategy to handle files without an extension (but with a node interpreter) though. Otherwise, it'll fall back to the Classifier, and I'm afraid it won't do a very good job at distinguishing JavaScript from TypeScript.

@grant Do you have an example of a TypeScript file with a node shebang and no file extension? I'd like to add a test for that case, but I'm having a hard time finding such a sample.

My case was a ts file with a shebang that was interpreted as js.

I don't have an example of ts w/o the shebang and no file extension. Like this?

/test

#! /usr/bin/env node
console.log('hi');
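For a file like this one, the interpreter name has to be recovered from the first line before any language can be matched against it. A minimal sketch in TypeScript, assuming a hypothetical helper named `interpreterFromShebang`; Linguist's own parser is written in Ruby and may handle cases this one does not.

```typescript
// Hypothetical helper: extract the interpreter name from a script's first line.
// Covers "#!/usr/bin/node", "#! /usr/bin/env node" and "#!/usr/bin/env node --flag".
function interpreterFromShebang(source: string): string | null {
  const firstLine = source.split("\n", 1)[0] ?? "";
  if (firstLine.slice(0, 2) !== "#!") return null;

  // Tokens after "#!", e.g. ["/usr/bin/env", "node"].
  const tokens = firstLine
    .slice(2)
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0);

  // Strip directories: "/usr/bin/env" -> "env".
  const basename = (path: string): string => path.slice(path.lastIndexOf("/") + 1);

  const command = basename(tokens[0] ?? "");
  if (command.length === 0) return null;
  if (command !== "env") return command;

  // With env, take the first later token that is neither an option
  // nor a VAR=value assignment.
  const rest = tokens
    .slice(1)
    .filter((token) => token.charAt(0) !== "-" && token.indexOf("=") === -1);
  const target = basename(rest[0] ?? "");
  return target.length > 0 ? target : null;
}
```

Called on the two-line sample above, the helper yields `"node"`. That result says nothing about whether the body is JavaScript or TypeScript, which is why a later step is still needed.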

I ended up adding a .ts TypeScript file as a fixture after removing its extension.

@grant One other thing: any ideas of keywords/constructs we could use to distinguish TypeScript files from JavaScript files? (E.g., keywords/constructs that are invalid in TypeScript/JavaScript.)
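To make the question concrete, here is a rough sketch in TypeScript of what a content-based check could look like. The patterns are illustrative guesses at constructs that are TypeScript syntax but not valid ECMAScript at the time of writing. They are assumptions for this sketch, not Linguist's heuristics, and they can misfire on text inside strings or comments.

```typescript
// Illustrative patterns only; none of these come from Linguist.
const TYPESCRIPT_HINTS: RegExp[] = [
  // interface Options { ... }
  /^\s*(?:export\s+)?interface\s+\w+/m,
  // type Id = string
  /^\s*(?:export\s+)?type\s+\w+\s*=/m,
  // const port: number = 8080
  /\b(?:const|let|var)\s+\w+\s*:\s*\w+/,
];

// Returns true if any of the hint patterns occurs in the source text.
function looksLikeTypeScript(source: string): boolean {
  return TYPESCRIPT_HINTS.some((pattern) => pattern.test(source));
}
```

A plain script such as `console.log('hi');` matches none of the patterns, so a check like this can only ever confirm TypeScript; a miss does not prove the file is JavaScript.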

E.g., keywords/constructs that are invalid in TypeScript/JavaScript.

This is the part which concerns me. Even if it's invalid JavaScript today, it might not be tomorrow. Several constructs have been added to the ECMAScript specification over the last 3 years, many of which would have been considered invalid syntax in 2015 and earlier.

The language moves fast. I really don't think the risk of clashes with future JS revisions justifies the ability to classify TypeScript executables which lack a modeline or file extension. I reiterate, again, that this is a problem which is human in nature:

I don't want to add a .gitattributes to the repo or remove the shebang line.

I have suggested a modeline be used as an alternative to using a .gitattributes file or removing the interpreter directive. I can't see any potential problems that would be caused by adding -*- TypeScript -*- to the second line, or something similar.
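As a rough illustration of how such a modeline could be read, here is a sketch in TypeScript. The helper name `languageFromModeline` and the five-line window are assumptions; it only covers the Emacs-style `-*- ... -*-` form and is not Linguist's implementation.

```typescript
// Hypothetical helper: read an Emacs-style modeline such as
//   // -*- TypeScript -*-     or     // -*- mode: typescript; coding: utf-8 -*-
// from the first few lines of a file. Vim modelines are not handled.
function languageFromModeline(source: string, maxLines: number = 5): string | null {
  const head = source.split("\n").slice(0, maxLines);
  for (const line of head) {
    const match = /-\*-\s*(.*?)\s*-\*-/.exec(line);
    if (match === null) continue;
    const body = match[1] ?? "";

    // Long form: look for a "mode: <name>" variable.
    const mode = /(?:^|;)\s*mode\s*:\s*([^;\s]+)/i.exec(body);
    if (mode !== null) return mode[1] ?? null;

    // Short form: the body is just the language name.
    if (body.length > 0 && body.indexOf(":") === -1) return body;
  }
  return null;
}
```

With the shebang on line one and `// -*- TypeScript -*-` on line two, the helper returns `"TypeScript"`. Matching that string against known language names, case-insensitively or otherwise, is left to the caller.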

I have suggested a modeline be used as an alternative to using a .gitattributes file or removing the interpreter directive. I can't see any potential problems that would be caused by adding -*- TypeScript -*- to the second line, or something similar.

I can understand that someone may not want to change their committed files to accommodate a GUI they're using (i.e., GitHub). In addition, it may not always be possible; try asking the Linux maintainers to add a .gitattributes file to the root directory because "it doesn't display the right language on my GitHub fork".
In any case, as I said in #3067, Linguist detection is best effort; we'll never reach 100% accuracy. I'm not trying to change that. If Linguist fails to classify a file and the user has to use overrides, so be it.

Now, regarding the heuristics, if TypeScript and JavaScript are that difficult to distinguish, we could simply default to JavaScript if the file doesn't have an extension. If it does, the Extension strategy can handle it. Given that I'm having a hard time finding a single extension-less TypeScript file with a shebang, I'd be inclined to think that it's a viable approach. What do you think?
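The proposed fallback could be sketched like this in TypeScript. The function name `classifyNodeScript` and the two-entry extension table are assumptions for illustration, and the sketch only applies to files whose shebang already names node.

```typescript
// Sketch of the proposed fallback for scripts with a node shebang.
// Assumption: only these two extensions matter for this illustration.
const NODE_EXTENSIONS: { [ext: string]: string } = {
  ".js": "JavaScript",
  ".ts": "TypeScript",
};

function classifyNodeScript(filename: string): string | null {
  const base = filename.slice(filename.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");

  // No extension (or a leading-dot name): default to JavaScript.
  if (dot <= 0) return "JavaScript";

  // With an extension, defer to the extension lookup. Unknown extensions
  // return null so that a later step can decide.
  const ext = base.slice(dot).toLowerCase();
  return NODE_EXTENSIONS[ext] ?? null;
}
```

Under this rule an extensionless script is always reported as JavaScript, so an extensionless TypeScript executable would still need a modeline or an override to be reported correctly.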
