Linguist: TypeScript file misidentified as JavaScript

Created on 20 Jun 2016 · 11 Comments · Source: github/linguist

Hey there,

Just noticed that one of my repos is reported as containing JavaScript. I thought this was a bit curious, as I may have left something in by mistake, but it seems to be an issue with the detection.

The repo in question is https://github.com/jamesrichford/alsatian

and this search https://github.com/jamesrichford/alsatian/search?l=javascript

results in this file being returned https://github.com/jamesrichford/alsatian/blob/a981398cb2b624e282a70b0d62cbf5c09c82fb30/cli/alsatian-cli.ts

It looks like it may be to do with the line #! /usr/bin/env node, as this is the only thing different about this file.

Let me know if you need any more information.

Thanks!!! (PS ace work everyone involved, github has so many neat features :) Love it!)

All 11 comments

@jamesrichford This is resolved, right? I can see your repo showing 99% TypeScript. :smiley:

Hey @sahildua2305 - unfortunately not, it's the 0.9% that is the problem. If you click on the 0.9% JavaScript, you get a search result that returns a TypeScript file.

@jamesrichford - yes it's this line that's causing the issue here.

One of the strategies we use when trying to detect the language of a file is to see if we can detect a shebang: https://github.com/github/linguist/blob/master/lib/linguist.rb#L62
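
To make the shebang step concrete, here is a minimal sketch of how an interpreter name could be pulled out of a file's first line. It is not Linguist's code (Linguist is written in Ruby); the function name `interpreterFromShebang` and the handling of `env` are assumptions made for illustration.

```typescript
// Simplified sketch of shebang-based interpreter detection.
// NOT Linguist's actual implementation; names here are hypothetical.
function interpreterFromShebang(source: string): string | null {
  // Only the first line of the file can carry a shebang.
  const firstLine = source.split(/\r?\n/, 1)[0];

  // Capture the program path and, optionally, its first argument,
  // e.g. "#! /usr/bin/env node" -> "/usr/bin/env" and "node".
  const match = /^#!\s*(\S+)(?:\s+(\S+))?/.exec(firstLine);
  if (match === null) return null;

  // Reduce a path such as "/usr/bin/node" to "node".
  const basename = (path: string): string =>
    path.slice(path.lastIndexOf("/") + 1);

  const program = basename(match[1]);

  // With "env", the real interpreter is the next argument.
  if (program === "env") {
    return match[2] !== undefined ? basename(match[2]) : null;
  }
  return program;
}
```

For the file in this issue, the first line `#! /usr/bin/env node` would yield `node`, which, as this comment explains, is what leads to the JavaScript classification.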

There are a couple of possible fixes here:

  1. Set a modeline in your file (the modeline strategy runs before the shebang one) https://github.com/github/linguist#using-emacs-or-vim-modelines
  2. Set the language in a .gitattributes file https://github.com/github/linguist#using-gitattributes

@pchaigno @larsbrinkhoff it's this line that's causing the issue here. I'm wondering if we should remove this given that the popularity of TypeScript is only likely to increase.

@arfon Couldn't we add the node interpreter to TypeScript? The current code architecture allows for any strategy to return multiple results, in which case subsequent strategies will select the appropriate one. I tested it here with https://github.com/jamesrichford/alsatian and it solves the issue.
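
The narrowing behaviour described above can be sketched roughly as follows. This is a simplified model, not Linguist's implementation: the types, the two-entry language table, and the two strategies are all hypothetical, and only the shebang-before-extension order is taken from this thread.

```typescript
// Simplified model of candidate narrowing across detection strategies.
// All names and the tiny language table are hypothetical.
interface Lang { name: string; extensions: string[]; interpreters: string[]; }
interface FileInfo { extension: string; interpreter: string | null; }
type Strategy = (file: FileInfo, pool: Lang[]) => Lang[];

// Build the table with or without the proposed "node" interpreter on TypeScript.
function languageTable(nodeOnTypeScript: boolean): Lang[] {
  return [
    { name: "JavaScript", extensions: [".js"], interpreters: ["node"] },
    { name: "TypeScript", extensions: [".ts"],
      interpreters: nodeOnTypeScript ? ["node"] : [] },
  ];
}

const byShebang: Strategy = (file, pool) =>
  pool.filter(l => file.interpreter !== null &&
                   l.interpreters.includes(file.interpreter));

const byExtension: Strategy = (file, pool) =>
  pool.filter(l => l.extensions.includes(file.extension));

function detect(file: FileInfo, langs: Lang[], strategies: Strategy[]): string | null {
  let candidates: Lang[] = [];
  for (const strategy of strategies) {
    // Later strategies only choose among candidates left by earlier ones.
    const pool = candidates.length > 0 ? candidates : langs;
    const found = strategy(file, pool);
    if (found.length === 1) return found[0].name; // unambiguous: stop here
    if (found.length > 1) candidates = found;     // ambiguous: narrow further
    // No match: keep the previous candidates and try the next strategy.
  }
  return null; // still ambiguous or unknown
}

const cliFile: FileInfo = { extension: ".ts", interpreter: "node" };
const before = detect(cliFile, languageTable(false), [byShebang, byExtension]);
const after = detect(cliFile, languageTable(true), [byShebang, byExtension]);
```

In this model `before` is "JavaScript", because the shebang strategy finds a single match and stops, while `after` is "TypeScript", because the shebang strategy returns both languages and the extension strategy picks between them.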

@arfon Couldn't we add the node interpreter to TypeScript?

Yeah, that's probably the best thing to do here. I was thinking interpreters had to be unique to one language, but that restriction applies to language aliases.

@arfon @pchaigno @larsbrinkhoff
Is it possible to support TypeScript/.ts files with #! /usr/bin/env node shebangs? These files should be reported as TypeScript, not JavaScript, right?

I have a project that is being identified as JavaScript even though it mainly consists of a TypeScript file.

I don't want to add a .gitattributes to the repo or remove the shebang line.

I don't want to add a .gitattributes to the repo

This is the real problem.

@grant I had a branch on my fork to add the node interpreter to TypeScript for a while. It wasn't a very clean approach so I didn't send the pull request. Now that #4099 is merged, we might be able to revisit this. I'll try to find some time to work on this during the week.
We'll probably need to discuss heuristic rules (identifiable constructs and keywords) to distinguish JavaScript from TypeScript files. @grant Could you open a separate issue to track this? Do you have any suggestions?

Now, regarding Linguist overrides (.gitattributes et al.), there's some truth to @Alhadis' comment. For many files, identifying their programming language is a difficult task; it's unlikely that Linguist will be able to reach 100% accuracy anytime in the future. Overrides are a way to let the user decide when Linguist is unable to make the right decision, and you'll probably have to resort to them at some point.

@grant You should be able to override the classification using a modeline. As you'll see from Linguist's order of matching strategies, modelines are checked before hashbangs.

This might be an ideal alternative if you still don't want to add a .gitattributes file.
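
As a rough illustration of why a modeline helps, the sketch below looks for a Vim-style modeline near the top or bottom of a file. It is an assumption-laden model rather than Linguist's code: the regular expression, the function name `languageFromModeline`, and the five-line search window are all chosen for the example.

```typescript
// Hypothetical sketch of Vim-style modeline detection.
// Matches forms such as "vim: syntax=typescript" or "vim: set ft=typescript:".
const VIM_MODELINE = /\bvim:\s*(?:set\s+)?(?:syntax|filetype|ft)=([\w+#-]+)/i;

function languageFromModeline(source: string, searchWindow = 5): string | null {
  const lines = source.split(/\r?\n/);
  // Assumption: only the first and last few lines are searched.
  const searched = [...lines.slice(0, searchWindow), ...lines.slice(-searchWindow)];
  for (const line of searched) {
    const match = VIM_MODELINE.exec(line);
    if (match !== null) return match[1].toLowerCase();
  }
  return null;
}

// A file with both a hashbang and a modeline: if the modeline check runs
// first, it decides the language and the hashbang is never consulted.
const cliSource = [
  "#! /usr/bin/env node",
  "// vim: syntax=TypeScript",
  "console.log('hello');",
].join("\n");
const declared = languageFromModeline(cliSource);
```

Here `declared` is "typescript", so a detector that checks modelines before hashbangs would not reach the `node` interpreter at all.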

@pchaigno Thanks. Overrides are definitely useful. Created a new issue: #4112.

If a file is named *.ts, even if it has a #! /usr/bin/env node, it should be a TypeScript file. Not sure how the heuristics work but the PR change should result in this case being satisfied.

@Alhadis So if I set // vim: syntax=TypeScript somewhere in my ts file, Linguist will find it and interpret the file as TypeScript? Even if that is the case, I don't want to add that line to every ts file.

Even if that is the case, I don't want to add that line to every ts file.

You won't need to add that line to every TypeScript file, only to the ones that use hashbangs.

And if there are truly that many, enough that it becomes burdensome to add, then you either need to add a .gitattributes file, or tolerate the fact that the language is being misclassified as its own output.

Whether or not this gets addressed by a heuristic in Linguist is one thing, but knowing how to handle similar situations in future is another. We can't possibly cater for every potential mistake. :wink:
