Please confirm you have...
Perl 5 and Perl 6 are different languages (despite similar names; like C vs C++). However, the heuristic fails in some cases. This was previously opened in https://github.com/github/linguist/issues/3906 and the cases covered in that issue were resolved.
https://github.com/zoffixznet/Music-ScaleNote
As far as I can see, the heuristic is failing because the heuristic looks for use strict, but the module uses use strictures instead.
No idea
Perl 5 (aka "Perl")
Perl 6
Since the previous times we had ambiguity issues, the Perl 6 language has now decided more strongly that its extensions are .pm6 for modules, .p6 for files and .t for scripts.
Thus, I'd argue that the heuristic for .pm and .pl files should prefer Perl 5 much more strongly. Likely just checking for Perl 6 first, with use\s+v6\b only, not the current regex, and marking code as Perl 6 if detected, otherwise, going to Perl 5. (personally, I'd say it shouldn't be disambiguated with Perl 6 at all and just assume those extensions are always Perl 5 or other lang).
That leaves .t disambiguation. I'm less certain of this one, but just thinking that it can be improved if Perl 6 is tested first and checked against use\s+v6\b, along with some testing subs that have a dash in their names done-testing|is-deeply|isa-ok|throws-like. If that doesn't match, then assume the code's Perl 5 or other lang.
If someone submits a PR, is there a way to somehow preview that the changes give right results? Is installing Linguist locally sufficient?
As far as I can see, the heuristic is failing because the heuristic looks for
use strict, but the module usesuse stricturesinstead.
Could we add use strictures to the Perl 5 heuristic rule then?
We should also be able to use the shebang in the .t file. I'll look into it.
If someone submits a PR, is there a way to somehow preview that the changes give right results? Is installing Linguist locally sufficient?
Yes, it's sufficient to test detection.
Could we add use strictures to the Perl 5 heuristic rule then?
This is a losing battle - here's the most comprehensive list I know of for modules that import strict and warnings for you, and this isn't including opinionated ones that only use part of strict: https://metacpan.org/source/PETDANCE/Perl-Critic-1.130/lib/Perl/Critic/Utils/Constants.pm#L84-126
We should also be able to use the shebang in the .t file. I'll look into it.
There is no requirement for perl test files to contain a shebang, since they are normally executed via a makefile or prove. Note #!perl as in that file isn't even a valid shebang, it's just a convention.
This is a losing battle - here's the most comprehensive list I know of for modules that import strict and warnings for you, and this isn't including opinionated ones that only use part of strict: https://metacpan.org/source/PETDANCE/Perl-Critic-1.130/lib/Perl/Critic/Utils/Constants.pm#L84-126
Heuristic rules were never meant to catch 100% cases (I guess it's in the name...). I'm only asking here if adding use strictures would help us catch a greater number of Perl 5 files, or is it only used in rare cases? (i.e., is it worth it?)
There is no requirement for perl test files to contain a shebang, since they are normally executed via a makefile or prove.
The Shebang strategy is also a heuristic, so same as above.
Note #!perl as in that file isn't even a valid shebang, it's just a convention.
Sure, but it's a convention we should recognize.
I'm only asking here if adding use strictures would help us catch a greater number of Perl 5 files
It would, but for .pm/.pl files, it's overwhelmingly likely for them to be Perl 5 and not Perl 6 code, which is why I suggested trying to look for Perl 6 code first, and if Perl 6 things aren't present to assume Perl 5. This way, even if the Perl 5 code using one of those dozens modules that provide strict, it'd still be detected correctly. And if Perl 6 code is mis-detected as Perl 5, the author always has the option to simply switch to the more common .pm6/.p6 extensions.
As far as I can see (with GitHub.com's search engine), we still have quite a few .pl and .pm Perl 6 files. Of course, these search use the use v6 statement, so we can use our current heuristic rule to detect those files. But the point is that many still use .pl and .pm, and I doubt all of them have use v6 at the top. In any case, we try to avoid these kinds of fallback else in heuristic rules.
For now, I think we can address this by both improving the heuristic rule for Perl 5 and fixing the issue with the shebang. If you have other cases of misclassifications I can look into them, but it's unlikely will be able to detect them all accurately (that's what Linguist overrides are for). If/When the number of .pl and .pm Perl 6 files gets low enough, we'll change the heuristic rule or even remove those extensions for Perl 6.
I would still ask why it should mark a file as Perl 6 when it can't disambiguate between them. Perl 5 usage is far more widespread and as noted will be the only use of these extensions going forward, so it's a more sensible default.
I would still ask why it should mark a file as Perl 6 when it can't disambiguate between them.
Because the Heuristic strategy is not the last strategy. It's actually the naive Bayesian classifier that runs last and misclassifies your files here.
Most helpful comment
4099 was just merged, so this issue should be fixed with the next release of Linguist. There's usually a new release every couple weeks.