Linguist: Many Perl 6 repos no longer recognized

Created on 8 Aug 2017 · 18 Comments · Source: github/linguist

Recently many of my Perl 6 repos have started showing Unknown as the language, although two of them continue to show Perl 6. I can't seem to figure out which recent change would have caused this, or what difference in my repos causes this discrepancy.

Unknown:
https://github.com/ugexe/Perl6-Grammar--HTTP

Perl 6:
https://github.com/ugexe/Perl6-Text--Table--Simple

ugexe

Most helpful comment

To be honest, I feel whitespace should be ignored altogether within language IDs when used in a fenced code block. Users might be tempted to collapse multiword language names into a single token:

~~~World of Warcraft Addon data => ~~~worldofwarcraftaddondata

Both notations should be understood. I've yet to encounter a language name that requires a separator to disambiguate it from a completely different, similarly-named language. Heck, you could probably skip hyphens too:

~~~worldofwarcraftaddondata => ~~~world-of-warcraft-addon-data

Just my two cents.

Alhadis on 10 Aug 2017

👍3
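
The normalization suggested in the comment above can be sketched as follows. This is only an illustration under stated assumptions: `KNOWN_LANGUAGES`, `fence_key`, and `language_for_fence` are hypothetical names, the language list is a two-entry stand-in, and none of this reflects how Linguist or GitHub's Markdown renderer actually resolves fence labels.

```ruby
# Hypothetical, tiny stand-in for the list of known language names.
KNOWN_LANGUAGES = ["Perl 6", "World of Warcraft Addon Data"].freeze

# Build a comparison key: lowercase the name and drop whitespace and hyphens.
def fence_key(name)
  name.downcase.gsub(/[\s-]+/, "")
end

# Index every known language by its normalized key.
FENCE_INDEX = KNOWN_LANGUAGES.map { |name| [fence_key(name), name] }.to_h.freeze

# Look up the label written after the opening fence; nil if nothing matches.
def language_for_fence(info_string)
  FENCE_INDEX[fence_key(info_string)]
end

p language_for_fence("perl6")
p language_for_fence("world-of-warcraft-addon-data")
```

With this approach `perl6`, `Perl 6`, and `perl-6` all reduce to the same key, so a rename that only adds a space would not break existing fenced code blocks. The trade-off is that two languages whose names differ only in spacing or hyphens would collide.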

All 18 comments

Ah, that would be a side-effect of renaming a language: see #3672. I recall the same thing happening with s/Groff/Roff/ in language breakdowns too. Presumably something to do with caching.

In either case, it's only temporary and should be remedied by forcing the cache to refresh (pushing changes or republishing the repository, etc).

Alhadis on 8 Aug 2017

Thanks, will pass this info along to others.

ugexe on 8 Aug 2017

@lildude Do you know if this is a known issue?

pchaigno on 8 Aug 2017

@pchaigno This is more of a known limitation than an issue. We don't perform a wholesale recalculation when changes are made to Linguist - it's just waaay too expensive to do.

@ugexe Looking at your "Unknown" repo from GitHub's perspective, I can see it's still recognised as "Perl6". Your "Perl 6" repo is recognised as "Perl 6" (with the space) so this is indeed going to be exactly as @Alhadis pointed out.

Pushing any change to the "Unknown" repo will kick off a job that will recalculate the language stats. Please keep in mind that the job is low priority so can take a while to run, especially during peak west-coast :us: business hours.

lildude on 9 Aug 2017

Hm, I see. I thought language names were resolved at rendering time, hence my question :-)

pchaigno on 9 Aug 2017

They are, but only by comparing what's already been determined and stored in the db when the analysis was performed with what Linguist currently says - it's literally if lang = Linguist::Language[lang] where lang is one of the top languages already determined during analysis. In this case the db says "Perl6" so Linguist::Language['Perl6'] == nil hence "Other".

lildude on 9 Aug 2017
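
The by-name lookup described above can be illustrated with a toy registry. This is a sketch, not GitHub's or Linguist's code: `LANGUAGES`, the ids in it, and `display_language` are all hypothetical, and a plain hash stands in for `Linguist::Language[...]`.

```ruby
# Toy stand-in for the language registry after the rename; not the real library.
LANGUAGES = {
  "Perl"   => { id: 1, name: "Perl" },
  "Perl 6" => { id: 2, name: "Perl 6" } # formerly registered as "Perl6"
}.freeze

# Resolve a name stored at analysis time against the current registry.
# An unknown name yields nil from the hash, so we fall back to "Other".
def display_language(cached_name, registry = LANGUAGES)
  if (lang = registry[cached_name])
    lang[:name]
  else
    "Other"
  end
end

puts display_language("Perl 6") # name cached after the rename
puts display_language("Perl6")  # stale name cached before the rename
```

The second call falls through to the fallback branch because the cached string no longer matches any key, which is the behaviour the affected repos were showing until their stats were recalculated.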

Would keeping a record of name-changes help? I believe Homebrew do something similar when they rename formulae, so searching for former names will still yield the correct formula.

Alhadis on 9 Aug 2017

> Would keeping a record of name-changes help?

I'd imagine so.

lildude on 9 Aug 2017
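
One way the record of name changes might look, purely as a sketch: `FORMER_NAMES`, `CURRENT_NAMES`, and `resolve_name` are hypothetical, and the two renames listed are simply the ones mentioned in this thread.

```ruby
# Hypothetical record of past renames: former name => current name.
FORMER_NAMES = {
  "Perl6" => "Perl 6",
  "Groff" => "Roff"
}.freeze

# Hypothetical list of names the registry currently knows.
CURRENT_NAMES = ["Perl", "Perl 6", "Roff"].freeze

# Return the current name for a cached one, following a recorded rename
# when the cached name is no longer current. Returns nil if unknown.
def resolve_name(cached_name)
  return cached_name if CURRENT_NAMES.include?(cached_name)

  FORMER_NAMES[cached_name]
end

p resolve_name("Perl6")
p resolve_name("Groff")
```

A table like this would let stale cache entries keep rendering correctly without a recalculation, at the cost of maintaining one more mapping on every rename.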

So the languages aren't indexed by language id? I would have expected if lang = Linguist::Language[lang_id]...

pchaigno on 9 Aug 2017

> So the languages aren't indexed by language id?

They are, but when things get to the view_model (via the cache) things are queried by name. I'm not sure of the justification for this as this precedes me - things haven't changed on this front for well over 5 years - and I've not taken the time to follow the code all the way through to try and work out the logic myself. I'm sure it could probably be improved though as Linguist has moved on a long way in five years.

As an aside, if lang = Linguist::Language[lang_id] wouldn't work as Linguist::Language[lang_id] currently expects a string.

lildude on 9 Aug 2017
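
For comparison, here is a sketch of what an id-keyed lookup could look like if the cache stored a stable id instead of the name. `ToyLanguage`, `BY_ID`, `name_for`, and the ids are illustrative assumptions, not Linguist's real data or API.

```ruby
# Minimal language record; a stand-in, not Linguist's own class.
ToyLanguage = Struct.new(:id, :name)

# Toy registry keyed by a stable numeric id (ids are made up).
BY_ID = {
  1 => ToyLanguage.new(1, "Perl"),
  2 => ToyLanguage.new(2, "Perl 6") # was named "Perl6" when stats were cached
}.freeze

# The cache holds the id, so the display name always comes from the
# current registry, whatever the language was called at analysis time.
def name_for(cached_id)
  lang = BY_ID[cached_id]
  lang ? lang.name : "Other"
end

puts name_for(2)
```

Because the id does not change when a language is renamed, this kind of lookup would be unaffected by a change such as "Perl6" to "Perl 6".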

Thanks for the explanation @lildude! ❤️

pchaigno on 9 Aug 2017

Does the same rename also affect fenced code blocks in Markdown files?

Previously I could define a perl6 fenced codeblock and it would highlight. This is no longer the case.

0racle on 10 Aug 2017

@0racle yup

lildude on 10 Aug 2017

We could probably add an alias for that case though.

pchaigno on 10 Aug 2017

I'm assuming this Issue is also the reason why codeblocks on Issue comments no longer highlight anything, even if you use ```perl6 to specify language?

say 'I ♥ Programming'; # and comments

zoffixznet on 19 Sep 2017

@zoffixznet Yup.

lildude on 19 Sep 2017

👍1

Perl 6 repos should now be correctly classified (may need a push to force re-analysis). Closing. Please feel free to reopen if recently updated Perl 6 repos are still incorrectly classified.

lildude on 21 Jan 2018

👍1
