Recently many of my Perl 6 repos have started showing Unknown as the language, although two of them continue to show Perl 6. I can't seem to figure out which recent change would have caused this, or what difference in my repos causes this discrepancy.
Ah, that would be a side-effect of renaming a language: see #3672. I recall the same thing happening with s/Groff/Roff/ in language breakdowns too. Presumably something to do with caching.
In either case, it's only temporary and should be remedied by forcing the cache to refresh (pushing changes or republishing the repository, etc).
Thanks, will pass this info along to others.
@lildude Do you know if this is a known issue?
@pchaigno This is more of a known limitation than an issue. We don't perform a wholesale recalculation when changes are made to Linguist - it's just waaay too expensive to do.
@ugexe Looking at your "Unknown" repo from GitHub's perspective, I can see it's still recognised as "Perl6". Your "Perl 6" repo is recognised as "Perl 6" (with the space) so this is indeed going to be exactly as @Alhadis pointed out.
Pushing any change to the "Unknown" repo will kick off a job that will recalculate the language stats. Please keep in mind that the job is low priority so can take a while to run, especially during peak west-coast :us: business hours.
Hm, I see. I thought language names were resolved at rendering time, hence my question :-)
They are, but only by comparing what's already been determined and stored in the db when the analysis was performed with what Linguist currently says - it's literally if lang = Linguist::Language[lang] where lang is one of the top languages already determined during analysis. In this case the db says "Perl6" so Linguist::Language['Perl6'] == nil hence "Other".
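For anyone following along, here is a minimal sketch of why a stale stored name falls through, as described above. The hash is a hypothetical stand-in for Linguist's name index after the rename, not the real `Linguist::Language` class, and `display_name` is an illustrative helper name.

```ruby
# Hypothetical stand-in for the name index after the rename;
# this is not the real Linguist::Language class.
CURRENT_LANGUAGES = {
  "Perl 6" => { name: "Perl 6" },
  "Roff"   => { name: "Roff" }
}.freeze

# Mirrors the `if lang = Linguist::Language[lang]` check:
# a nil lookup falls through to the fallback label.
def display_name(stored_name, fallback = "Other")
  if (lang = CURRENT_LANGUAGES[stored_name])
    lang[:name]
  else
    fallback
  end
end

puts display_name("Perl 6") # name stored after the rename
puts display_name("Perl6")  # stale name stored before the rename
```

The second call prints the fallback because the stored string no longer matches any key, which is the behaviour described for repos analysed before the rename.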
Would keeping a record of name-changes help? I believe Homebrew do something similar when they rename formulae, so searching for former names will still yield the correct formula.
> Would keeping a record of name-changes help?
I'd imagine so.
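A rough sketch of what a record of name changes could look like, assuming a simple table that maps former names to current ones. Both tables and the `resolve_name` helper are hypothetical and not part of Linguist; the two renames listed are the ones mentioned in this thread.

```ruby
# Hypothetical data: neither table is part of Linguist.
CURRENT_NAMES = ["Perl 6", "Roff"].freeze
FORMER_NAMES = {
  "Perl6" => "Perl 6",
  "Groff" => "Roff"
}.freeze

# Return the current name for a stored name, following a recorded
# rename when the stored name is no longer current.
def resolve_name(stored_name)
  return stored_name if CURRENT_NAMES.include?(stored_name)

  FORMER_NAMES[stored_name] # nil when the name was never known
end

puts resolve_name("Perl6")
puts resolve_name("Groff")
```

With something like this in front of the name lookup, a cached "Perl6" would still resolve without waiting for the repository to be re-analysed.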
So the languages aren't indexed by language id? I would have expected if lang = Linguist::Language[lang_id]...
> So the languages aren't indexed by language id?

They are, but when things get to the view_model (via the cache) things are queried by name. I'm not sure of the justification for this as this precedes me - things haven't changed on this front for well over 5 years - and I've not taken the time to follow the code all the way through to try and work out the logic myself. I'm sure it could probably be improved though as Linguist has moved on a long way in five years.
As an aside, if lang = Linguist::Language[lang_id] wouldn't work as Linguist::Language[lang_id] currently expects a string.
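To illustrate the id-versus-name point, here is a small sketch of a lookup that accepts either an integer id or a string name. The `Lang` struct, the ids 1 and 2, and `find_language` are all made up for this example; real Linguist ids and methods are not reproduced here.

```ruby
# Hypothetical record type and ids, for illustration only.
Lang = Struct.new(:id, :name)

LANGS = [Lang.new(1, "Perl 6"), Lang.new(2, "Roff")].freeze
BY_ID = LANGS.map { |l| [l.id, l] }.to_h.freeze
BY_NAME = LANGS.map { |l| [l.name, l] }.to_h.freeze

# Look up by integer id or by string name; anything else returns nil.
def find_language(key)
  case key
  when Integer then BY_ID[key]
  when String then BY_NAME[key]
  end
end

# A stored id keeps resolving after a rename, a stored old name does not.
puts find_language(1).name
puts find_language("Perl6").inspect
```

The design point is only that an id stored at analysis time is unaffected by a later rename, whereas a stored name is.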
Thanks for the explanation @lildude! ❤️
Does the same rename also affect fenced code blocks in Markdown files?
Previously I could define a perl6 fenced codeblock and it would highlight. This is no longer the case.
@0racle yup
We could probably add an alias for that case though.
To be honest, I feel whitespace should be ignored altogether within language IDs when used in a fenced code block. Users might be tempted to collapse multiword language names into a single token:
~~~World of Warcraft Addon data => ~~~worldofwarcraftaddondata
Both notations should be understood. I've yet to encounter a language name that requires a separator to disambiguate it from a completely different, similarly-named language. Heck, you could probably skip hyphens too:
~~~worldofwarcraftaddondata => ~~~world-of-warcraft-addon-data
Just my two cents.
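A minimal sketch of the normalisation suggested above, assuming fence identifiers are compared after lowercasing and stripping whitespace and hyphens. The name list and both helpers are hypothetical, and case-insensitivity is an extra assumption of this sketch rather than something stated in the thread.

```ruby
# Hypothetical list of language names used only for this illustration.
KNOWN_NAMES = ["Perl 6", "World of Warcraft Addon data"].freeze

# Lowercase and drop whitespace and hyphens so that every spelling
# of a name collapses to the same key.
def normalize_id(id)
  id.downcase.gsub(/[\s-]+/, "")
end

NORMALIZED_INDEX = KNOWN_NAMES.map { |n| [normalize_id(n), n] }.to_h.freeze

# Resolve a fenced code block identifier to a known language name, or nil.
def lookup_fence_language(info_string)
  NORMALIZED_INDEX[normalize_id(info_string)]
end

puts lookup_fence_language("perl6")
puts lookup_fence_language("world-of-warcraft-addon-data")
```

This would only be safe as long as no two language names collapse to the same key, which matches the observation above that no such pair has turned up so far.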
I'm assuming this Issue is also the reason why codeblocks on Issue comments no longer highlight anything, even if you use ```perl6 to specify language?
say 'I ♥ Programming'; # and comments
@zoffixznet Yup.
Perl 6 repos should now be correctly classified (may need a push to force re-analysis). Closing. Please feel free to reopen if recently updated Perl 6 repos are still incorrectly classified.