RuboCop should make an exception for accented author and company names when it parses comments.
Such names may include non-ASCII characters. For example, my name is Samuel Tallet-Sabathé.
I suggest that the Style/AsciiComments cop ignore comment lines beginning with `@author` or `Copyright`.
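A minimal sketch of the proposed exemption, written as plain Ruby over comment strings rather than as a real RuboCop cop. The method `ascii_comment_offense?` and the `EXEMPT_PREFIXES` constant are hypothetical names for illustration; the actual Style/AsciiComments cop is structured differently.

```ruby
# Hypothetical sketch of the proposed exemption; not RuboCop's actual
# Style/AsciiComments implementation.
EXEMPT_PREFIXES = ['@author', 'Copyright'].freeze

# Returns true when a comment line should be reported: it contains
# non-ASCII characters and does not start with an exempt prefix.
def ascii_comment_offense?(comment_line)
  text = comment_line.sub(/\A#\s*/, '') # drop the leading "#" and spaces
  return false if EXEMPT_PREFIXES.any? { |prefix| text.start_with?(prefix) }

  !text.ascii_only?
end

puts ascii_comment_offense?('# @author Samuel Tallet-Sabathé')           # prints false
puts ascii_comment_offense?('# Copyright: (c) 2003 The Pokémon Company') # prints false
puts ascii_comment_offense?('# Un commentaire en français')              # prints true
```

Keying the exemption on a fixed prefix keeps the rule predictable: only structured metadata lines are skipped, and every other comment is still checked in full.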
Currently, RuboCop reports an offense when it encounters accented author or company names in comments.
To reproduce, run RuboCop on a file containing these comments:
```ruby
# Copyright: (c) 2003 The Pokémon Company
# @see https://wikipedia.org/wiki/Pok%C3%A9mon_Ruby_and_Sapphire
```
```
λ rubocop -V
0.54.0 (using Parser 2.5.0.5, running on ruby 2.4.3 x64-mingw32)
```
TBH, I don't believe the cop is very useful in any case.
Its _idea_ was that comments should be in English (it references the corresponding rule from the style guide), but implementing that as an ASCII-only restriction doesn't just prohibit the many non-English languages used all over the world; it also prohibits diacritics (including in English words like "café" or "naïve"), math symbols, typographic symbols (many people nowadays use keyboard add-ons to write with “proper” typography, even in comments), examples (like a comment saying this transliterator converts "текст" to "tekst"), and so on.
I fully agree with @zverok: checking for ASCII to enforce English seems too prone to false positives.
It depends on the perspective. For me it's preferable to transliterate everything to English, so more people can read it without having to pause and think. My name is also not Bozhidar, but I'm fine with going by it when I'm writing code. :-) I've worked in many international companies, and where we had a consensus to use only English in documentation, checks like this one never caused practical problems. I've noticed, however, that for speakers of languages that have accented characters, that's not always the case: they seem quite attached to those. :-)
As comments are unstructured, it's very hard to come up with a good way to figure out what should be plain English and what could contain other characters. To resolve this ticket, we can just whitelist certain words via configuration, which seems like a reasonable approach.
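A minimal sketch of how a word whitelist could work, assuming the whitelisted words are simply removed from the comment before the non-ASCII check. `ALLOWED_WORDS` and `disallowed_chars` are hypothetical names, not an existing RuboCop option or API.

```ruby
# Hypothetical sketch: words whitelisted via configuration are removed before
# the non-ASCII check. ALLOWED_WORDS is an illustrative assumption, not an
# existing RuboCop option.
ALLOWED_WORDS = ['Tallet-Sabathé', 'Pokémon'].freeze

# Returns the distinct non-ASCII characters left after removing allowed words.
def disallowed_chars(comment, allowed_words = ALLOWED_WORDS)
  remainder = allowed_words.reduce(comment) { |text, word| text.gsub(word, '') }
  remainder.chars.reject(&:ascii_only?).uniq
end

p disallowed_chars('# Copyright: (c) 2003 The Pokémon Company') # returns []
p disallowed_chars('# Ce café est fermé')                        # returns ["é"]
```

Removing whole words (rather than allowing individual characters) keeps "Pokémon" acceptable without also allowing "é" everywhere else in the codebase.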
> TBH, I don't believe the cop is very useful in any case.
It's useful to me. If it's not useful to you, simply disable it. Things are configurable for a reason.
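For reference, RuboCop can switch a cop off for a whole project in `.rubocop.yml` (`Enabled: false` under `Style/AsciiComments`), or for a region of a file with `rubocop:disable` / `rubocop:enable` directive comments. A sketch of the inline form, applied to the comment from the original report:

```ruby
# rubocop:disable Style/AsciiComments
# Copyright: (c) 2003 The Pokémon Company
# rubocop:enable Style/AsciiComments
```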
Well, what I am trying to say:
So, maybe, just _maybe_, the cop could be loosened up a bit? I believe something like "50+% of a comment's characters are from the Latin alphabet" would have the same positive effect (pointing out forgotten, or intentionally written, comments entirely in Japanese, Russian, Hebrew, and so on) but fewer negative ones?
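A minimal sketch of the "50+% Latin" heuristic, assuming only letters are counted (digits, punctuation, and symbols are ignored) and using Ruby's `\p{Latin}` script property. `mostly_latin?` and its `threshold` parameter are hypothetical, not part of RuboCop.

```ruby
# Hypothetical sketch of the "50+% Latin" idea; not an existing RuboCop check.
# Only letters are counted, so digits, punctuation and symbols are ignored.
def mostly_latin?(comment, threshold = 0.5)
  letters = comment.scan(/\p{L}/)
  return true if letters.empty? # nothing to judge, so don't report

  latin = letters.count { |char| char.match?(/\p{Latin}/) }
  latin.fdiv(letters.size) >= threshold
end

puts mostly_latin?('# Copyright: (c) 2003 The Pokémon Company') # prints true
puts mostly_latin?('# converts "текст" to "tekst"')              # prints true
puts mostly_latin?('# этот комментарий на русском')              # prints false
```

Because "é" belongs to the Latin script, accented names count toward the Latin share, while a comment written entirely in Cyrillic or Japanese still falls below the threshold.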
> I've noticed, however, that for speakers of languages that have accented characters, that's not always the case - they seem quite attached to those. :-)
Global communication should respect cultural diversity.
> I suggest Style/AsciiComments Cop ignore comment lines beginning with `@author` or `Copyright`.
What do you think about this? It's a structured approach.
There's also an existing `AllowedChars` configuration option, to which common English diacritics could be added, maybe even in the default config. (That might warrant renaming the cop, though.)
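A minimal sketch of what an `AllowedChars`-style filter does: any non-ASCII character that appears in the allowed list is not reported. The character list and the `offending_chars` method are illustrative assumptions, not RuboCop's default configuration or implementation.

```ruby
# Hypothetical sketch of an AllowedChars-style filter; the character list
# below is an illustrative assumption, not RuboCop's default configuration.
ALLOWED_CHARS = %w[é è ï ç ©].freeze

# Returns the characters that are neither ASCII nor explicitly allowed.
def offending_chars(comment, allowed_chars = ALLOWED_CHARS)
  comment.chars.reject { |char| char.ascii_only? || allowed_chars.include?(char) }
end

p offending_chars('# naïve café ©') # returns []
p offending_chars('# x ≤ y')        # returns ["≤"]
```

One caveat of a per-character list: an accent written in decomposed form ("e" followed by a combining accent) is a separate character and would not match a precomposed "é" in the list.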
I'm fine with renaming it; the name seems to have been poorly chosen anyway. I'm also fine with extending the base config (or at least promoting the option better). Maybe there could also be an option to simply allow the accented superset of ASCII characters?
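One possible way to define "the accented superset of ASCII", sketched under the assumption that a comment is acceptable if it becomes plain ASCII once it is Unicode-decomposed (NFD) and its combining marks are stripped. `ascii_with_accents?` is a hypothetical name; this is not an existing RuboCop option.

```ruby
# Hypothetical sketch: accept a comment if, after Unicode decomposition and
# removal of combining marks, it is plain ASCII. Not an existing RuboCop option.
def ascii_with_accents?(comment)
  comment.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').ascii_only?
end

puts ascii_with_accents?('# @author Samuel Tallet-Sabathé') # prints true
puts ascii_with_accents?('# naïve café')                    # prints true
puts ascii_with_accents?('# текст')                         # prints false
puts ascii_with_accents?('# x ≤ y')                         # prints false
```

This handles both precomposed and decomposed accents. Letters such as "ß", "æ", or "ø" have no canonical decomposition, so under this definition they would still be reported.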
:+1: I'll see what I can do about the cop.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution and understanding!
This issue has been automatically closed due to lack of activity. Feel free to re-open it if you ever come back to it.