Rubocop: Style/AsciiComments Cop conflicts with accented author and company names.

Created on 3 Apr 2018 · 10 Comments · Source: rubocop-hq/rubocop

Expected behavior

RuboCop makes an exception for accented author and company names when it parses comments, since such names may include non-ASCII characters. For example, my name is Samuel Tallet-Sabathé.

I suggest Style/AsciiComments Cop ignore comment lines beginning with @author or Copyright.

Actual behavior

RuboCop detects an offense when it encounters, in comments, accented author or company names.

Steps to reproduce the problem

Scan with RuboCop a file containing this comment:

# Copyright: (c) 2003 The Pokémon Company
# @see https://wikipedia.org/wiki/Pok%C3%A9mon_Ruby_and_Sapphire

RuboCop version

λ rubocop -V
0.54.0 (using Parser 2.5.0.5, running on ruby 2.4.3 x64-mingw32)

enhancement stale

Source

SamuelTS

Most helpful comment

Depends on the perspective - for me it's preferable to transliterate everything to English, so more people would be able to read it without having to pause and think. My name is also not Bozhidar, but I'm fine with going by it when I'm writing code. :-) I've worked in many international companies, and where we had a consensus to use only English in documentation there were never practical implications or problems with checks like this one. I've noticed, however, that for speakers of languages that have accented characters, that's not always the case - they seem quite attached to those. :-)

As comments are unstructured it's very hard to come up with a good way to figure out what should be plain English and what could contain some other characters. To solve this ticket we can just whitelist certain words via configuration, which seems like a reasonable approach.

bbatsov on 15 Apr 2018

👍2

All 10 comments

TBH, I don't believe the cop is very useful in any case.

Its _idea_ was that comments should be in English (it references the corresponding rule from the style guide), but the implementation, which limits comments to ASCII, doesn't actually prohibit a lot of languages from all over the world. Instead it prohibits diacritics (including English words like "café" or "naïve"), math symbols, typography symbols (many people nowadays use keyboard add-ons to write with “proper” typography, even in comments), examples (like: this transliterator converts "текст" to "tekst") and so on.

zverok on 13 Apr 2018

👍2

I fully agree with @zverok - checking for ASCII to enforce English seems too prone to false positives.

thomthom on 14 Apr 2018

👍2

> TBH, I don't believe the cop is very useful in any case.

It's useful to me, if it's not useful to you simply disable it. Things are configurable for a reason.

bbatsov on 15 Apr 2018

👍2

Well, what I am trying to say:

"Write comments in English" is a _useful_ rule;
"All comments in ASCII" is a _loose_ way to check the rule (typography, math, diacritics, examples);
In fact, I believe that RuboCop _and its default config_ bear some responsibility for the preferred community style (many novices, or people just introducing RuboCop, typically have a "don't turn it off or reconfigure it unless you really have to" policy);
So, there is a danger of a "cargo-culted" community rule: "no non-ASCII chars in comments in any case! dunno why, just was taught by RuboCop".

So, maybe, just _maybe_, the cop can be loosened up a bit? I believe that something like "50+% of comment chars are from the Latin alphabet" will have all the same positive effects (pointing out forgotten, or intentionally written, comments entirely in Japanese, Russian, Hebrew and so on), but fewer negative ones?
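To make the "50+% Latin" idea concrete, here is a minimal sketch in plain Ruby. It is not RuboCop code: `mostly_latin?` and its `threshold` parameter are hypothetical names, and it counts only letters (not all characters), which is an assumption about how the ratio would be measured.

```ruby
# Hypothetical helper, not part of RuboCop: returns true when at least
# `threshold` of the letters in a comment belong to the Latin script.
def mostly_latin?(comment, threshold: 0.5)
  letters = comment.scan(/\p{L}/)  # every Unicode letter in the comment
  return true if letters.empty?    # nothing to judge, e.g. "# ---"

  latin = letters.count { |ch| ch.match?(/\p{Latin}/) }
  latin.fdiv(letters.size) >= threshold
end

mostly_latin?('# Copyright: (c) 2003 The Pokémon Company')         # => true
mostly_latin?('# this transliterator converts "текст" to "tekst"') # => true
mostly_latin?('# これはコメントです')                                # => false
```

Accented letters such as "é" count as Latin here, so names and the occasional foreign-script example pass, while a comment written entirely in another script is still flagged.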

zverok on 15 Apr 2018

👎1 👍1

> I've noticed, however, that for speakers of languages that have accented characters, that's not always the case - they seem quite attached to those. :-)

Global communication should respect cultural diversity.

> I suggest Style/AsciiComments Cop ignore comment lines beginning with @author or Copyright.

What do you think about this? It's a structured approach.
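As a rough illustration of that structured approach (plain Ruby, not the cop's actual implementation; `EXEMPT_COMMENT` and `non_ascii_offenses` are hypothetical names), a check could skip comment lines that start with `@author` or `Copyright` and report the rest:

```ruby
# Hypothetical pattern: comment lines starting with "@author" or "Copyright".
EXEMPT_COMMENT = /\A#\s*(?:@author|Copyright)\b/

# Returns the comment lines that contain non-ASCII characters
# and are not covered by the exemption above.
def non_ascii_offenses(comment_lines)
  comment_lines.reject do |line|
    line.ascii_only? || line.match?(EXEMPT_COMMENT)
  end
end

lines = [
  '# Copyright: (c) 2003 The Pokémon Company',
  '# @author Samuel Tallet-Sabathé',
  '# A naïve implementation'
]
non_ascii_offenses(lines) # => ["# A naïve implementation"]
```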

SamuelTS on 15 Apr 2018

There's also an existing AllowedChars configuration option, where common English diacritics can be added. Maybe even in the default config. (It might warrant a rename of the cop, though.)
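Conceptually, an allow-list like that works as sketched below. This is a plain-Ruby illustration under my own assumptions, not the cop's source: `ALLOWED_CHARS` stands in for the configured list, and `disallowed_chars` is a hypothetical helper.

```ruby
# Hypothetical allow-list, playing the role of the AllowedChars setting.
ALLOWED_CHARS = %w[é ï ©].freeze

# Returns the distinct non-ASCII characters in a comment
# that are not on the allow-list.
def disallowed_chars(comment, allowed = ALLOWED_CHARS)
  comment.scan(/[^[:ascii:]]/).uniq - allowed
end

disallowed_chars('# Copyright: (c) 2003 The Pokémon Company') # => []
disallowed_chars('# naïve café → done')                        # => ["→"]
```

The trade-off is that every character has to be listed explicitly, which is why shipping common English diacritics in the default config was suggested.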

Drenmi on 15 Apr 2018

I'm fine with renaming this; the name seems to have been poorly chosen anyway. I'm also fine with extending the base config (or at least promoting it better). Maybe there could also be an option to just allow the accented superset of ASCII characters?
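One possible reading of "the accented superset of ASCII" is "ASCII plus any letter of the Latin script". A minimal sketch of that reading follows; `ascii_or_accented_latin?` is a hypothetical name, and this interpretation is an assumption rather than anything decided in this thread:

```ruby
# Hypothetical check: a comment passes when every character is either
# ASCII or a letter of the Latin script (é, ñ, ø, ...).
def ascii_or_accented_latin?(comment)
  comment.match?(/\A(?:[[:ascii:]]|\p{Latin})*\z/)
end

ascii_or_accented_latin?('# Copyright: (c) 2003 The Pokémon Company') # => true
ascii_or_accented_latin?('# converts "текст" to "tekst"')             # => false
```

Unlike an explicit allow-list, this needs no per-character configuration, but it still rejects typographic quotes and math symbols.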

bbatsov on 16 Apr 2018

👍1

👍 Will see what I can do about the cop.

zverok on 16 Apr 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution and understanding!