Rubocop: Style/AsciiComments Cop conflicts with accented author and company names.

Created on 3 Apr 2018  ·  10 Comments  ·  Source: rubocop-hq/rubocop

Expected behavior

RuboCop makes an exception for accented author and company names when it parses comments,
since such names may include non-ASCII characters. For example, my name is Samuel Tallet-Sabathé.

I suggest Style/AsciiComments Cop ignore comment lines beginning with @author or Copyright.

Actual behavior

RuboCop reports an offense when it encounters accented author or company names in comments.

Steps to reproduce the problem

Run RuboCop on a file containing this comment:

# Copyright: (c) 2003 The Pokémon Company
# @see https://wikipedia.org/wiki/Pok%C3%A9mon_Ruby_and_Sapphire
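To see which characters trip the check, here is a minimal Ruby sketch. It assumes the cop boils down to an ASCII-only test on each comment line, which is a simplification; `non_ascii_chars` is a hypothetical helper and not part of RuboCop.

```ruby
# Hypothetical helper (not RuboCop's implementation): list the characters
# in a comment line that fall outside the ASCII range.
def non_ascii_chars(line)
  line.scan(/[^[:ascii:]]/)
end

comment_lines = [
  '# Copyright: (c) 2003 The Pokémon Company',
  '# @see https://wikipedia.org/wiki/Pok%C3%A9mon_Ruby_and_Sapphire'
]

comment_lines.each do |line|
  status = line.ascii_only? ? 'ok' : 'offense'
  puts "#{status}: #{line} #{non_ascii_chars(line).inspect}"
end
```

Only the first line contains a non-ASCII character; the percent-encoded URL on the second line is plain ASCII.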

RuboCop version

λ rubocop -V
0.54.0 (using Parser 2.5.0.5, running on ruby 2.4.3 x64-mingw32)
enhancement stale

All 10 comments

TBH, I don't believe the cop is very useful in any case.

Its _idea_ was that comments should be in English (it refers to the corresponding rule from the style guide), but an implementation that limits comments to ASCII doesn't rule out many of the world's languages. Instead it prohibits diacritics (including English words like "café" or "naïve"), math symbols, typographic symbols (many people nowadays use keyboard add-ons to write with “proper” typography, even in comments), examples (like "this transliterator converts "текст" to "tekst""), and so on.

I fully agree with @zverok - checking for ASCII to enforce English seems too prone to false positives.

Depends on the perspective - for me it's preferable to transliterate everything to English, so more people would be able to read it without having to pause and think. My name is also not Bozhidar, but I'm fine with going by it when I'm writing code. :-) I've worked in many international companies, and where we had a consensus to use only English in documentation there were never practical implications or problems with checks like this one. I've noticed, however, that for speakers of languages that have accented characters, that's not always the case - they seem quite attached to those. :-)

As comments are unstructured it's very hard to come up with a good way to figure out what should be plain English and what could contain some other characters. To solve this ticket we can just whitelist certain words via configuration, which seems like a reasonable approach.

> TBH, I don't believe the cop is very useful in any case.

It's useful to me; if it's not useful to you, simply disable it. Things are configurable for a reason.

Well, what I am trying to say:

  1. "Write comments in English" is a _useful_ rule;
  2. "All comments in ASCII" is a _loose_ way to check the rule (typography, math, diacritics, examples);
  3. In fact, I believe that Rubocop _and its default config_ have some responsibility for the preferred community style (many novices or people just introducing Rubocop typically have a "don't turn it off/reconfigure unless you really have to" policy)
  4. So, there is a danger of "cargo-culted" community rule "no non-ASCII chars in comments in any case! dunno why, just was taught by Rubocop"

So, maybe, just _maybe_, the cop can be loosened up a bit? I believe that something like "50+% of comment chars are from Latin alphabet" will have all the same positive effects (pointing out forgotten, or intentionally written, comments entirely in Japanese, Russian, Hebrew and so on), but fewer negative ones?
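A rough Ruby sketch of that "mostly Latin" heuristic follows. It is only an illustration: `mostly_latin?` and its `threshold` parameter are hypothetical names, and counting letters only (ignoring digits, spaces and punctuation) is an assumption about how the ratio might be computed.

```ruby
# Hypothetical heuristic sketched from the proposal above; not part of RuboCop.
# Returns true when at least `threshold` of the letters in the comment
# belong to the Latin script (accented letters such as "é" count as Latin).
def mostly_latin?(comment, threshold: 0.5)
  letters = comment.scan(/\p{L}/)
  return true if letters.empty? # nothing to judge, e.g. "# ----"

  latin = letters.count { |char| char.match?(/\p{Latin}/) }
  latin.fdiv(letters.size) >= threshold
end

puts mostly_latin?('# Copyright: (c) 2003 The Pokémon Company')
puts mostly_latin?('# this transliterator converts "текст" to "tekst"')
puts mostly_latin?('# Это комментарий на русском языке')
```

Under these assumptions the first two comments pass, because nearly all of their letters are Latin, while the comment written entirely in Russian is flagged.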

> I've noticed, however, that for speakers of languages that have accented characters, that's not always the case - they seem quite attached to those. :-)

Global communication should respect cultural diversity.

> I suggest Style/AsciiComments Cop ignore comment lines beginning with @author or Copyright.

What do you think about this? It's a structured approach.
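For illustration, the structured approach could look roughly like this in Ruby. This is a sketch under assumptions, not RuboCop code: `EXEMPT_PREFIXES` and `ascii_comment_offense?` are hypothetical names, and matching the prefixes case-sensitively right after the `#` is a guess at the intended behavior.

```ruby
# Hypothetical exemption list taken from the suggestion above;
# RuboCop has no such option in the version discussed here.
EXEMPT_PREFIXES = /\A#\s*(?:@author\b|Copyright\b)/

# Returns true when the comment line would count as an offense:
# it contains non-ASCII characters and does not start with an exempt prefix.
def ascii_comment_offense?(comment)
  return false if comment.match?(EXEMPT_PREFIXES)

  !comment.ascii_only?
end

puts ascii_comment_offense?('# @author Samuel Tallet-Sabathé')
puts ascii_comment_offense?('# Copyright: (c) 2003 The Pokémon Company')
puts ascii_comment_offense?('# Returns the café menu')
```

With this sketch the `@author` and `Copyright` lines are skipped, while an ordinary comment containing "café" is still reported.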

There's also an existing AllowedChars configuration option, where common English diacritics can be added. Maybe even in the default config. (It might warrant a rename of the cop, though.)

I'm fine with renaming this, seems the name was poorly chosen anyways. I'm also fine with extending the base config (or at least promoting it better). Maybe there can also be an option to just allow the accented superset of ascii characters?
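The two ideas in these comments, an allow-list of characters and a blanket allowance for accented Latin letters, can be compared in a small Ruby sketch. Everything here is hypothetical: `disallowed_chars`, `allowed_chars:` and `allow_latin_letters:` are illustrative names, and reading "accented superset of ASCII" as "any Latin-script letter" is an assumption, not how RuboCop implements `AllowedChars`.

```ruby
# Hypothetical helper (not RuboCop's implementation): return the characters
# in a comment that would still be rejected under the given settings.
def disallowed_chars(comment, allowed_chars: [], allow_latin_letters: false)
  comment.each_char.reject do |char|
    char.ascii_only? ||
      allowed_chars.include?(char) ||
      (allow_latin_letters && char.match?(/\p{Latin}/))
  end
end

comment = '# Copyright: (c) 2003 The Pokémon Company'

p disallowed_chars(comment)                            # nothing allowed
p disallowed_chars(comment, allowed_chars: ['é'])      # explicit allow-list
p disallowed_chars(comment, allow_latin_letters: true) # any Latin-script letter
```

The allow-list keeps control explicit but has to be extended for every new name, whereas the Latin-letter option covers accented names in one setting while still rejecting other scripts and typographic symbols.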

:+1: Will see what I can do about the cop.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution and understanding!

This issue has been automatically closed due to lack of activity. Feel free to re-open it if you ever come back to it.
