Below is the original issue, which was arguing for "doing something about --skip-numeric-underscore-normalization." Since then, a consensus seems to have formed amongst users that the who feature should just be removed, because grouping by threes is not appropriate for numeric literals such as object IDs, phone numbers, dates, and much more. Please read the comments for a more detailed explanation.
Hey! As I understand, this option was added to keep the possibility to write 'localized' code, respecting the developers' regional numeric format.
I disagree with the premise that this is something to be considered. Even though my local date format is not ISO, I'm using YYYY-MM-DD in code. Even though my local format marks decimal places with a comma, I'm fine with writing 3.14 instead of 3,14.
The README makes some promises:
Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, […]. You will save time and mental energy for more important matters.
Blackened code looks the same regardless of the project you're reading. Formatting becomes transparent after a while and you can focus on the content instead.
Adding options like this, while having only a minor effect, goes directly against these promises. Each option is a new discussion to have. Each option is something new to discover: consider if I want to make a quick edit to a constant in someone's blackened open source project on the github.com in-browser editor. I know how to write black-compliant code, so I can risk not cloning the project and committing something without running black. But now that this option is there, I either need to look for the project's config first to see their preference for literals, or guess and risk a CI failure wasting my time.
And yes, I honestly believe this is a slippery slope.
You might say, well, that's okay, everyone using black would just use the defaults unless they are in an aforementioned non-three digit grouping region, right? Well…
We had a team at my company bump the black version a couple days ago. I glanced at their project today and was horrified to find skip-numeric-underscore-normalization = true in their config. Apparently they saw the option in the changelog and
Really, developers are a lost cause when it comes to code style arguments. Just don't let them have them, please.
--skip-numeric-underscore-normalization option.Rewrite the style of documentation of these options. Mention restrictive rules on when you as a developer are allowed to use it. Something like this:
Configuring black
Some special circumstances might require project specific configuration. But if you're using black, to not undermine all the benefits black gives you and the community, you must agree to only use these options if we permit it here.
--skip-numeric-underscore-normalizationYou may use this if: Your codebase is worked on only by developers from regions where the local number format does not group by three digits.
Effect: Numbers can be written both like100_000_000and1_0000_0000
--skip-string-normalizationYou may use this if: Your codebase predates black with pre-existing string conventions, and adoption would be unreasonably difficult if all strings had to instantly be reformatted.
Effect: Strings can be written both like'hello'and"hello"
Just as a changelog for this issue: I've changed the title and provided alternative solutions other than just removing the option (added the 'Possible fixes' section.)
I suspect I may have missed the original issue where this would have been more appropriate to bring up, but if this is being considered for reversal or modification, I'd like to point out that not all numbers have the same, shall we say, texture.
For example, there's nothing locale-specific about this number:
actually_a_date = 2018_05_01
which black by default now helpfully "fixes" to
actually_a_date = 20_180_501
Another example of not all numbers having the same texture: Inserting underscores into integers feels odd to me at times when the integer in question is a hardcoded ID.
For example, I'd prefer to see Issue(issue_id=1142395) in my code than to see Issue(issue_id=1_142_395), because the former makes it easier to grep for 1142395 and because the latter causes me to mentally read out "issue id one million one hundred forty two thousand ..." which feels not quite right.
I think in corner cases (this is not happening too often, or I'm not aware of something?) such as actually_a_date = 2018_05_01 we can just disable formatting for particular line.
I would argue removal of numeric-underscore-normalization is the best course of action. It still is a very fresh Python feature, so I assume we have yet to explore how useful (or troublesome) it is and thus we shouldn't force it on anyone. The inability to search on number is especially worrying to me.
And to make a case against that it is given that black has to have some opinion, I would like to say, that even with things like line length, the issue always was not if we should limit it, just a particular number. Here we do not even know if we want to use underscores at all anywhere in the code.
Anyway, even as an "opponent" of this normalization, if it stays I still think it shouldn't be optional because of the reasons stated by @underyx .
Here is a link to discussion which happened when the option was first introduced: https://github.com/ambv/black/pull/529
I like fix option 2 - "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black." There are clear examples documented above where numeric underscores not only don't make sense but obfuscate the intent of the code.
The first time Black reformatted some code with them I pushed the PR & saw the underscores in the numeric literals and was confused where they'd come from. It wasn't until a couple days later I had even heard that underscores in numeric literals were allowed now in Python 3.6 so at first I thought it was a syntax error.
I see them as being much like automatic string concatenation, these two lines are equivalent:
x = "this is a really really really realllllllllly looooooooooooooooooooooooooooooooooooooooooong line"
x = (
"this is a really really really realllllllllly "
"looooooooooooooooooooooooooooooooooooooooooong line"
)
If I add the parenthesis & split a string literal into multiple lines Black is absolutely permitted to rearrange the surrounding whitespace for consistency. However, It'd be rather invasive of Black to break the literal up in the first place even though it violates the 88 char line rule.
Similarly for numeric literals, if I choose to put underscores in them then I'd expect Black to respect that & leave them alone because changing them likely changes what I'm intending to express by putting them in. If I leave underscores out, then it's probable that's because my intent is to not make the numeric literal look like a western-style value with commas every 3 digits because of what the value represents (ex: a user id, or a date, etc)
I also agree with option 2: "Remove the whole feature of numeric-underscore-normalization; if it's not a universally enforceable standard, maybe it has no right to be a part of black."
While the philosophy of Black is to be uncompromising on the things it decides to enforce, that doesn't mean that it _has_ to enforce all aspects of your code. With most decisions it chooses to make are neither here nor there, the feature as it stands can make it much harder for teams to read the code: either because of the poor date format support (and what about telephone numbers literals??) or because of localisation. I can see that being a barrier to adoption for Black.
I mentioned earlier, but not in this thread: I consider altering numeric literals to be the same as altering a quoted string. Why would you do it? Even if one could determine the correct context (date? locale?) which one can't, one can't determine the human intent.
First time I've noticed this but it breaks readability and usability of UNIX timestamps. Makes it impossible to double-check timestamps from code
$ date -d @1_534_291_200
date: invalid date ‘@1_534_291_200’

Is this option still available? I don't see it in the command line options section of the docs. I'm getting used to using black for formatting and I have gone all in on one project, but I just noticed the underscores in the numbers and wow, that's weird. It's the first time I've seen this, and I'd like to omit this formatting if possible. My understanding is that you can configure black's formatting behavior by adding entries into the black section of the pyproject.toml file, and I have added skip-numeric-underscore-normalization = true. Is this all that's required, once this option is available?
@monocongo
Yes it is available, see the readme or black --help.
When enabled it ignores whatever formatting for numbers is currently in the files. Hence, once formatted by whatever tool, after enabling --skip-numeric-underscore-normalization black won't make any alternation to the files anymore in this regard.
If you want to remove all underscores from your number literals from your code, you could try modifying black by adding .replace('_', '') at the end of https://github.com/ambv/black/blob/5c2dd96a69a935cf45acbdf2ffabbd39b27d38fa/black.py#L2603 . I haven't test it, but so try it at your own risk.
If you want to contribute some arguments for and against numeric underscore that are more substantial than "that's weird" (that is, e.g. why you think its weird, other than "not what I'm used to") then please do so here, but otherwise a new issue will be more appropriative as this one seems to be dedicated to discussion if this option is worth keeping or should it be removed - and if that happens, what a strict default should be.
black idea is to be opinionated tool, feeling of "weirdness" is unavoidable, for at least some of the users. What we can do about that, is ensure - by a rational dialog - that the formatting decisions driving black behavior prove a right choice for the most use cases.
In this particular issue there are some voices that this particular feature has does not deliver a good results for couple of common cases (searching for a number in code base, unix timestamp formatting) and less common ones (AFAIK - date as a number; some kind of ID as a number).
There is also whole issue of black idea being "no customization" and then, options meant for "easing old project conversion" being used in brand new projects.
It's weird because it detracts from readability, IMHO. It breaks things like when I have numbers that are meaningful such as 20180711 (for July 11th, 2018), it turns that into 201_807_11 (and I'd expect the grouping to start from the other end, i.e. 20_180_711, since that's more akin to 20,180,711). I can see how this may provide benefit in some cases, but in my opinion, it's taking things a bit too far. (Of course you can get used to anything with enough exposure, it just took me aback when I first saw it, I wasn't aware of PEP515.) It's not my call, I'm just glad that this is one of the few options where users have discretion, hence my comment in support of keeping this option available and for confirmation that it's still in play.
Black does format 20180711 as 20_180_711; if you see something else, please report this as a bug.
Thanks for the follow up/clarification. You're right, I was mistaken. The grouping by three goes in the opposite direction to the right of the decimal point, and logically so. I shouldn't have commented about the behavior from a (mistaken) extrapolation.
For the record, I also agree this rule seems weird and wrong to apply globally. Numeric underscores generally only apply in cases where the number is a "natural" number, but that is not the sole purpose of all numbers in Python nor are the cases where it's not the sole purpose, obscure.
nor are the cases where it's not the sole purpose, obscure.
It would be love to see a source for this statement.
How many hard coded number literals that are not "natural" numbers are in the code base? We have listed issue numbers (which personally I have never seen as python literals), unix timestamps (but how often we hardcode these?), and dates as a number (do we have some examples where this used in the wild?).
Links to some opensource projects which have examples of these would be great help, at least then we can try to estimate how common these practices are.
At the doctor so no links. But the place I’ve personally ran into this has been primarily in tests, where I have specific ID, dates, time stamps, etc written as hardcoded literals.
I use hardcoded literals in tests way more often than anywhere else tbh.
Sent from my iPhone
On Nov 19, 2018, at 2:08 PM, rooterkyberian notifications@github.com wrote:
nor are the cases where it's not the sole purpose, obscure.
It would be love to see a source for this statement.
How many hard coded number literals that are not "natural" numbers are in the code base? We have listed issue numbers (which personally I have never seen as python literals), unix timestamps (but how often we hardcode these?), and dates as a number (do we have some examples where this used in the wild?).Links to some opensource projects which have examples of these would be great help, at least then we can try to estimate how common these practices are.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
We skip numeric normalization at Instagram. Some data analysis found that a sizable majority of the numeric constants in our codebase were object ids for which greppability far outweighs formatting as a natural number. We also found other cases such as phone numbers and timestamps.
My opinion is that numeric formatting is out of scope for a formatter, because it is impossible to do correctly without understanding the meaning of the numeric literal. Asserting that all numeric literals "should" represent natural numbers is an opinionated bridge too far; that's no longer an issue of code formatting at all.
Great points all around. I think it's safe to say we have overwhelming support of removal of the feature, so I have updated my issue title and description to reflect this.
I don't understand the logic behind the feature.
Number without underscore will not be formatted by black, while number with pre-defined underscore will be formatted.
num = 100000000 ----black---> num=100000000
package_version = 2018_11 ----black---> package_version = 201_811
What I expected was the opposite situation.
@yech1990 what you have mentioned here is due py36 support detection. If you had both lines in the same file, both of them would get formatted with underscores.
You can force black to assume py36 features are available by adding --py36 as a command line argument. This behavior is not specific to this feature - it also used with things like inserting trailing commas after **kwargs.
Both are in the same file.
I use python 3.7
@rooterkyberian
Most helpful comment
I suspect I may have missed the original issue where this would have been more appropriate to bring up, but if this is being considered for reversal or modification, I'd like to point out that not all numbers have the same, shall we say, texture.
For example, there's nothing locale-specific about this number:
which black by default now helpfully "fixes" to