PowerShell: Formatting system: format large numbers with thousands separators (digit grouping) by default

Created on 6 Sep 2020  ·  13 Comments  ·  Source: PowerShell/PowerShell

Summary of the new feature/enhancement

The output formatting system's purpose is to present data to the _human_ reader.
(As such - because the formatted output isn't meant for _programmatic_ processing - it isn't part of PowerShell's breaking-changes contract.)

Currently, large numbers are output without thousands separators, i.e. without grouping the digits for easier comprehension.

I suggest applying this grouping to all numbers (that aren't explicitly formatted via formatting data):

# WISHFUL THINKING - but you can get a preview with the prototype below.

# Number by itself (out-of-band formatting)
PS> 1000
1,000

# Implicit in-band formatting (table or list)
PS> @{ num = 1000 }

Name                           Value
----                           -----
num                            1,000

# Should also automatically apply to types with explicit formatting data 
# (unless number formatting is part of a given list item / table column).
# Note the "Size" column.
PS> Get-Item $PROFILE

    Directory: /Users/jdoe/.config/powershell

UnixMode   User             Group                 LastWriteTime           Size  Name
--------   ----             -----                 -------------           ----  ----
-rw-r--r-- jdoe             staff              10/23/2019 22:31           1,934  Microsoft.PowerShell_profile.ps1

As @ThomasNieto proposes below, a new _preference variable_ and a _new switch_ for the Format-* cmdlets could allow opting back into the old behavior, e.g. $PSThousandsGrouping with values $true and $false (default $true), and a -ThousandsGrouping switch.

Proposed technical implementation details (optional)

Here's a quick _prototype_ that uses the ETS. It is _not_ the suggested implementation, for reasons of both performance and also changing the behavior of explicit .ToString() calls.

A proper implementation would require modifying the formatting system itself.

# Prototype:
[int16], [int], [long], [double], [decimal], [bigint], [uint16], [uint], [uint64] | % {

  Update-TypeData -TypeName $_.FullName  -MemberType ScriptMethod -MemberName ToString -Value { 
    # Determine how many decimal places there are in the original representation.
    # Note: PowerShell's string interpolation uses the *invariant* culture, so '.'
    #       can reliably be assumed to be the decimal mark.
    $numDecimalPlaces = ("$this" -replace '^[^.]+(?:\.(.+))?', '$1').Length

    # Format with thousands grouping and the same number of decimal places.
    # Note: This will create a culture-sensitive representation
    #       just like with the default output formatting.
    # CAVEAT:
    #  To avoid a crash (from infinite recursion?), both .psobject.BaseObject 
    #  and the -f operator must be used.
    #  ($this.psobject.BaseObject.ToString("...") also crashes).
    "{0:N$numDecimalPlaces}" -f $this.psobject.BaseObject
  } -Force

}
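
For comparison, here is a minimal sketch of the same formatting logic as a standalone helper that leaves .ToString() untouched. The function name Format-GroupedNumber and its -Culture parameter are hypothetical, not part of PowerShell or of this proposal, and the sketch is meant to be run in a session where the prototype above has not been loaded. It assumes the input is one of the built-in numeric types and does not handle values that stringify in exponent notation.

```powershell
# Hypothetical helper (not part of PowerShell): formats a number with digit
# grouping while keeping its original number of decimal places.
function Format-GroupedNumber {
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $Number,

    # Assumption: defaults to the session's current culture.
    [cultureinfo] $Culture = [cultureinfo]::CurrentCulture
  )
  process {
    # String interpolation uses the invariant culture, so '.' is the decimal mark.
    $invariant = "$Number"
    $decimals = if ($invariant -match '\.(\d+)$') { $Matches[1].Length } else { 0 }

    # "N<digits>" is the standard numeric format with group separators.
    $Number.ToString("N$decimals", $Culture)
  }
}

Format-GroupedNumber 1234567 -Culture en-US   # 1,234,567
Format-GroupedNumber 1234.5 -Culture en-US    # 1,234.5
```

Because the helper is called explicitly, it sidesteps both concerns named above (performance and changed .ToString() behavior), but it does not integrate with the formatting system the way a real implementation would.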
Issue-Enhancement WG-Engine

All 13 comments

(As such - because the formatted output isn't meant for _programmatic_ processing - it isn't part of PowerShell's breaking-changes contract.)

Although this might be true, I am afraid that the change will be more confusing than advantageous for human readability, especially for a novice programmer.

If I do this: @{ num = 1000 } | Out-File .\Output.txt
Will it have thousands separators in the output file or not?
I presume they shouldn't show up in the output, as I am not doing e.g. @{ num = 1000 } | Format-Table | Out-File .\Output.txt
which is wrong, although a lot of PowerShell users do this or have already done this (there are several examples of this on StackOverflow). For this group, their process (relying on 'Format-Table' output) might break. _... Sorry, you should have read the small print in the contract_ 😒
So, if this assumption is correct, a novice programmer will likely get confused by the difference between what is output on the screen and what is actually in the file.

There is a UX problem: copying and pasting from the console becomes very annoying, because the output is culture-sensitive.

You might also consider doing this for one specific type...
The type that is intended for large numbers and can hold all the other types (which is actually not in your prototype list): [BigInt].
If you do this just for [BigInt], you would be able to quite easily format a large number with thousands separators, like:

PS C:\> [BigInt]1000
1,000
PS C:\> @{ num = [BigInt]1000 }

Name                           Value
----                           -----
num                            1,000

I think that's more likely to just be needlessly confusing, having one numeric type that behaves differently to the rest.

If this is going to change, all number types should behave similarly.

To get the current behavior there should be a switch parameter on the format commands and a preference variable to enable/disable globally.

@iSazonov, the culture-specific output not being reusable in source code already applies to numbers _with decimal places_ (and also to dates, for instance):

PS> $o = [cultureinfo]::CurrentCulture; [cultureinfo]::CurrentCulture = 'fr-FR'; 1.23; [cultureinfo]::CurrentCulture = $o
1,23  # Decimal mark is ","

True, with the thousands grouping it would occur more often, which is why @ThomasNieto's suggestion (preference variable and switch) is a good one (I've added the suggestion, fleshed out, to the OP).

@iRon7, nothing would change for Out-File / >: you'll get the same representation in the file as you would get on the screen; because Out-File also uses the formatting system, files created this way shouldn't be relied on for further programmatic processing. Also, something like @{ num = 1000 } | Format-Table | Out-File out.txt is legitimate, because it allows you to pick the "shape" of the output explicitly instead of relying on the default.

I'd definitely like to see an option for this. I think making this default would lead to a lot of confusion on both ends of the experience spectrum though.

  • Veteran users have already been trained that if they see digit separators then they're dealing with a string. If they want to treat it like a number, they need to prep the string for parsing.
  • Newer users will have difficulty troubleshooting why their number only has digit separators sometimes. (e.g. Write-Host 1000 would output 1000 but 1000 would output 1,000)
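
To illustrate the "prep the string for parsing" step from the first bullet, here is a minimal sketch that turns a grouped string back into a number. It assumes en-US conventions (',' as the group separator); the variable names are illustrative only.

```powershell
# Assumption: en-US conventions (',' groups digits, '.' is the decimal mark).
$culture = [cultureinfo]::GetCultureInfo('en-US')

# What a user might copy from formatted output: a string, not a number.
$text = '1,000'

# AllowThousands tells the parser to accept the culture's group separator.
$styles = [System.Globalization.NumberStyles]::AllowThousands
$number = [int]::Parse($text, $styles, $culture)

$number.GetType().Name   # Int32
$number -eq 1000         # True
```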

It needed to sink in a little, but I guess I actually like the basic idea.
_Provided there is an option to disable it_, because (if I understand it right) here lies, in essence, the pitfall:

(1000 | Out-String) -eq ('1000' | Out-String)

This is $True in the current mode of operation (CMO) and will be $False in the future mode of operation (FMO).

The same discrepancy (between CMO and FMO) will happen when I do this:

1000 | Out-File .\Thousand.txt
$Thousand = Get-Content .\Thousand.txt
$Thousand -eq 1000

Or is this due to the fact that it concerns a prototype?

@SeeminglyScience:

At the end of the day I'd personally be happy with either solution (grouping by default ON vs. OFF), though I think that in the long run having it on by default is more beneficial:

  • Readability is then enhanced _by default_. While that comes at the expense of copying and pasting output as number literals usable in source code, I think the readability aspect trumps that, given that presenting output readably to the human observer is the very purpose of the formatting system.

  • Conversely, if OFF were the default, it wouldn't be easy for beginners to discover how to opt in.

  • I don't think we need to worry about Write-Host, which already presents enough puzzles to the uninitiated - e.g.,
    Write-Host @{ a =1; b = 2} printing System.Collections.DictionaryEntry System.Collections.DictionaryEntry. However, unlike with the preference variable / new switch we're discussing here, information about Write-Host is easy to discover and conceptualize - even though the help topic is currently lacking: see https://github.com/MicrosoftDocs/PowerShell-Docs/issues/6599.
    The short of it: Write-Host uses .NET .ToString() stringification on its arguments, which is unrelated to PowerShell's formatting system.

@iRon7:

Yes, these behaviors would change, but I do not consider that problematic:

(1000 | Out-String) -eq ('1000' | Out-String)

  • 1000 is unequivocally a number literal, '1000' is unequivocally a string.
  • You need to know that Out-String applies default formatting to its inputs, and that the results will _only_ be the same if the input is a _string_ (leaving aside that Out-String currently appends an extra newline).

The same applies to Out-File; if you _do_ want .ToString() stringification, use Set-Content instead.

1000 | Set-Content .\Thousand.txt
$Thousand = Get-Content .\Thousand.txt
$Thousand -eq 1000

But note that even that _already_ falls apart if you have a number _with decimal places_ with a culture that uses , as the decimal mark, because argument-less .ToString() calls use the current culture.
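
A short sketch of that culture dependence, passing the culture explicitly so the result does not depend on the session's settings (fr-FR is used here only as an example of a culture with ',' as the decimal mark):

```powershell
$num = 1.23
$french = [cultureinfo]::GetCultureInfo('fr-FR')

# Explicit culture: the French decimal mark is ','.
$num.ToString($french)                            # 1,23

# Explicit invariant culture: always '.', regardless of the session culture.
$num.ToString([cultureinfo]::InvariantCulture)    # 1.23

# String interpolation also uses the invariant culture.
"$num"                                            # 1.23
```

An argument-less $num.ToString() behaves like the first call whenever fr-FR is the current culture, which is why a round trip through Set-Content and Get-Content is not guaranteed to yield a string that compares equal to the original number.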

  • Readability is then enhanced _by default_. While that comes at the expense of copying and pasting output as number literals usable in source code, I think the readability aspect trumps that, given that presenting output readably to the human observer is the very purpose of the formatting system.
  • Conversely, if OFF were the default, it wouldn't be easy for beginners to discover how to opt in.

Yeah, I agree for sure. I just expect a lot of confusion around it if it gets implemented. It's similar to how some format definitions include columns that aren't real properties; I see folks having issues with that at least once a week. Also, when you get an object that returns an actual formatted string, it's going to be hard to pin down the difference.

  • I don't think we need to worry about Write-Host, which already presents enough puzzles to the uninitiated - e.g., (...)

True, but I think it's much harder to reason about with a primitive. A complex object is expected to be formatted differently, whereas with a primitive you usually expect its stringification.

much harder to reason about with a primitive

I think all that users need to know is: (a) Write-Host works differently, and (b) given that it uses .ToString() on its input objects - whether primitive or not - you can easily test the behavior with, say, $var = 1.23; $var.ToString() - although there is indeed an _existing_ pitfall there: Write-Host 1.23 is _not_ culture-sensitive, because a _literal_ argument is used; by contrast, $num = 1.23; Write-Host $num _is_ culture-sensitive; I've updated the docs issue with details.

Generally, I think users from cultures that use , as the decimal mark are already aware that _neither_ Write-Host nor Out-* cmdlets / implicit formatting necessarily preserve the original representation of a number.


I hear you on the potential for confusion, but I think this confusion will go away over time, and won't apply to new users.
Not being able to readily distinguish between the output from 1000 and '1,000' is somewhat inconvenient, but I think the inconvenience of having to call .GetType() / Get-Member to disambiguate is outweighed by the benefits of readability.

I think we now understand the issues involved and just happen to disagree on what the default should be.
Let's see what others have to say.

I think that all users need to know is: (a) Write-Host works differently, and (b) given that it uses .ToString() on its input objects

Write-Host was an example; the ToString difference is what I'm talking about. And yes, once they understand what is happening they will understand what is happening, but it's still introducing a new newbie pitfall.

(...) and won't apply to new users.

I disagree, I think it's most likely to confuse new or very casual users who are more likely to try to fit the stringification of a primitive into a report or something.

I think we now understand the issues involved and just happen to disagree on what the default should be.
Let's see what others have to say.

👍
