PowerShell: Measure-Object -Maximum changes the type of its input

Created on 11 Mar 2020 · 23 comments · Source: PowerShell/PowerShell

Here the Maximum property is a double, while the input was an int64 (long). A decimal (128-bit) input would also result in the Maximum property being a double.

Steps to reproduce

1000000000000000 | Measure-Object -Maximum

Expected behavior

Count             : 1
Average           :
Sum               :
Maximum           : 1000000000000000
Minimum           :
StandardDeviation :
Property          :

Actual behavior

Count             : 1
Average           :
Sum               :
Maximum           : 1E+15
Minimum           :
StandardDeviation :
Property          :

Environment data

Name                           Value
----                           -----
PSVersion                      7.0.0
PSEdition                      Core
GitCommitId                    7.0.0
OS                             Microsoft Windows 10.0.16299
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
Labels: Area-Cmdlets-Utility, Committee-Reviewed, Hacktoberfest, Issue-Question, Up-for-Grabs


All 23 comments

It is a fact that we convert all input objects to Double. This is by design, and it is a compromise: in general, we don't know the object types in the pipeline.

@iSazonov, I understand that there's no trivial solution, but the current behavior is clearly both surprising and unhelpful:

PS> (10000000000000199 | Measure-Object -Sum).Sum
1.00000000000002E+16

Casting to [bigint] shows that precision was lost; that is, _the summation doesn't work as intended, and you may not even notice_:

PS> [bigint] (10000000000000199 | Measure-Object -Sum).Sum
10000000000000200

I understand that there's no trivial solution

I can't think of anything better than an explicit type conversion of the input object (property), via something like -AsType [type].

Get-Random does some manual type figuring-out for numbers. It's plausible we could adopt a similar approach in Measure-Object, type the parameter as object and then verify the type is a supported one as it comes in.
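
For illustration only, a minimal sketch of that verification step (the helper name is hypothetical; Get-Random's actual logic differs):

function Test-SupportedNumber {
    param([object] $InputObject)
    # Hypothetical allow-list: only these types would participate in exact measurement.
    $supported = [int], [long], [decimal], [float], [double]
    foreach ($type in $supported) {
        if ($InputObject -is $type) { return $true }
    }
    return $false
}

PS> Test-SupportedNumber 42      # True
PS> Test-SupportedNumber 'abc'   # False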

Agreed in principle, @vexx32, but note that it's not about _parameters_ in this case, but about the numeric types encountered in the input.

I think we need something like the automatic type widening (promotion) found in PowerShell's number-literal parsing, where you start with the type of the first number encountered in the input and widen as needed during processing:

For integer input types:

int -> long -> decimal -> double

Encountering a [double] (or the rarely used [float]) in the input would instantly switch to [double]. -Average and -StandardDeviation still have to output the _final result_ as a [double].
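
A minimal sketch of that promotion logic, assuming the rules above (the function name is hypothetical and the ranking table is illustrative only):

function Get-WidenedType {
    param([object[]] $InputObject)
    # Promotion rank mirrors the proposed chain: [int] -> [long] -> [decimal]
    $rank = @{ Int32 = 0; Int64 = 1; Decimal = 2 }
    $current = [int]
    foreach ($item in $InputObject) {
        $type = $item.GetType()
        # A [double] or [float] anywhere in the input wins outright.
        if ($type -in [double], [float]) { return [double] }
        if ($rank[$type.Name] -gt $rank[$current.Name]) { $current = $type }
    }
    return $current
}

PS> (Get-WidenedType 1, 2L).Name        # Int64
PS> (Get-WidenedType 1, 2L, 3.5).Name   # Double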

As an aside: there's a troubling inconsistency between type-widening in _number-literal parsing_ and _calculations_ (in expressions that don't use _casts_):

# A number literal that is beyond [long]::MaxValue is promoted to [decimal]
# The number below is [long]::MaxValue + 1 (cast to [decimal] to see the precise value)
PS> (9223372036854775808).GetType().Name
Decimal
# Performing the equivalent as a *calculation* promotes to [double], 
# with loss of precision.
PS> ([long]::MaxValue + 1).GetType().Name
Double # !!

Thus, with implicit type conversions you may lose precision without realizing it; consider the difference between:

# OK - LHS forced to [decimal] preserves [decimal]
PS> [decimal] [long]::MaxValue + 1
9223372036854775808

and:

# !! Implicit calculation coerces to [double] with precision loss, which the subsequent
# !! [decimal] cast cannot recover.
PS> [decimal] ([long]::MaxValue + 1)
9223372036854780000 # !! Loss of precision

@vexx32, @SeeminglyScience, any thoughts?

Obviously, changing fundamental behavior like that would be extremely fraught.

My initial inclination is to avoid widening to double unless we get input that is either float or double and instead widen from int64 to BigInteger as that'll get us more directly representative results. For decimal inputs, widening to double does make more sense, despite the loss of precision.
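
To illustrate the difference (just the two casts side by side; not part of any proposed implementation):

PS> [double] 10000000000000199   # widening to [double] rounds the value
1.00000000000002E+16
PS> [bigint] 10000000000000199   # widening to [bigint] preserves it exactly
10000000000000199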

A property like Count on Measure-Object, for example, could definitely stand to be BigInteger on the high end since a double-type Count isn't especially meaningful; all it gives you is a rough order of magnitude if you have enough objects. Other properties could, too, but it would be more dependent on the format the data is input in.

As for the inconsistencies between parsing and calculations -- I tend to think they should behave pretty similarly. Part of the trouble there is likely that .NET's default widening behaviours don't _quite_ match PowerShell's (at least where the parsing is concerned). Whether that should / could be addressed in a reasonable manner, I'm not sure. It would probably take a fair bit of tinkering around with LanguagePrimitives and the conversion methods there, plus I'm sure a good deal of manual work in the arithmetic operator binders.

Measure-Object returns a specific type: Microsoft.PowerShell.Commands.GenericMeasureInfo which defines the fields as follows:

  System.Nullable`1[System.Double] Average
  System.Nullable`1[System.Double] Sum
  System.Nullable`1[System.Double] Maximum
  System.Nullable`1[System.Double] Minimum

This is why everything comes out as double. "Fixing" this would require changing the type of the returned object and that would be a breaking change.

Good point, @bpayette, so - unfortunately - it would have to be an _opt-in enhancement_ via a parameter, say an -AsAutoNumber switch (just a first suggestion; I'm struggling to come up with a good name), similar to what @iSazonov proposed, though the idea would be to not have to ask for a specific type and instead let the widening algorithm pick the appropriate type.

@vexx32, agreed re float and double (in line with what I proposed), and also re [bigint].


Re the inconsistencies between parsing and calculations: I think PowerShell's number-literal parsing is fine (albeit different from C#'s); it's the calculation behavior of any-result-larger-than-[long]::MaxValue-becomes-a-[double] that I find troubling, but I'll take this topic elsewhere.

I think PowerShell's number-literal parsing is fine

Glad to hear it, I'd hate to have to rewrite it again 😂

the idea would be to not have to ask for a specific type and let the widening algorithm pick the appropriate type

I do not think this is reliable. Also, it complicates the code too much. I think in real scenarios the long type is enough.

I do not think this is reliable.

In what way would it not be reliable?

I think in real scenarios the long type is enough.

See https://stackoverflow.com/a/60609025/45375, for example.

The inputs _are_ [long]s, but the invariable conversion to [double] causes loss of precision for [long]s beyond 9007199254740991 (2^53 - 1, the largest integer a [double] can represent exactly), which is particularly insidious in this case, given that you'd never expect -Maximum to _modify the value_:

PS> [bigint] (132273694413991065 | Measure-Object -Maximum).Maximum
132273694413991072  # !! Different number, due to lossy conversion to [double]

(Strictly speaking, no _widening_ is required in this case - only _preservation_ of the input type).
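
As an interim workaround (assuming the inputs implement IComparable), sorting preserves the input type, because Sort-Object never converts the values:

PS> (132273694413991065, 1 | Sort-Object -Descending | Select-Object -First 1).GetType().Name
Int64
PS> 132273694413991065, 1 | Sort-Object -Descending | Select-Object -First 1
132273694413991065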


I do not think this is reliable.
In what way would it not be reliable?

:-) https://github.com/PowerShell/PowerShell/issues/12103#issuecomment-598883028 Do you want to ask @vexx32? :-)
We could switch to BigInt, but we would lose performance. The same goes for a dynamic type. I do not think we'd want this.

I don't understand your Stack Overflow example. My thought is to keep the double type by default and add a new parameter to switch to the long type for inputs and results.

I wasn't aware Measure-Object was a performance-critical scenario. 😉

Personally I think it makes the most sense to adopt the widest / least precise _input_ type for the overall output. Most likely, this would mean the actual class property would have to be Object or at least ValueType (can you make properties ValueType? I feel like I remember that's not really allowed or something).

@iSazonov

Do you want to ask @vexx32 ? :-)

The number-literal parsing is entirely incidental to this issue - the only reason I mentioned it is that it is an instance of where we already perform automatic, helpful widening of types on demand.

We could switch to BigInt, but we would lose performance. The same goes for a dynamic type. I do not think we'd want this.

We don't need a dynamic type, and whether [bigint] is needed depends on the specifics of the input.

And I agree with @vexx32 that performance shouldn't be the deciding factor here - see below.

I don't understand your Stack Overflow example.

It was simply meant to show that the current (invariable) behavior is problematic in real-life situations; the [bigint] (132273694413991065 | Measure-Object -Maximum).Maximum example by itself illustrates the problem well enough, I think.

@bpayette, I just noticed that the properties you quoted are from the obsolete (see below) Microsoft.PowerShell.Commands.GenericMeasureInfo class, which was replaced by Microsoft.PowerShell.Commands.GenericObjectMeasureInfo, precisely to fix the following (emphasis added):

This class is created for fixing "Measure-Object -MAX -MIN should work with ANYTHING that supports CompareTo"
GenericMeasureInfo class is shipped with PowerShell V2. Fixing this bug requires, changing the type of Maximum and Minimum properties which would be a breaking change. Hence created a new class to not have an appcompat issues with PS V2.

In short, the Maximum and Minimum properties are already System.Object-typed, so extending the preservation of the input type during -Minimum and -Maximum operations to numeric types should be considered a _bug fix_.

@iSazonov, can you please label this as a bug?

And while supporting type widening for all properties is still worth considering - see #12141 - I think this fix alone will eliminate most currently problematic real-world scenarios.


@vexx32, even though I've personally never seen it in the wild, it is indeed possible to create ValueType properties, and the docs even provide an example: see https://docs.microsoft.com/en-US/dotnet/api/System.ValueType. Aside from being somewhat self-documenting and ensuring that only value types can be assigned / returned, working with ValueType instances _directly_ has many limitations: not all value types are numbers, you can't perform arithmetic directly on ValueType instances, you can't use stackalloc or obtain pointers to them, ...
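
A quick check from PowerShell itself (the Demo class is hypothetical, for illustration only):

PS> class Demo { [System.ValueType] $Value }
PS> $d = [Demo]::new(); $d.Value = 42; $d.Value.GetType().Name
Int32
PS> $d.Value = 'text'   # !! Fails - a [string] cannot be converted to [System.ValueType]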


@bpayette, I just noticed that the properties you quoted are from the obsolete Microsoft.PowerShell.Commands.GenericMeasureInfo class, which was replaced by Microsoft.PowerShell.Commands.GenericObjectMeasureInfo

Doesn't look like Measure-Object is using it :/

@vexx32, even though I've personally never seen it in the wild, it is indeed possible to create ValueType properties

You don't see it much because it's basically the same as typing it as object. It's still boxed, and afaik doesn't really buy you anything other than documenting that a struct or primitive is expected.

Thanks, @SeeminglyScience - I didn't notice that it's only used _selectively_, with _non-numeric_ (but comparable) input when -Minimum or -Maximum are used:

# No -Min / -Max -> GenericMeasureInfo (for all input types)
PS> ('a', 'b' | Measure-Object).GetType().Name
GenericMeasureInfo

# Non-numeric type, with -Min / -Max -> GenericObjectMeasureInfo
PS> ('a', 'b' | Measure-Object -Maximum).GetType().Name
GenericObjectMeasureInfo

So the fix would be to _always_ output GenericObjectMeasureInfo when -Minimum and / or -Maximum are used.

I'm hoping that said change, which is conceptually a clear bug fix, falls into Bucket 3: Unlikely Grey Area, but I'm not sure I have the full picture. Any thoughts?

😬 That fix is janky enough as it is - outputting a different type based on a switch like that, given the original class already has those members, no?

Seems like there may be much to gain and little to lose from replacing them both with a more useful type overall. 🤔

You'll note that the Sum property of even the newer class is still Nullable<double>.

The comment in the code says that this is by design; it dates from the V3 timeframe.
So the request is to change the design: since V2 is no longer supported, we can deprecate the GenericMeasureInfo class and use only the GenericObjectMeasureInfo class.

/cc @SteveL-MSFT @daxian-dbw Please review this on PowerShell Committee.

That would be a good first step.

Ideally we would deprecate _both_ and introduce a new class that defines most if not all of its members as object so that we can utilise the widest needed numeric type to handle the use case.

Great idea, @vexx32:

Note: ValueType is used here to illustrate which properties are invariably numeric.
Based on @SeeminglyScience's feedback, using just object in practice is probably the right choice.

    // To replace both GenericObjectMeasureInfo and GenericMeasureInfo.
    public sealed class ObjectMeasureInfo : MeasureInfo
    {
        public ValueType Count { get; set; }
        public ValueType? Average { get; set; }
        public ValueType? Sum { get; set; }
        public ValueType? StandardDeviation { get; set; }
        // As before in GenericObjectMeasureInfo
        // Can be reference-type instances, as long as the type implements IComparable.
        public object Maximum { get; set; }
        public object Minimum { get; set; }
    }

This type could then be used with the following type-widening rules, as also suggested in #12141 for -Raw, which would give us unified behavior:

  • For -Maximum and -Minimum, whatever input value is identified should be passed through as-is, as is already the case for non-numeric inputs (which is the fix for the bug at hand).

  • For the inherently non-integral -Average and -StandardDeviation measurements, [double] is an appropriate default, but with (at least one) [decimal] input [decimal] should also be used on output.

  • For Sum, all-integer input should also output an integral type (starting with the (largest) input type), with _automatic type widening_, analogous to the widening (type promotion) that happens in PowerShell's number-literal parsing ([int] -> [long] -> [decimal]), and possibly even to [bigint] rather than the [double] that is the widest type for number literals; see the sketch after this list.

    • Count should widen on demand the same way as Sum, though numbers beyond [int] are unlikely to occur in practice, except perhaps if opt-in enumeration of array-valued properties is implemented via the proposed -Recurse switch - see #7244.
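
For illustration only, here is a simplified sketch of the Sum rule (the function name is hypothetical; it jumps straight to [bigint] for integral input rather than widening stepwise, and it ignores [decimal] handling for brevity):

function Get-WideningSum {
    param([object[]] $InputObject)
    # Any floating-point input forces the [double] path, per the rules above.
    $hasFloat = $false
    foreach ($item in $InputObject) {
        if ($item -is [double] -or $item -is [float]) { $hasFloat = $true; break }
    }
    if ($hasFloat) {
        $sum = [double] 0
        foreach ($item in $InputObject) { $sum += [double] $item }
        return $sum
    }
    # Integral input accumulates exactly in [bigint]; no precision is lost.
    $sum = [bigint]::Zero
    foreach ($item in $InputObject) { $sum += [bigint] $item }
    return $sum
}

PS> Get-WideningSum 10000000000000199, 1   # exact, unlike the current [double]-based Sum
10000000000000200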

@PowerShell/powershell-committee reviewed this; we do not want to take a breaking change for the type of the output object. Instead, to support this scenario, we would propose a -ValueType parameter that takes an enum consisting of the existing members and outputs an object, so that the input type is preserved.

I think we need to take a step back, @SteveL-MSFT:

  • That numeric types aren't preserved with -Min and -Max - with large integers even outputting lossy [double]s that report a maximum that isn't even among the input - is clearly a bug: see #13422

  • The proposed -ValueType enhancement based on an enum seems quite convoluted; I suggest the following instead:

    • #12141 already proposes an opt-in with automatic type-widening on a per-property basis, via a -Raw switch.

    • #13423 complements this proposal with a whole-object opt-in, via an -AsAutoNumber switch.

I recommend closing this issue in favor of these new ones.
