Here the Maximum property is a double, while the input was an int64 (long). A decimal (128-bit) input would likewise result in the Maximum property being a double.
1000000000000000 | Measure-Object -Maximum

Expected output:

Count             : 1
Average           :
Sum               :
Maximum           : 1000000000000000
Minimum           :
StandardDeviation :
Property          :

Actual output:

Count             : 1
Average           :
Sum               :
Maximum           : 1E+15
Minimum           :
StandardDeviation :
Property          :
Name Value
---- -----
PSVersion 7.0.0
PSEdition Core
GitCommitId 7.0.0
OS Microsoft Windows 10.0.16299
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0
We convert all input objects to Double. This is by design, and it is a compromise: in general, we don't know the object types in the pipeline.
@iSazonov, I understand that there's no trivial solution, but the current behavior is clearly both surprising and unhelpful:
PS> (10000000000000199 | Measure-Object -Sum).Sum
1.00000000000002E+16
Casting to [bigint] shows that precision was lost; that is, _the summation doesn't work as intended, and you may not even notice_:
PS> [bigint] (10000000000000199 | Measure-Object -Sum).Sum
10000000000000200
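The root cause is that [double] has a 53-bit significand, so not every integer beyond 2^53 = 9007199254740992 is exactly representable; a quick demonstration of the rounding:

PS> [long] [double] 9007199254740993   # 2^53 + 1, not representable as [double]
9007199254740992                       # rounded to the nearest representable value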
I understand that there's no trivial solution
I can't think of anything better than an explicit type conversion of the input object (property), like -AsType [type].
Get-Random does some manual type figuring-out for numbers. It's plausible we could adopt a similar approach in Measure-Object: type the parameter as object and then verify the type is a supported one as it comes in.
Agreed in principle, @vexx32, but note that it's not about _parameters_ in this case, but about the numeric types encountered in the input.
I think we need something like the automatic type widening (promotion) found in PowerShell's number-literal parsing, where you start with the type of the first number encountered in the input and widen as needed during processing. For integer input types:

int -> long -> decimal -> double
Encountering a [double] (or the rarely used [float]) in the input would instantly switch to [double]. -Average and -StandardDeviation still have to output the _final result_ as a [double].
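For reference, this is the promotion that number-literal parsing already performs; each literal below is one past a type's MaxValue:

PS> (2147483648).GetType().Name                      # [int]::MaxValue + 1
Int64
PS> (9223372036854775808).GetType().Name             # [long]::MaxValue + 1
Decimal
PS> (79228162514264337593543950336).GetType().Name   # [decimal]::MaxValue + 1
Double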
As an aside: there's a troubling inconsistency between type-widening in _number-literal parsing_ and _calculations_ (in expressions that don't use _casts_):
# A number literal that is beyond [long]::MaxValue is promoted to [decimal]
# The number below is [long]::MaxValue + 1 (cast to [decimal] to see the precise value)
PS> (9223372036854775808).GetType().Name
Decimal
# Performing the equivalent as a *calculation* promotes to [double],
# with loss of precision.
PS> ([long]::MaxValue + 1).GetType().Name
Double # !!
Thus, with implicit type conversions you may lose precision without realizing it; consider the difference between:
# OK - LHS forced to [decimal] preserves [decimal]
PS> [decimal] [long]::MaxValue + 1
9223372036854775808
and:
# !! Implicit calculation coerces to [double] with precision loss, which the subsequent
# !! [decimal] cast cannot recover.
PS> [decimal] ([long]::MaxValue + 1)
9223372036854780000 # !! Loss of precision
@vexx32, @SeeminglyScience, any thoughts?
Obviously, changing fundamental behavior like that would be extremely fraught.
My initial inclination is to avoid widening to double unless we get input that is either float or double, and instead widen from int64 to BigInteger, as that'll get us more directly representative results. For decimal inputs, widening to double does make more sense, despite the loss of precision.
A property like Count on Measure-Object, for example, could definitely stand to be BigInteger on the high end, since a double-typed Count isn't especially meaningful; all it gives you is a rough order of magnitude if you have enough objects. Other properties could, too, but it would be more dependent on the format the data is input in.
As for the inconsistencies between parsing and calculations -- I tend to think they should behave pretty similarly. Part of the trouble there is likely that .NET's default widening behaviours don't _quite_ match PowerShell's (at least where the parsing is concerned). Whether that should / could be addressed in a reasonable manner, I'm not sure. It would probably take a fair bit of tinkering around with LanguagePrimitives and the conversion methods there, plus I'm sure a good deal of manual work in the arithmetic operator binders.
Measure-Object returns a specific type, Microsoft.PowerShell.Commands.GenericMeasureInfo, which defines the fields as follows:
System.Nullable`1[System.Double] Average
System.Nullable`1[System.Double] Sum
System.Nullable`1[System.Double] Maximum
System.Nullable`1[System.Double] Minimum
This is why everything comes out as double. "Fixing" this would require changing the type of the returned object and that would be a breaking change.
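For reference, you can confirm those property types from PowerShell itself via reflection:

PS> [Microsoft.PowerShell.Commands.GenericMeasureInfo].GetProperties() |
      Select-Object Name, PropertyType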
Good point, @bpayette - so, unfortunately, it would have to be an _opt-in enhancement_ via a parameter, say an -AsAutoNumber switch (just a first suggestion; struggling to come up with a good name), similar to what @iSazonov proposed, though the idea would be to not have to ask for a specific type and instead let the widening algorithm pick the appropriate type.
@vexx32, agreed re float and double (in line with what I proposed), and also re [bigint].
Re the inconsistencies between parsing and calculations: I think PowerShell's number-literal parsing is fine (albeit different from C#'s); it's the calculation behavior of any-result-larger-than-[long]::MaxValue-becomes-a-[double] that I find troubling, but I'll take this topic elsewhere.
I think PowerShell's number-literal parsing is fine
Glad to hear it, I'd hate to have to rewrite it again 😂
the idea would be to not have to ask for a specific type and let the widening algorithm pick the appropriate type
I do not think this is reliable. It also complicates the code too much. I think in real scenarios the long type is enough.
I do not think this is reliable.
In what way would it not be reliable?
I think in real scenarios the long type is enough.
See https://stackoverflow.com/a/60609025/45375, for example.
The inputs _are_ [long]s, but the invariable conversion to [double] causes loss of precision for [long]s beyond 9007199254740991, which is particularly insidious in this case, given that you'd never expect -Maximum to _modify the value_:
PS> [bigint] (132273694413991065 | Measure-Object -Maximum).Maximum
132273694413991072 # !! Different number, due to lossy conversion to [double]
(Strictly speaking, no _widening_ is required in this case - only _preservation_ of the input type).
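As a stopgap, a type-preserving way to get the maximum today is to sort and take the last element - a simple workaround sketch:

PS> $max = 132273694413991065, 1322797428841 | Sort-Object | Select-Object -Last 1
PS> $max                  # value preserved
132273694413991065
PS> $max.GetType().Name   # type preserved
Int64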
I do not think this is reliable.
In what way would it not be reliable?
:-) https://github.com/PowerShell/PowerShell/issues/12103#issuecomment-598883028 Do you want to ask @vexx32 ? :-)
We could switch to BigInt, but we'd lose performance. The same goes for a dynamic type. I don't think we'd want that.
I don't understand your stackoverflow example. My thought is to keep the double type by default and add a new parameter to switch to the long type for inputs and results.
I wasn't aware Measure-Object was a performance-critical scenario. 😉
Personally I think it makes the most sense to adopt the widest / least precise _input_ type for the overall output. Most likely, this would mean the actual class property would have to be Object or at least ValueType (can you make properties ValueType? I feel like I remember that's not really allowed or something).
@iSazonov
Do you want to ask @vexx32 ? :-)
The number-literal parsing is entirely incidental to this issue - the only reason I mentioned it is that it's an instance where we already perform automatic, helpful widening of types on demand.
We could switch to BigInt but we lost performance. The same is for Dynamic type. I do not think that we'd want this.
We don't need a dynamic type, and whether [bigint] is needed depends on the specifics of the input.
And I agree with @vexx32 that performance shouldn't be the deciding factor here - see below.
I don't understand your stackoverflow example.
It was simply meant to show that the current (invariable) behavior is problematic in real-life situations; the [bigint] (132273694413991065 | Measure-Object -Maximum).Maximum example by itself illustrates the problem well enough, I think.
@bpayette, I just noticed that the properties you quoted are from the obsolete (see below) Microsoft.PowerShell.Commands.GenericMeasureInfo class, which was replaced by Microsoft.PowerShell.Commands.GenericObjectMeasureInfo, precisely to fix the following (emphasis added):
This class is created for fixing "Measure-Object -MAX -MIN should work with ANYTHING that supports CompareTo"
GenericMeasureInfo class is shipped with PowerShell V2. Fixing this bug requires, changing the type of Maximum and Minimum properties which would be a breaking change. Hence created a new class to not have an appcompat issues with PS V2.
In short, properties Maximum and Minimum are already System.Object-typed, so also applying the preservation of the input type during -Minimum and -Maximum operations to numeric types should be considered a _bug fix_.
@iSazonov, can you please label this as a bug?
And while supporting type widening for all properties is still worth considering - see #12141 - I think this fix alone will eliminate most currently problematic real-world scenarios.
@vexx32, even though I've personally never seen it in the wild, it is indeed possible to create ValueType properties, and the docs even provide an example; see https://docs.microsoft.com/en-US/dotnet/api/System.ValueType. Aside from being somewhat self-documenting and ensuring that only value types can be assigned / returned, working with ValueType instances _directly_ has many limitations: not all value types are numbers, you can't perform arithmetic directly with ValueType instances, you can't use stackalloc or obtain pointers to them, ...
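A minimal PowerShell sketch (Demo is a hypothetical class, just to illustrate the constraint):

class Demo { [ValueType] $Count }
PS> $d = [Demo]::new()
PS> $d.Count = 42          # any value type is accepted (boxed)
PS> $d.Count.GetType().Name
Int32
PS> $d.Count = 'hello'     # should fail: [string] is not a value type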
@bpayette, I just noticed that the properties you quoted are from the obsolete Microsoft.PowerShell.Commands.GenericMeasureInfo class, which was replaced by Microsoft.PowerShell.Commands.GenericObjectMeasureInfo

Doesn't look like Measure-Object is using it :/
@vexx32, even though I've personally never seen it in the wild, it is indeed possible to create ValueType properties

You don't see it much because it's basically the same as typing it as object. It's still boxed, and AFAIK doesn't really buy you anything other than documenting that a struct or primitive is expected.
Thanks, @SeeminglyScience - I didn't notice that it's only used _selectively_, with _non-numeric_ (but comparable) input when -Minimum or -Maximum are used:
# No -Min / -Max -> GenericMeasureInfo (for all input types)
PS> ('a', 'b' | Measure-Object).GetType().Name
GenericMeasureInfo
# Non-numeric type, with -Min / -Max -> GenericObjectMeasureInfo
PS> ('a', 'b' | Measure-Object -Maximum).GetType().Name
GenericObjectMeasureInfo
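For completeness, _numeric_ input with -Min / -Max still takes the GenericMeasureInfo path with its [double]-typed properties - which is precisely the bug at hand:

# Numeric type, with -Min / -Max -> still GenericMeasureInfo
PS> (1, 2 | Measure-Object -Maximum).GetType().Name
GenericMeasureInfo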
So the fix would be to _always_ output GenericObjectMeasureInfo when -Minimum and / or -Maximum are used.
I'm hoping that said change, which is conceptually undoubtedly a bug fix, falls into Bucket 3: Unlikely Grey Area, but I'm not sure I have the full picture. Any thoughts?
😬 That fix is janky enough as it is, to be outputting a different type from a switch like that, given the original class already has those members, no?
Seems like there may be much to gain and little to lose from replacing them both with a more useful type overall. 🤔
You'll note that the Sum property of even the newer class is still Nullable<double>.
The comment in the code says that this is by design, and it was designed back in the V3 timeframe. So the request is to change the design: since V2 is no longer supported, we can deprecate the GenericMeasureInfo class and use only the GenericObjectMeasureInfo class.
/cc @SteveL-MSFT @daxian-dbw Please review this on PowerShell Committee.
That would be a good first step.
Ideally we would deprecate _both_ and introduce a new class that defines most if not all of its members as object so that we can utilise the widest needed numeric type to handle the use case.
Great idea, @vexx32. Note: ValueType is used here to illustrate which properties are invariably numeric; based on @SeeminglyScience's feedback, using just object in practice is probably the right choice.
// To replace both GenericObjectMeasureInfo and GenericMeasureInfo.
// Note: the ValueType? annotations assume a nullable-aware context (C# 8+).
public sealed class ObjectMeasureInfo : MeasureInfo
{
    public ValueType Count { get; set; }
    public ValueType? Average { get; set; }
    public ValueType? Sum { get; set; }
    public ValueType? StandardDeviation { get; set; }

    // As before in GenericObjectMeasureInfo:
    // can be reference-type instances, as long as the type implements IComparable.
    public object Maximum { get; set; }
    public object Minimum { get; set; }
}
This type could then be used with the following type-widening rules, as also suggested in #12141 for -Raw, which would give us unified behavior:
For -Maximum and -Minimum, whatever input value is identified should be passed through as-is, as is already the case for non-numeric inputs (which is the fix for the bug at hand).
For the inherently non-integral -Average and -StandardDeviation measurements, [double] is an appropriate default, but with (at least one) [decimal] input, [decimal] should also be used on output.
For Sum, all-integer input should also output an integral type (starting with the (largest) input type), with _automatic type widening_, analogous to the widening (type promotion) that happens in PowerShell's number-literal parsing ([int] -> [long] -> [decimal]), and possibly even to [bigint] rather than the [double] that is the widest type for number literals.
Count should widen on demand the same way as Sum, though numbers beyond [int] are unlikely to occur in practice, except perhaps if opt-in enumeration of array-valued properties is implemented via the proposed -Recurse switch - see #7244. (A proof-of-concept sketch of this widening follows below.)
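To make the proposed widening concrete, here's a rough PowerShell proof-of-concept for Sum (Measure-SumWidening is a hypothetical helper for integer-only input, not the proposed implementation): it accumulates exactly in [bigint], then narrows the result to the smallest type in the chain.

function Measure-SumWidening {
    param([Parameter(ValueFromPipeline)] [bigint] $InputObject)
    begin   { [bigint] $sum = 0 }
    process { $sum += $InputObject }  # exact, arbitrary-precision accumulation
    end {
        # Narrow to the smallest type in the chain [int] -> [long] -> [decimal] -> [bigint].
        if ($sum -ge [int]::MinValue -and $sum -le [int]::MaxValue)         { return [int] $sum }
        if ($sum -ge [long]::MinValue -and $sum -le [long]::MaxValue)       { return [long] $sum }
        if ($sum -ge [decimal]::MinValue -and $sum -le [decimal]::MaxValue) { return [decimal] $sum }
        $sum  # beyond [decimal]: return the exact [bigint]
    }
}

With this approach, the lossy example from earlier in the thread round-trips exactly:

PS> 10000000000000199 | Measure-SumWidening
10000000000000199
PS> (10000000000000199 | Measure-SumWidening).GetType().Name
Int64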
@PowerShell/powershell-committee reviewed this; we do not want to take a breaking change for the type of the output object. Instead, to support this scenario, we would propose a -ValueType parameter that takes an enum consisting of the existing members and that would output an object, so that the resulting input type is preserved.
I think we need to take a step back, @SteveL-MSFT:
That numeric types aren't preserved with -Min and -Max - with large integers even outputting lossy [double]s that report a maximum that isn't even among the inputs - is clearly a bug: see #13422
The proposed -ValueType enhancement based on an enum seems quite convoluted; I suggest the following instead:

A -Raw switch (as proposed in #12141).
An -AsAutoNumber switch (as suggested above).

I recommend closing this issue in favor of these new ones.